Comparison between short and long reads
Our results showed a similar pattern of habitat diversity for long and short reads, corroborating the patterns previously reported10, 11, 25. These similarities support the view that our findings are real and independent of any methodological biases introduced by the different markers and platforms.
The importance of soil properties for diversity and community turnover varied among markers. We acknowledge the different taxonomic coverage of each marker and the limitations of the available databases. For instance, the diversity of the early-diverging fungal lineages Chytridiomycota, Cryptomycota, and Zoopagomycota was higher with 18S, in stark contrast to the ITS and COI data. This difference may result from either PCR biases or gaps in the reference databases used. COI is usually used as a barcode for metazoans80, and fewer reference sequences are available for fungi. Around 40% of the OTUs in our COI data were unidentified25, and at least some of these could represent fungal lineages without public reference sequences. Uneven availability of reference sequences may have affected our diversity and community composition results for the various markers used, with the strongest effect on the COI results.
The use of short-read fragments (for both 18S and COI) resulted in a higher number of OTUs, across all organisms, than did the long-read technique. Long-read ITS, on the other hand, registered more fungal OTUs, even though its total number of OTUs was smaller than for short reads. It is important to stress that, unlike for the ITS region, for short reads we used general primers targeting all eukaryotes and not just fungi, so that only a portion of the reads in the 18S and COI datasets belonged to fungi. Although the differences in primer design preclude us from reliably identifying the “best” marker or sequencing platform for fungal assessments in general, we highlight the main advantages and disadvantages of those used here.
On the one hand, we showed that 18S sequenced on the Illumina platform provides the highest overall taxonomic coverage. Short reads can therefore be recommended for studies aiming to compare diversity and community turnover. In economic terms, this is also the more cost-efficient option at the moment. On the other hand, due to the short fragment size of Illumina reads, some OTUs could be misidentified or classified only at, for example, the family or genus level. For instance, in an earlier study comparing taxonomic identifications from short-read HTS, the choice of ITS sub-region, ITS1 or ITS2, affected 51% of fungal identifications16. Long-read HTS methods have the potential to identify fungi with higher accuracy, despite recording fewer sequences per sample18. In our data, PacBio registered the highest number of OTUs classified as fungi but the lowest number of total OTUs. This is expected, since PacBio platforms yield fewer reads in total81 and will not sequence partially degraded DNA. Additionally, long reads have the potential to combine population analysis with environmental data. This is more difficult with short reads, which either capture less genetic variation for environmental diversity analysis or require the sequencing of several markers for a limited number of target individuals.