4. DISCUSSION

4.1 Assessing the total biodiversity of an ecosystem across the tree of life from extracellular eDNA

The results of this study demonstrate that extracellular eDNA is a genetic repertoire of all the organisms in the ecosystem and that PCR-free deep sequencing of extracellular eDNA is a promising approach for total biodiversity assessment in large aquatic ecosystems. By generating one of the deepest shotgun sequencing datasets of extracellular eDNA, this study pushed the limits of biodiversity assessment and detected taxa across the tree of life, including relatively low-abundance non-microbial taxa. This was made possible by adaptations at every stage of the workflow: sample collection, eDNA extraction, library preparation, sequencing, and bioinformatics. Enriching extracellular eDNA with a lysis-free protocol, rather than extracting total eDNA, avoided the potential extraction bias that can arise from differences in lysis efficiency among cell types from a wide range of taxa (Djurhuus et al., 2017). Eliminating PCR from the laboratory workflow through PCR-free library preparation yielded very low sequence duplication rates; high duplication would otherwise render a considerable part of the data useless by reducing the effective sequencing depth, and PCR would introduce amplification artifacts (Kebschull and Zador, 2015). Because the probability of detecting low-abundance taxa depends on sequencing depth, the required depth was estimated from library complexity, and the extracellular eDNA libraries were then sequenced to saturation. Further, to achieve sensitive taxonomic classification, two independent taxonomic assignments were derived for each paired-end read using protein-based classification algorithms, and their lowest common ancestor (LCA) was then calculated. Together, these factors enabled the successful detection of taxa across the tree of life.
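The final classification step, combining two independent assignments per paired-end read into their lowest common ancestor, can be sketched as follows. This is a minimal illustration under stated assumptions, not the classifier used in this study: each assignment is represented as a root-to-leaf lineage from one shared taxonomy, the helper `lowest_common_ancestor` is hypothetical, and a read with only one assignment is assumed to keep that assignment.

```python
from typing import Optional, Sequence, Tuple

# A lineage is an ordered root-to-leaf path of taxon names taken from one
# shared taxonomy, e.g. ("Bacteria", "Proteobacteria", "Gammaproteobacteria").
Lineage = Sequence[str]


def lowest_common_ancestor(
    a: Optional[Lineage], b: Optional[Lineage]
) -> Optional[Tuple[str, ...]]:
    """Combine two independent taxonomic assignments into their LCA.

    Hypothetical helper. Assumptions: if only one assignment exists it is kept;
    if neither exists, or the lineages share no taxon, the read stays
    unclassified (None).
    """
    if not a and not b:
        return None
    if not a:
        return tuple(b)
    if not b:
        return tuple(a)
    shared = []
    for taxon_a, taxon_b in zip(a, b):  # walk both lineages from the root down
        if taxon_a != taxon_b:
            break  # first disagreement: the LCA is the last shared taxon
        shared.append(taxon_a)
    return tuple(shared) if shared else None


assignment_1 = ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Vibrionaceae")
assignment_2 = ("Bacteria", "Proteobacteria", "Alphaproteobacteria")
print(lowest_common_ancestor(assignment_1, assignment_2))  # ('Bacteria', 'Proteobacteria')
```

Taking the LCA is conservative: when the two assignments disagree, the read is placed only at the deepest rank both support, trading some resolution for fewer false assignments.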
However, reads of bacterial origin dominated the taxonomic assignments (86.95%), reflecting the high abundance of bacteria in aquatic ecosystems. Despite this, family richness was higher for Eukaryota than for Bacteria, possibly because more eukaryotic than prokaryotic families are represented in the reference database (Supplementary Fig. 4). Previous shotgun sequencing of total eDNA could not detect high eukaryotic diversity, mainly because of shallow sequencing depth (22.3 million reads) and the low percentage of reads assigned to Eukaryota (0.34%) (Stat et al., 2017). We obtained a more than sixteen-fold higher proportion of reads assigned to Eukaryota (5.48%) and detected hundreds of families of protists, fungi, plants, and animals. In particular, the high diversity of metazoan families indicates that shotgun sequencing data of extracellular eDNA contain detectable amounts of DNA from non-microbial species. This opens up the possibility of detecting taxa across the tree of life without targeted enrichment techniques such as PCR or hybridization capture, which can bias detection toward certain taxa (van der Loos and Nijland, 2021).
We also showed that statistical extrapolation of taxonomic richness accumulation curves can account for undetected taxa with very low abundances and estimate asymptotic richness across the tree of life. The asymptotic family richness estimates were in line with the expected richness of well-characterized taxa in the ecosystem, such as fish. Such estimates of total taxonomic richness can be used to monitor long-term changes in richness across the tree of life and to help identify and prioritize taxa for conservation. Although we did not detect any substantial change in the composition of taxonomic families among samples, the relative abundances of families varied considerably across space and time. This indicates that the set of families in the ecosystem can remain largely unchanged while their relative abundances vary at the spatiotemporal scale studied. Furthermore, the genome-scale data generated with this approach can be repurposed to assess diversity at the gene level, map functional traits to specific taxa, infer species co-occurrence patterns, and link community changes to ecosystem functioning and services.
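The extrapolation of richness accumulation curves discussed above can be illustrated with the bias-corrected Chao2 estimator, one widely used asymptotic richness estimator for incidence data. We do not claim it is the exact estimator behind the reported estimates; the sketch assumes presence/absence of each family per sample, and the function name and the site and family labels are hypothetical. Families detected in only one sample (Q1) or two samples (Q2) drive the estimate of how many families remain undetected: S_est = S_obs + ((m - 1) / m) * Q1 * (Q1 - 1) / (2 * (Q2 + 1)), where m is the number of samples.

```python
from typing import Dict, Set


def chao2(incidence: Dict[str, Set[str]]) -> float:
    """Bias-corrected Chao2 estimate of asymptotic (total) richness.

    incidence maps each sampling unit (e.g. a sample ID) to the set of
    families detected in it. Hypothetical helper for illustration only.
    """
    m = len(incidence)  # number of sampling units
    freq: Dict[str, int] = {}
    for families in incidence.values():
        for family in families:
            freq[family] = freq.get(family, 0) + 1  # samples containing the family
    s_obs = len(freq)  # observed richness
    q1 = sum(1 for f in freq.values() if f == 1)  # families seen in exactly one sample
    q2 = sum(1 for f in freq.values() if f == 2)  # families seen in exactly two samples
    if m < 2:
        return float(s_obs)  # a single sample gives no basis for extrapolation
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


# Hypothetical detections; not data from this study.
samples = {
    "site_1": {"fam_A", "fam_B", "fam_C"},
    "site_2": {"fam_A", "fam_B", "fam_D"},
    "site_3": {"fam_A", "fam_E"},
}
print(chao2(samples))
```

In this hypothetical example, five families are observed and three of them occur in only one sample, so the estimator adds one undetected family and the asymptotic estimate is 6.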

4.2 Limitations

The taxonomic resolution achievable through deep sequencing of extracellular eDNA is generally lower than that of approaches targeting a barcode region of the genome. Taxonomic classification of extracellular eDNA sequences depends on the taxonomic resolution of the genomic loci that are stochastically captured, the sensitivity of the algorithm used to detect homology, and the availability of reference sequences from the target organisms. Different genomic regions provide different taxonomic resolution depending on their sequence complexity, mutation rate, selection pressure, recombination, and the evolutionary history of the species (Coissac et al., 2016). Sensitive alignment-based homology detection algorithms such as BLAST (Altschul, 2014) are prohibitively slow for querying billions of reads against large reference databases. Alignment-free, k-mer-based alternatives such as KRAKEN2 (Wood et al., 2019) are thousands of times faster than BLAST but far less sensitive, and cannot detect homology between highly divergent species (Lindgreen et al., 2016). Because existing reference sequence databases are sparse, many underrepresented taxa may remain undetected by DNA-based classifiers, leading to underestimates of taxonomic diversity. In addition, the large amounts of repetitive DNA in eukaryotic genomes can confound taxonomic classification. We therefore adopted a protein-based classification algorithm, because protein sequences are more conserved than genomic DNA sequences and protein-based classification is more sensitive than DNA-based classification when reference databases are incomplete (Menzel et al., 2016). Even when the exact species is not represented in the database, its sequences can still be classified using the evolutionarily closest species in the database as a proxy. Protein-based classification also reduces erroneous taxonomic assignments arising from the repetitive DNA sequences that are abundant in eukaryotic genomes.
The trade-off of protein-based over DNA-based classification, however, is lower taxonomic resolution, because protein sequences are often conserved among closely related species. Such trade-offs are unavoidable when accurate estimates of taxonomic richness are required, especially in a tropical ecosystem such as ours, where most of the diversity has yet to be documented.
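A toy example makes both sides of this trade-off concrete. The two 15-bp fragments below are hypothetical and differ only by synonymous third-codon substitutions: they share just 11 of 15 nucleotides and a single 5-mer, so exact k-mer matching (shown with an unrealistically short k = 5) finds little evidence of homology, yet both translate to the same peptide. That same protein-level identity is why a protein-based classifier could not tell these two species apart. This sketches the principle only and does not reproduce any classifier used in this study; the helper names are hypothetical, and the codon table is the standard genetic code.

```python
from itertools import product
from typing import Set

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical homologous fragments differing only by synonymous substitutions.
species_1 = "ATGGCTGAAAAACTG"
species_2 = "ATGGCCGAGAAGCTT"

print(identity(species_1, species_2))               # 11 of 15 positions identical
print(translate(species_1), translate(species_2))   # MAEKL MAEKL
print(kmers(species_1, 5) & kmers(species_2, 5))    # {'ATGGC'}
```

The nucleotide-level view separates the two fragments but barely recognizes them as homologous; the protein-level view recognizes the homology but cannot separate them.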
Sequencing costs and the limited availability of genome-scale reference data are the main factors limiting the adoption of deep sequencing of extracellular eDNA for taxonomic assessment of ecosystems. For comparison, a typical PCR-based eDNA metabarcoding library may reach sequence saturation at about 10 million reads (Singer et al., 2019), whereas a PCR-free extracellular eDNA library required over 40 times that depth (416 million reads) in our study. Sequencing every sample to saturation may therefore quickly become infeasible for large-scale projects with hundreds of samples. Reducing the sampling resolution and relying on statistical extrapolation, as demonstrated in this study, can lower sequencing costs and enable the assessment of large ecosystems. Moreover, advances in sequencing technology are expected to reduce sequencing costs to as little as $1 per gigabase in the near future, making the approach more affordable. Furthermore, only a small fraction of known species have genomes that are assembled, annotated, and archived in public sequence databases. Nevertheless, the Earth BioGenome Project, an international moonshot initiative in biology, aims to close this gap by generating genomic resources for all known eukaryotic species (about 1.5 million) within roughly a decade (Lewin et al., 2018). Several large-scale genome sequencing initiatives worldwide, targeting a wide variety of taxa, have joined this effort. As these initiatives progress toward completion, the growing number of reference sequences in public databases will improve the sensitivity and specificity of taxonomic assignments and provide a more accurate snapshot of taxonomic diversity.
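The depth comparison and the projected sequencing cost reduce to simple arithmetic, sketched below. The read counts (10 and 416 million) and the $1 per GB figure come from the text, with GB read as gigabases of sequence. The 150-bp read length is an assumption for illustration only, not a value reported here, and each read is counted once rather than as a pair; the helper names are hypothetical.

```python
def fold_depth(required_reads: float, reference_reads: float) -> float:
    """How many times deeper one library must be sequenced than another."""
    return required_reads / reference_reads


def sequencing_cost_usd(reads: float, read_length_bp: int, usd_per_gigabase: float) -> float:
    """Cost of generating `reads` reads of `read_length_bp` bases each."""
    gigabases = reads * read_length_bp / 1e9
    return gigabases * usd_per_gigabase


METABARCODING_READS = 10e6  # saturation depth of a metabarcoding library (Singer et al., 2019)
PCR_FREE_READS = 416e6      # saturation depth of the PCR-free library in this study
READ_LENGTH_BP = 150        # ASSUMPTION: hypothetical read length, not from this study
USD_PER_GB = 1.0            # projected cost per gigabase cited in the text

print(fold_depth(PCR_FREE_READS, METABARCODING_READS))                       # 41.6
print(sequencing_cost_usd(PCR_FREE_READS, READ_LENGTH_BP, USD_PER_GB))       # 62.4
# Hypothetical project of 100 samples, each sequenced to the same depth.
print(sequencing_cost_usd(100 * PCR_FREE_READS, READ_LENGTH_BP, USD_PER_GB))
```

Under these assumptions, sequencing one library to the saturation depth observed here would cost about $62 at $1 per gigabase, and a hypothetical 100-sample project about $6,240, before library preparation and other costs.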

4.3 Conclusion

Extracellular eDNA is a natural repertoire of genetic material from all the organisms inhabiting an ecosystem and a reliable source for taxonomic diversity assessment. Organisms across the tree of life, including low-abundance non-microbial eukaryotes, can be effectively detected through PCR-free deep sequencing of extracellular eDNA. The total taxonomic richness of an ecosystem can be estimated through statistical extrapolation of richness accumulation curves, and broad-scale spatiotemporal changes in biodiversity can be assessed across the tree of life. With plummeting sequencing costs and the expanding coverage of reference databases through large-scale genome sequencing projects, we envision the wide adoption of PCR-free deep sequencing of extracellular eDNA for large-scale biodiversity assessment across the tree of life. Although the workflow can be further tested and optimized, we believe that this study significantly improves our understanding of the capabilities and limits of extracellular eDNA for total biodiversity assessment. The advances made in this study lay the foundation for a paradigm shift toward large-scale, next-generation bioassessment and biomonitoring programs across the tree of life for the conservation, restoration, and management of ecosystems in the Anthropocene.

SUPPLEMENTARY FIGURES

Supplementary Figure 1. Geographic overview of the Chilika lagoon.
The brackish lagoon is located on the east coast of India and receives freshwater from the Mahanadi River system and marine water from the Bay of Bengal.