Material and Methods

Site description and eDNA sampling strategy

From the Black Forest Mountains to the Black Sea, the Danube River is the second largest European river, with a drainage area of 801,093 km2, a river length of approximately 2850 km and a mean discharge of 6480 m3.s-1. The river is divided into three main sections of comparable length, namely the Upper-, Middle- and Lower Danube (Eros et al., 2017) (Fig. 1 and 6). The 18 sampled tributaries, located all along the Danube, have an average flow rate varying from 5 to 1800 m3.s-1 (Supplementary Tab 1) and represent a very diverse range of rivers from torrential, fresh alpine rivers to large warm lowland streams (Kresser & Laszloffy, 1964).
From June 29 to August 6, 2019, 29 and 18 sites were sampled on the Danube River and its tributaries, respectively. During this period, these rivers were close to the average hydrological conditions (Supplementary Tab 1), with a mean daily flow rate of 1716 m3.s-1 at Vienna. The sites located on the main channel of the Danube were distributed regularly from the source to the mouth of the river. The distance between the sites (mean: 99.2 km, standard error: 26.0 km; range: 38-149 km) was sufficient to avoid the potential influence of eDNA transported downstream from one site to the next (Pont et al., 2018). For the same reason, the sampling sites were not located within several tens of km downstream of the confluence of a major tributary. The tributaries were sampled 1-55 km upstream of their confluence with the Danube. Due to the failure of DNA amplification, the Inn River site was resampled in May 2020. At each site, two surface samples were collected either by wading or from a boat moving from shore to shore to provide temporal and spatial integrative sampling of the river cross section (mean filtration time of 22.34 MN). The water was collected with a peristaltic pump inside a disposable sterile tube and was directly filtered through a cross flow filtration capsule (VigiDNA 0.45 μm, SPYGEN), and its volume was measured (3 to 40 L, mean of 28.73 L). At the end of each filtration, the water in the capsule was drained, and the capsule was refilled with 80 mL of conservation buffer CL1 (SPYGEN) to prevent eDNA degradation.

Conventional fishing

During the same period (July 3 to August 28, 2019), 41 sites were sampled by using TEF along the Danube River and its tributaries (Bammer et al., 2021). Two additional sites were sampled in October 2018 and January 2020. The sampling procedure followed both the European Standard (CEN, 2003) and recommendations for quantitative sampling in large rivers (Schmutz, Zauner, Eberstaller, & Jungwirth, 2001). Fish were sampled in a single pass along the bank of the main channel and in some places in the connected backwaters. The main mesohabitat types were sampled in their proportional distribution at the site level (length of river site at least 10 times the width of the river) to maximize the representativeness of the fish assemblage. The sampling effort varied between 300 and 28,412 m2, depending on the diameter of the anode (boom or hand-held) and on the river size. Fish were determined to the species level, measured (+/- 0.5 cm total length) and released alive immediately afterwards. Fish individual biomass was estimated using species-specific length-weight relationships. The data for one site (Russenski Lom River) where only 19 fish were caught were discarded. Eighteen of the remaining sites were located on the same river stretch as the eDNA sampling sites (distance less than 20 km), and only the main channel was sampled, allowing comparison between eDNA and TEF sampling methods at these sites (Supplementary Tab 1).

eDNA metabarcoding and taxonomic assignment

The eDNA metabarcoding workflow (extraction, amplification using “teleo” primers, high-throughput sequencing and bioinformatic analysis) was performed following a previously described protocol (Pont et al., 2018). After eDNA extraction, 12 PCR replicates were conducted per sample. Twelve libraries were prepared using the Fasteris MetaFast protocol, and twelve independent paired-end sequencing reactions (2 × 125 bp) was carried out on a MiSeq sequencer (Illumina) with the MiSeq Kit v3 (Illumina) following the manufacturer’s instructions at Fasteris facilities. To monitor possible contaminants, eleven negative extraction controls and seven negative PCR controls (ultrapure water) were amplified with 12 replicates and sequenced in parallel with the samples. Sequence reads were analysed using programs implemented in the OBITools package (Boyer et al., 2016). The forward and reverse reads were assembled with the ILLUMINAPAIREDEND program using a minimum score of 40 and retrieving only joined sequences. Then, we assigned the reads to each sample using NGSFILTER software, and a separate data set was created for each sample by splitting the original data set into several files using OBISPLIT. After this step, we analysed each sample individually before merging the taxon list for the final ecological analysis. Strictly identical sequences were clustered together using OBIUNIQ. Sequences shorter than 20 bp, or with fewer than 10 reads or labelled “internal” by the OBICLEAN program were excluded.
To optimize the taxonomic assignment of fish eDNA collected in our water samples, we assembled, in addition to our previous database (Valentini et al., 2016), a complementary “Danubian” reference database (Supplementary Tab 3). Tissue samples for 356 specimens belonging to 73 species were collected at locations situated in the Danube catchment. Total DNA was extracted from 10 mg of muscle tissue, amplified using the eDNA metabarcoding protocol with “teleo” primers and sequenced using a MiSeq sequencer at Fasteris facilities (Valentini et al., 2016). The sequences obtained were analysed using the ObiTools package following the same protocol as the eDNA samples excluding the taxonomic assignation step. The most abundant sequence was retrieved for reference database construction.
The final taxonomic assignment of molecular operational taxonomic units (MOTUs) was performed using the program ECOTAG, with our two reference databases and the sequences extracted from the release 142 (standard sequences) of the ENA database (http://www.ebi.ac.uk/ena). Considering the incorrect assignment of a few sequences to the sample due to tag jumps (Schnell, Bohmann, & Gilbert, 2015), all the sequences with a frequency of occurrence < 0.001 per sequence and per library were discarded. Then, the data were curated for Index-Hopping (MacConaill et al., 2018) with a threshold empirically determined per sequencing batch using experimental blanks (i.e., combinations of tags not present in the libraries) for a given sequencing batch between libraries.
The taxonomic nomenclature used referred to the European Freshwater Fish Fauna (Kottelat & Freyhof, 2007), except for the genera Cottusand Phoxinus at the species level due to our insufficient knowledge of the haplotype diversity within the Danube catchment. The corresponding reference sequences were denominated Cot_gob and Pho_pho (see Supplementary Tab 2 for species names corresponding to eDNA detected taxa name abbreviations). When reference sequences from the different reference databases were assigned to the same species, their corresponding number of reads was cumulated. When reference sequences were assigned at the genus level, they were finally denominated at the species level when only one species from the genus was known in the catchment (Anguilla anguilla , Barbatula barbatula ). If not, they were discarded (Acipenser spp., Alburnus spp., Barbus spp., Rutilus spp.) , as were sequences assigned to a higher taxonomic level (Cyprinidae, Salmonidae). The molecular markers did not discriminate between 2 to 3 detected taxa belonging to the same genus (Salvelinus , Carassius , Alosa , Acipenser ,Barbus , Lampetra ) and to different genera for five groups (Cypr_1, Cypr_2, Cypr_3, Cypr_4, Cypr_5). Within all groups, we only considered species-level assignment for taxa known to be present in the Danube catchment (Supplementary Tab 2). Of the two undifferentiated species in the Cypr_1 taxonomic group and present in the Danube River catchment (Chon_nas, Tel_sou), only Chon_nas was captured by using TEF during our survey. Because Tel_sou is a species well known to occur mainly in upstream fast-flowing river reaches, we considered Cypr_1 occurrence to be primarily related to the presence of Chon_nas. After the final taxonomic sequence identification, three categories of taxa were considered (see Supplementary Tab 2). The first category included all the taxa whose presence at the Danube River was confirmed (KNWTaxa) by previous traditional fish sampling surveys (Bammer et al., 2021; Eros et al., 2017) or from the literature (Kottelat & Freyhof, 2007; Meulenbroek et al., 2018; Sommerwerk et al., 2009). The second category (WASTaxa) included food fish, farmed fish, aquarium fish or fish with any other link to human activity allowing a rejection of extra-organism eDNA in the river (mainly wastewater). The third group included species unknown in the catchment and not known for any human use (UNKTaxa).Alosa spp. were detected in the upper Danube in one sample (KM 843) with only one positive PCR. The presence of this anadromous species in such an upstream location cannot be confirmed by any previous observation and was considered a false positive at this site.

Quantification of teleo_eDNA

For the quantification of fish DNA, the samples were amplified in a real-time quantitative PCR setup using the same “teleo” primers as for metabarcoding. qPCR was performed in a final volume of 25 µL, which included 3 µL of DNA, 12.5 µL of SYBR® Green Master Mix (BioRad®), 8.3 µL of ddH2O, 0.5 µL of each “teleo” primer (10 mM), 4 µM of human blocking primer (Valentini et al., 2016) and 0.2 μL bovine serum albumin (BSA, Roche Diagnostic). Each sample was analysed in 3 replicates. To obtain a standard curve, a known concentration of a synthetic gene was diluted from 1,13x108 to 1,13x105copies of DNA per reaction. The tubes containing the DNA samples were sealed, and then the qPCR standards were added to the qPCR plate in a room separate from the eDNA extraction room. The qPCR theroprofile and cycling conditions used were as follows: 95 °C for 10 min, followed by 55 cycles of 95 °C for 30 s and 55 °C for 30 s. Melting curves were produced by plotting fluorescence intensity against temperature as the temperature was increased from 65 to 95 °C in 0.5 °C steps every 5 s. The samples were analysed on a BIO-RAD® CFX96 Touch Real-Time PCR Detection System. To test the sensitivity of the primer for quantification, the limit of detection (LOD, i.e., the minimum amount of target DNA sequence that can be detected in the sample) and the limit of quantification (LOQ, i.e., the lowest amount of target DNA that yields an acceptable level of precision and accuracy) were calculated by running a dilution series of a known amount of synthetic gene, ranging from 1 ng.μL-1 (1.13 x 109 DNA copies) to 10-9 ng.L-1 (1.13 DNA copies) with 12 qPCR replicates per concentration below 10-3 ng.μL-1. The LOQ (Klymus et al., 2019) was estimated at 10-7 ng/µL, which corresponds to ca. 500 copies, and the LOD (Klymus et al., 2019) was estimated to be 80 copies. However, when performing 12 replicates, as in the case of the applied eDNA metabarcoding protocol, the LOD estimate was 6 DNA molecules.
The quantity of teleo-DNA per sample of KNWtaxa (eleo-DNA) was calculated from the ratio of KNWtaxa read counts over the total read count, multiplied by the teleo-DNA quantity extracted (van Bleijswijk et al., 2020). A similar computation was applied to each fish taxon, and the final concentration of fish species DNA per litre was computed from the ratio of the quantity of DNA per taxon by the volume sampled for each sample.

Statistical treatments

The mean site-specific richness calculated from the eDNA and TEF data was compared using two-tailed Student’s t test for paired samples (R Core Team, 2020; package MASS, function t-test).
Teleo-DNA concentrations and fish biomass/density data were log-transformed to satisfy normality assumptions before modelling the relationship between them using a type II linear regression (R Core Team, 2020; package lmodel2, function lmodel2, “main axis” method). Teleo-DNA concentrations were regressed against the mean annual waterflow values at each site (Kresser & Laszloffy, 1964).
The structures of fish assemblages revealed by eDNA and TEF at the 18 common sites were compared using co-inertia analysis (Doledec & Chessel, 1994; R Core Team, 2020; package ade-4, functions dudi.pca and coinertia). This multivariate method allowed the comparison of the ordinations of two data sets to find the orthogonal co-inertia principal components that maximize the co-variance between them. The RV co-inertia criterion (0 to 1) measured the adequacy between the two tables (Dray, Chessel, & Thioulouse, 2003) and was tested (Monte Carlo test with 10,000 permutations). We only considered common taxa with a similar level of taxonomic resolution (40 species) to test the similarity of the structure of fish assemblages obtained by the eDNA method and TEF abundance expressed in density or biomass.
To test the hypothesis that the number of KNWTaxa detected by eDNA was dependent on the quantity of teleo-DNA per sample or on the water volume (V) filtered from the 94 samples, we used an asymptotic function to describe our species-sampling effort relationship considering that, at any time, the richness Y is finite at a given area (Soberon & Llorente, 1993). The choice of the non-linear function remains largely empirical (Thompson, Withers, Pianka, & Thompson, 2003), and we chose a model (Tjorve, 2003) from the negative exponential family a * [1 - exp-bX], with an asymptotic value of richness, b proportional to the relative rate of Y increase while X increases, and X the sampling effort (teleo-eDNA or V). To control for variability in species richness between sites, we used non-linear mixed-effect (NLME) models (Comets, Lavenu, & Lavielle, 2017; R Core Team, 2020; package saemix, function saemix.model, 1000 simulations) with sites as a random factor and two alternative fixed effects (teleo-eDNA, V). These two models were compared between them and to the model with only the site random effect using the AIC (Burnham & Anderson, 2002). The significance of the fixed parameters was tested with a Wald chi-square test (Comets et al., 2017), the normality of the residuals with a Shapiro test, and the goodness of fit of the selected model by comparing the observed and predicted values at the individual level.