Material and Methods
Site description and eDNA sampling
strategy
From the Black Forest Mountains to the Black Sea, the Danube River is
the second largest European river, with a drainage area of 801,093
km2, a river length of approximately 2850 km and a
mean discharge of 6480 m3.s-1. The
river is divided into three main sections of comparable length, namely
the Upper-, Middle- and Lower Danube (Eros et al., 2017) (Fig. 1 and 6).
The 18 sampled tributaries, located all along the Danube, have an
average flow rate varying from 5 to 1800
m3.s-1 (Supplementary Tab 1) and
represent a very diverse range of rivers from torrential, fresh alpine
rivers to large warm lowland streams (Kresser & Laszloffy, 1964).
From June 29 to August 6, 2019, 29 and 18 sites were sampled on the
Danube River and its tributaries, respectively. During this period,
these rivers were close to the average hydrological conditions
(Supplementary Tab 1), with a mean daily flow rate of 1716
m3.s-1 at Vienna. The sites located
on the main channel of the Danube were distributed regularly from the
source to the mouth of the river. The distance between the sites (mean:
99.2 km, standard error: 26.0 km; range: 38-149 km) was sufficient to
avoid the potential influence of eDNA transported downstream from one
site to the next (Pont et al., 2018). For the same reason, the sampling
sites were not located within several tens of km downstream of the
confluence of a major tributary. The tributaries were sampled 1-55 km
upstream of their confluence with the Danube. Due to the failure of DNA
amplification, the Inn River site was resampled in May 2020. At each
site, two surface samples were collected either by wading or from a boat
moving from shore to shore to provide temporal and spatial integrative
sampling of the river cross section (mean filtration time of 22.34 MN).
The water was collected with a peristaltic pump inside a disposable
sterile tube and was directly filtered through a cross flow filtration
capsule (VigiDNA 0.45 μm, SPYGEN), and its volume was measured (3 to 40
L, mean of 28.73 L). At the end of each filtration, the water in the
capsule was drained, and the capsule was refilled with 80 mL of
conservation buffer CL1 (SPYGEN) to prevent eDNA degradation.
Conventional fishing
During the same period (July 3 to August 28, 2019), 41 sites were
sampled by using TEF along the Danube River and its tributaries (Bammer
et al., 2021). Two additional sites were sampled in October 2018 and
January 2020. The sampling procedure followed both the European Standard
(CEN, 2003) and recommendations for quantitative sampling in large
rivers (Schmutz, Zauner, Eberstaller, & Jungwirth, 2001). Fish were
sampled in a single pass along the bank of the main channel and in some
places in the connected backwaters. The main mesohabitat types were
sampled in their proportional distribution at the site level (length of
river site at least 10 times the width of the river) to maximize the
representativeness of the fish assemblage. The sampling effort varied
between 300 and 28,412 m2, depending on the diameter
of the anode (boom or hand-held) and on the river size. Fish were
determined to the species level, measured (+/- 0.5 cm total length) and
released alive immediately afterwards. Fish individual biomass was
estimated using species-specific length-weight relationships. The data
for one site (Russenski Lom River) where only 19 fish were caught were
discarded. Eighteen of the remaining sites were located on the same
river stretch as the eDNA sampling sites (distance less than 20 km), and
only the main channel was sampled, allowing comparison between eDNA and
TEF sampling methods at these sites (Supplementary Tab 1).
eDNA metabarcoding and taxonomic
assignment
The eDNA metabarcoding workflow (extraction, amplification using
“teleo” primers, high-throughput sequencing and bioinformatic
analysis) was performed following a previously described protocol (Pont
et al., 2018). After eDNA extraction, 12 PCR replicates were conducted
per sample. Twelve libraries were prepared using the Fasteris MetaFast
protocol, and twelve independent paired-end sequencing reactions (2 ×
125 bp) was carried out on a MiSeq sequencer (Illumina) with the MiSeq
Kit v3 (Illumina) following the manufacturer’s instructions at Fasteris
facilities. To monitor possible contaminants, eleven negative extraction
controls and seven negative PCR controls (ultrapure water) were
amplified with 12 replicates and sequenced in parallel with the samples.
Sequence reads were analysed using programs implemented in the OBITools
package (Boyer et al., 2016). The forward and reverse reads were
assembled with the ILLUMINAPAIREDEND program using a minimum score of 40
and retrieving only joined sequences. Then, we assigned the reads to
each sample using NGSFILTER software, and a separate data set was
created for each sample by splitting the original data set into several
files using OBISPLIT. After this step, we analysed each sample
individually before merging the taxon list for the final ecological
analysis. Strictly identical sequences were clustered together using
OBIUNIQ. Sequences shorter than 20 bp, or with fewer than 10 reads or
labelled “internal” by the OBICLEAN program were excluded.
To optimize the taxonomic assignment of fish eDNA collected in our water
samples, we assembled, in addition to our previous database (Valentini
et al., 2016), a complementary “Danubian” reference database
(Supplementary Tab 3). Tissue samples for 356 specimens belonging to 73
species were collected at locations situated in the Danube catchment.
Total DNA was extracted from 10 mg of muscle tissue, amplified using the
eDNA metabarcoding protocol with “teleo” primers and sequenced using a
MiSeq sequencer at Fasteris facilities (Valentini et al., 2016). The
sequences obtained were analysed using the ObiTools package following
the same protocol as the eDNA samples excluding the taxonomic
assignation step. The most abundant sequence was retrieved for reference
database construction.
The final taxonomic assignment of molecular operational taxonomic units
(MOTUs) was performed using the program ECOTAG, with our two reference
databases and the sequences extracted from the release 142 (standard
sequences) of the ENA database
(http://www.ebi.ac.uk/ena).
Considering the incorrect assignment of a few sequences to the sample
due to tag jumps (Schnell, Bohmann, & Gilbert, 2015), all the sequences
with a frequency of occurrence < 0.001 per sequence and per
library were discarded. Then, the data were curated for Index-Hopping
(MacConaill et al., 2018) with a threshold empirically determined per
sequencing batch using experimental blanks (i.e., combinations of tags
not present in the libraries) for a given sequencing batch between
libraries.
The taxonomic nomenclature used referred to the European Freshwater Fish
Fauna (Kottelat & Freyhof, 2007), except for the genera Cottusand Phoxinus at the species level due to our insufficient
knowledge of the haplotype diversity within the Danube catchment. The
corresponding reference sequences were denominated Cot_gob and Pho_pho
(see Supplementary Tab 2 for species names corresponding to eDNA
detected taxa name abbreviations). When reference sequences from the
different reference databases were assigned to the same species, their
corresponding number of reads was cumulated. When reference sequences
were assigned at the genus level, they were finally denominated at the
species level when only one species from the genus was known in the
catchment (Anguilla anguilla , Barbatula barbatula ). If
not, they were discarded (Acipenser spp., Alburnus spp., Barbus
spp., Rutilus spp.) , as were sequences assigned to a higher taxonomic
level (Cyprinidae, Salmonidae). The molecular markers did not
discriminate between 2 to 3 detected taxa belonging to the same genus
(Salvelinus , Carassius , Alosa , Acipenser ,Barbus , Lampetra ) and to different genera for five groups
(Cypr_1, Cypr_2, Cypr_3, Cypr_4, Cypr_5). Within all groups, we
only considered species-level assignment for taxa known to be present in
the Danube catchment (Supplementary Tab 2). Of the two undifferentiated
species in the Cypr_1 taxonomic group and present in the Danube River
catchment (Chon_nas, Tel_sou), only Chon_nas was captured by using
TEF during our survey. Because Tel_sou is a species well known to occur
mainly in upstream fast-flowing river reaches, we considered Cypr_1
occurrence to be primarily related to the presence of Chon_nas. After
the final taxonomic sequence identification, three categories of taxa
were considered (see Supplementary Tab 2). The first category included
all the taxa whose presence at the Danube River was confirmed (KNWTaxa)
by previous traditional fish sampling surveys (Bammer et al., 2021; Eros
et al., 2017) or from the literature (Kottelat & Freyhof, 2007;
Meulenbroek et al., 2018; Sommerwerk et al., 2009). The second category
(WASTaxa) included food fish, farmed fish, aquarium fish or fish with
any other link to human activity allowing a rejection of extra-organism
eDNA in the river (mainly wastewater). The third group included species
unknown in the catchment and not known for any human use (UNKTaxa).Alosa spp. were detected in the upper Danube in one sample (KM
843) with only one positive PCR. The presence of this anadromous species
in such an upstream location cannot be confirmed by any previous
observation and was considered a false positive at this site.
Quantification of
teleo_eDNA
For the quantification of fish DNA, the samples were amplified in a
real-time quantitative PCR setup using the same “teleo” primers as for
metabarcoding. qPCR was performed in a final volume of 25 µL, which
included 3 µL of DNA, 12.5 µL of SYBR® Green Master Mix (BioRad®), 8.3
µL of ddH2O, 0.5 µL of each “teleo” primer (10 mM), 4 µM of human
blocking primer (Valentini et al., 2016) and 0.2 μL bovine serum albumin
(BSA, Roche Diagnostic). Each sample was analysed in 3 replicates. To
obtain a standard curve, a known concentration of a synthetic gene was
diluted from 1,13x108 to 1,13x105copies of DNA per reaction. The tubes containing the DNA samples were
sealed, and then the qPCR standards were added to the qPCR plate in a
room separate from the eDNA extraction room. The qPCR theroprofile and
cycling conditions used were as follows: 95 °C for 10 min, followed by
55 cycles of 95 °C for 30 s and 55 °C for 30 s. Melting curves were
produced by plotting fluorescence intensity against temperature as the
temperature was increased from 65 to 95 °C in 0.5 °C steps every 5 s.
The samples were analysed on a BIO-RAD® CFX96 Touch Real-Time PCR
Detection System. To test the sensitivity of the primer for
quantification, the limit of detection (LOD, i.e., the minimum amount of
target DNA sequence that can be detected in the sample) and the limit of
quantification (LOQ, i.e., the lowest amount of target DNA that yields
an acceptable level of precision and accuracy) were calculated by
running a dilution series of a known amount of synthetic gene, ranging
from 1 ng.μL-1 (1.13 x 109 DNA
copies) to 10-9 ng.L-1 (1.13 DNA
copies) with 12 qPCR replicates per concentration below
10-3 ng.μL-1. The LOQ (Klymus et
al., 2019) was estimated at 10-7 ng/µL, which
corresponds to ca. 500 copies, and the LOD (Klymus et al., 2019) was
estimated to be 80 copies. However, when performing 12 replicates, as in
the case of the applied eDNA metabarcoding protocol, the LOD estimate
was 6 DNA molecules.
The quantity of teleo-DNA per sample of KNWtaxa (eleo-DNA) was
calculated from the ratio of KNWtaxa read counts over the total read
count, multiplied by the teleo-DNA quantity extracted (van Bleijswijk et
al., 2020). A similar computation was applied to each fish taxon, and
the final concentration of fish species DNA per litre was computed from
the ratio of the quantity of DNA per taxon by the volume sampled for
each sample.
Statistical treatments
The mean site-specific richness calculated from the eDNA and TEF data
was compared using two-tailed Student’s t test for paired samples (R
Core Team, 2020; package MASS, function t-test).
Teleo-DNA concentrations and fish biomass/density data were
log-transformed to satisfy normality assumptions before modelling the
relationship between them using a type II linear regression (R Core
Team, 2020; package lmodel2, function lmodel2, “main axis” method).
Teleo-DNA concentrations were regressed against the mean annual
waterflow values at each site (Kresser & Laszloffy, 1964).
The structures of fish assemblages revealed by eDNA and TEF at the 18
common sites were compared using co-inertia analysis (Doledec &
Chessel, 1994; R Core Team, 2020; package ade-4, functions dudi.pca and
coinertia). This multivariate method allowed the comparison of the
ordinations of two data sets to find the orthogonal co-inertia principal
components that maximize the co-variance between them. The RV co-inertia
criterion (0 to 1) measured the adequacy between the two tables (Dray,
Chessel, & Thioulouse, 2003) and was tested (Monte Carlo test with
10,000 permutations). We only considered common taxa with a similar
level of taxonomic resolution (40 species) to test the similarity of the
structure of fish assemblages obtained by the eDNA method and TEF
abundance expressed in density or biomass.
To test the hypothesis that the number of KNWTaxa detected by eDNA was
dependent on the quantity of teleo-DNA per sample or on the water volume
(V) filtered from the 94 samples, we used an asymptotic function to
describe our species-sampling effort relationship considering that, at
any time, the richness Y is finite at a given area (Soberon & Llorente,
1993). The choice of the non-linear function remains largely empirical
(Thompson, Withers, Pianka, & Thompson, 2003), and we chose a model
(Tjorve, 2003) from the negative exponential family a * [1 -
exp-bX], with an asymptotic value of richness, b
proportional to the relative rate of Y increase while X increases, and X
the sampling effort (teleo-eDNA or V). To control for variability in
species richness between sites, we used non-linear mixed-effect (NLME)
models (Comets, Lavenu, & Lavielle, 2017; R Core Team, 2020; package
saemix, function saemix.model, 1000 simulations) with sites as a random
factor and two alternative fixed effects (teleo-eDNA, V). These two
models were compared between them and to the model with only the site
random effect using the AIC (Burnham & Anderson, 2002). The
significance of the fixed parameters was tested with a Wald chi-square
test (Comets et al., 2017), the normality of the residuals with a
Shapiro test, and the goodness of fit of the selected model by comparing
the observed and predicted values at the individual level.