2.5 Data processing and statistical analysis
All statistical analyses were conducted in R 4.2.3 (R Core Team, 2021)
with the primary packages cluster, factoextra, phyloseq, vegan, and ggplot2. Nonmetric
multidimensional scaling (NMDS) was used to group the 38 sampling sites
into spatial zones with distinct differences in fish community
composition according to the relative OTU richness, fish individual
number and biomass. NMDS relies on the rank order of pairwise variable
dissimilarities (Euclidean distance in this study) and does not make any
underlying distributional assumptions of the data (Borcard et al.,
2011). Sampling sites were plotted in ordination space with the distance
between points positively related to the dissimilarity of output
parameters (i.e., sites with similar output parameters were plotted
closer to one another). The analysis of similarity (ANOSIM) test was
used to evaluate the dissimilarity matrix and test whether groups of
objects had significantly (P < 0.05) different mean dissimilarities.
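As an illustration of the ANOSIM test described above, the following is a minimal sketch using only the Python standard library, not the code used in this study. It ranks all pairwise Euclidean distances, computes R = (mean between-group rank - mean within-group rank) / (M / 2), where M is the number of site pairs, and estimates a P value by permuting the group labels. The site coordinates, group labels, and function names are hypothetical.

```python
import math
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to len(values); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(ranks, pairs, groups):
    """ANOSIM R statistic from ranked dissimilarities and group labels."""
    within = [r for r, (a, b) in zip(ranks, pairs) if groups[a] == groups[b]]
    between = [r for r, (a, b) in zip(ranks, pairs) if groups[a] != groups[b]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(points, groups, n_perm=999, seed=0):
    """Return (R, permutation P value) for Euclidean distances between points."""
    pairs = list(combinations(range(len(points)), 2))
    dists = [math.dist(points[a], points[b]) for a, b in pairs]
    ranks = average_ranks(dists)
    observed = anosim_r(ranks, pairs, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between sites and groups
        if anosim_r(ranks, pairs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: six sites described by two output parameters.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = ["A", "A", "A", "B", "B", "B"]
r_stat, p_value = anosim(points, groups)
print(r_stat, p_value)
```

In this toy example every between-group distance exceeds every within-group distance, so R equals 1. With only six sites the permutation P value cannot become small, which is why real analyses need more sites per group.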
Based on the OTU richness, the pairwise taxonomic Bray‒Curtis
dissimilarity matrix between different samples was calculated using the
microeco package (Liu et al., 2021). Environmental factors and
fish OTU richness that showed significant variations in their values
were used, and stepwise forward selection was performed to linearly
reduce the correlated variables along the axes. A permutation test
threshold (P = 0.05) was used to determine which variables to
incorporate into the final model. The relationship between eDNA-based
and number-/biomass-based alpha diversity was estimated by linear
regression. Linear dependencies among variables were explored by
computing variance inflation factors to rule out confounding
collinearity. The
statistical significance of the axes derived from each analysis was
tested with a Monte Carlo test (999 permutations).
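The Bray-Curtis dissimilarity used earlier in this paragraph can be sketched as follows. This is a minimal standard-library illustration of the formula BC(u, v) = sum(|u_i - v_i|) / sum(u_i + v_i), not the microeco implementation. The sample values and function names are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("samples must have the same number of taxa")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("both samples are empty; dissimilarity is undefined")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def pairwise_bray_curtis(samples):
    """Symmetric matrix of Bray-Curtis dissimilarities between all samples."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = bray_curtis(samples[i], samples[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical OTU counts for three samples (rows) and three taxa (columns).
samples = [
    [10, 0, 5],
    [5, 5, 5],
    [0, 8, 0],
]
for row in pairwise_bray_curtis(samples):
    print(row)
```

The value ranges from 0 for samples with identical counts to 1 for samples that share no taxa.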
Linear discriminant analysis
effect size (LEfSe) is an algorithm for high-dimensional indicator
discovery that identifies taxa by characterizing the differences between
two or more biological conditions (Segata et al., 2011). LEfSe
emphasizes both statistical significance and biological relevance,
allowing researchers to identify discriminative features that are
significantly different between biological classes. The nonparametric
factorial Kruskal-Wallis sum-rank test was first used to detect features
with significant differential abundance with respect to the class of
interest. Second, linear discriminant analysis was used to estimate
the effect size of each differentially abundant feature and rank the
feature accordingly (Liu et al., 2021).
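The first LEfSe step described above, the Kruskal-Wallis test, can be illustrated with a minimal standard-library sketch. It computes the tie-corrected H statistic for one feature across classes and is not the LEfSe implementation. The subsequent linear discriminant analysis step is not shown. The abundance values and function names are hypothetical, and the P value shortcut assumes exactly three classes, for which the chi-square survival function with 2 degrees of freedom is exp(-H / 2).

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to len(values); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical relative abundances of one taxon in three classes of samples.
classes = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(classes)
p_value = math.exp(-h_stat / 2)  # valid only for three classes (2 d.f.)
print(h_stat, p_value)
```

A feature would be carried forward to the effect-size step only if its P value fell below the chosen significance threshold.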
Two-way hierarchical clustering analysis was performed using the
pheatmap package, which relies on the clustering distances and methods
implemented in the dist and hclust functions in R.
The clustering analysis grouped fish species with similar responses to
the environmental factors. Statistically significant
cluster trees were identified using a bootstrap randomization technique
in which the nonzero values were resampled and used to generate
pseudovalues under the null hypothesis. The result was displayed as a
heatmap.
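The two-way clustering described above can be sketched as agglomerative clustering applied once to the rows and once to the columns of a response matrix. The sketch below uses only the Python standard library and assumes Euclidean distance with complete linkage. The source does not state which distance or linkage was used, so both choices are assumptions, as are the matrix values, labels, and function name. The bootstrap significance step is not shown.

```python
import math


def complete_linkage(points, labels):
    """Agglomerative clustering; returns merges as (left, right, height)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        left = sorted(labels[i] for i in clusters[a])
        right = sorted(labels[i] for i in clusters[b])
        merges.append((left, right, d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical responses of three species (rows) to three factors (columns).
species = ["sp1", "sp2", "sp3"]
factors = ["temperature", "depth", "salinity"]
matrix = [
    [0.9, 0.8, -0.1],
    [0.8, 0.9, 0.0],
    [-0.7, -0.6, 0.5],
]

row_merges = complete_linkage(matrix, species)             # cluster species
col_merges = complete_linkage(list(zip(*matrix)), factors)  # cluster factors

for left, right, height in row_merges + col_merges:
    print(left, right, round(height, 3))
```

Species sp1 and sp2 have nearly identical response profiles and are merged first. Ordering the heatmap rows and columns by the resulting trees places similar species and similar factors next to each other.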