Joint species distribution models to quantify occupancy response
to the different coastal areas
We used the SWARM pipeline to generate the MOTUs list in each site of
the two coastal areas. From this MOTUs composition matrix, we compared
the MOTUs occupancy in each area using
a Hierarchical Modelling of
Species Communities (HMSC; Ovaskeinen et al. 2017, Ovaskeinen and Abrego
2020): a joint species distribution model whereby latent variables help
explain shared species responses to environmental variation (Warton et
al. 2015). We further applied the HMSC to model the species responses
from the underwater visual census (UVC) of fishes using SCUBA surveys.
We applied Hierarchical Modelling
of Species Communities (HMSC; Ovaskeinen et al. 2017, Ovaskeinen and
Abrego 2020): a joint species distribution model whereby latent
variables help explain shared species responses to environmental
variation (Warton et al. 2015). The MOTU data set comprises the
occurrence of 79 MOTUs in 19 samples. The UVC data set comprises the
occurrence of 58 species in 32 samples. We excluded species that
occurred in fewer than 5 sampling units and no more than n-2 sampling
units to avoid spurious and unidentifiable environmental responses for
species with few data (Ovaskeinen and Abrego 2020). For both the UVC and
the MOTU, we also fitted a random effect associated with each sample to
ensure latent variables (e.g., species’ associations) are fitted in HMSC
(Ovaskeinen and Abrego 2020). To strictly compare with the eDNA data, we
both fitted a UVC model with the same number of samples as eDNA and a
second model with all 32 samples. In all models, we used the sampling
unit by species matrix as the response variable (i.e., the n x
ns ‘Y’ of HMSC; see Ovaskainen et al. 2017) propagated
with species occurrence or absences (0 or 1). We used a probit
regression in all analyses. We included a single fixed effect of the
anthropic area as our species by covariate matrix (i.e., the n x
nc ‘X’ of HMSC; see Ovaskainen et al. 2017). We
estimated a species-specific regression parameter to contrast their
occupancy in the two areas. For the MOTU data, we further fitted a
transect-level random effect to control for unexplained variation
amongst sampling units (e.g., 2x30L water filtrations per transect). We
used the R-package ‘Hmsc’ (Tikhonov et al. 2020) to fit our model
assuming default prior distributions (Ovaskeinen and Abrego 2020). We
sampled the prior distribution with four Markov Chain Monte Carlo (MCMC)
chains each run for 37,500 interactions of which the first 12,500 were
removed as burn-in. The chains were thinned by 100 to obtain 1000
posterior samples in total. We ensured model convergence by evaluating
the potential scale reduction factors (e.g., Gelman and Rubin 1992). We
evaluated the explanatory power of our models for each species by
comparing the observed and predicted occurrences using area under
receiver-operator curve (AUC; Pearce and Ferrier 2000) and Tjur’s
R2 (Tjur 2009) statistics. Due to the limited number
of replicates in our study, we did not expect good predictive
(out-of-sample) power and therefore only report model explanatory power
(within-sample prediction).
We evaluated the proportion of MOTUs and species that exhibit positive
or negative responses to anthropic areas with 95% credible intervals of
coefficients non-overlapping 0, assessed the continuity of these
responses across eDNA metabarcoding MOTU and UVC data sets, and we
computed the phylogenetic signal of the estimated coefficients for both
eDNA and UVC. We used 100 randomised phylogenetic trees previously
extracted from the phylogeny of Rabosky et al. (2018) that was pruned by
both taxa lists. As the taxa list extracted from the SWARM analysis is
not always at the species level, we selected one species representing
the genus/family detected in the eDNA table into phylogenetic trees.
Then, we calculated the mean Pagel’s lambda (λ) statistic and the mean
associated p value.