SWEEP analysis
To characterize genome-wide patterns of genetic variation and population
differentiation, the nucleotide diversity (π) and
population-differentiation statistic (F ST) were
calculated using a sliding window approach (a 40-kb window sliding in
20-kb steps). Population pairwise F ST based on
the mitochondrial cytb and nuclear snx33 gene were
calculated using Arlequin software version 3.5.2.2 (Excofer et al.
2005). π was used to estimate the genomic diversity of populations andF ST was used to evaluate the strength of genomic
differentiation between two populations.
The selection signatures of population-specific genomic regions were
measured between S. schlegeli populations (YS-ECS and SCS). We
applied a sliding window approach (a 40kb window sliding with a step
size of 20kb) to identify selected regions. Genomic regions with a
significantly high F ST statistic (corresponding
to the top 5% level) and θπ ratio (the θπ ratio was calculated as
log2(π1) - log2(π2), and the top 5%
level of θπ ratio was selected) values were identified as selective
sweep regions. Genes were submitted to the Gene Ontology databases for
annotation (Ashburner et al. 2000; Ogata et al.2000).
Gene ontology analyses were performed to annotate the genes to
functional ontology terms. Using these candidate genes, we performed a
targeted population genetic analysis. We used a higher resolution
sliding window analysis in a 10-kb window (compared with a 100-kb window
in selection analysis) to calculate F ST and the
θπ between populations.
Comparative morphometric analysis of
populations
We examined the cranial morphology of 163 individuals (included the 69
individuals involved in the RADseq) of S. schlegeli using
landmark-based geometric morphometry analysis. Digital images of the
right lateral side of each specimen were taken using a Canon camera (EOS
Rebel T4i; 18 mega pixels). All the morphometric analyses were performed
using the comprehensive programs of Thin-plate spline (tps) (available
at http://life.bio.sunysb.edu/morph/). Digital images were sorted
according to geographic population and converted to tps files using
tpsUtil. On the head images, we digitized 26 landmarks using tpsDig2,
and the tps files with landmarks were then processed in tpsRelw to
enable relative warp analysis and to obtained X and Y coordinates for
further analysis. ANOVAs were performed on the RW scores of all S.
schlegeli populations.
Ecological niche analyses
WeWe
collected 149 presence records of S. schlegeli from the Global
Biodiversity Information Facility (GBIF, 2019), field investigations,
and published literature
(Table
S1). To avoid clustered records, we used only one presence record per 5
arcminute grid cell, approx. 10 by 10 km, to match the spatial
resolution of marine environmental predictors. As a result, we retained
131 records for subsequent analyses (Figure S2). Presence records ofS. schlegeli along the coastal waters of mainland China were
assigned to three distinct clades (YS, ECS, and SCS) based on results of
genetic clustering and previous phylogeographic evidence from related
taxa in the same region (e.g. Dong et al. 2012; Liu et al. 2007; Ni et
al. 2012; Xu et al. 2009).
Marine environmental predictors at a spatial resolution of 5 arcmin were
retrieved from the MARSPEC database (http://marspec.weebly.com) (Sbrocco
& Barber 2013). We initially considered
bathymetry (i.e. depth of the
seafloor), and 10 marine environmental predictors, including annual
mean, range, variance, and extreme values of sea surface temperature
(SST) and sea surface salinity (SSS) (Figure S3). Bathymetry was
considered since seaweed pipefish prefers shallow waters (Figure S1)
(Chen et al. 2017; Zhang et al. 2017). We calculated the
pairwise Pearson’s correlation coefficients (r ) among predictors
and selected only one among highly correlated predictors (i.e.
|r | > 0.7) (Dormann et al. 2013).
Accordingly, we retained five predictors for model development:
bathymetry, mean annual SSS (bio8), annual range in SSS (bio11), mean
annual SST (bio13), and annual range in
SST (bio16) (Figure S3).
We used n -dimensional hypervolumes (Blonder et al. 2014)
to explore niche differentiation among the three clades. We delineated
hypervolumes using the five predictors selected for SDM and a gaussian
kernel density estimator (Blonder et al. 2018). We studied niche
overlap using both a similarity and a distance metric (Mammola 2019),
namely the hypervolumes’ distance between centroids and pairwise overall
differentiation (β diversity) (Carvalho & Cardoso 2018; Mammola &
Cardoso 2020). The latter approach decomposes overall niche
differentiation (βtotal) in two process: niche shifts
(βreplacement) and niche expansion/contraction
(βrichness) (Carvalho & Cardoso 2018).
βtotal values range from 0 (when niches are identical)
and 1 (fully dissimilar niches), holding true the equality
βtotal = βreplacement +
βrichness.
Furthermore, we explored habitat suitability of S. schlegeliduring present-day and the Last Glacial Maximum (LGM, about 21 kya)
using species distribution model (SDM).
SDMs represent a widely used tool
to quantify a species’ habitat suitability by exploring the
relationships between distribution data and environmental predictors
(Araujo et al., 2019; Thuiller et al.2019).
The MARSPEC database contains paleo marine environmental layers (Sbrocco
2014). For simplicity, Sbrocco (2014) assumed that sea level at the LGM
was 120 meters lower than today. Marine environmental predictors at the
LGM can be derived from general circulation models (GCMs). We considered
paleoclimate simulations from three GCMs (i.e. CCSM3, ECBilt-Clio, and
HadCM3M2) to reduce uncertainties derived from choices of GCMs (Thuilleret al. 2019).
We developed SDM using all distribution data (i.e. 131 records) and the
five non–collinear predictors. For the purpose of reducing
uncertainties associated with choices of SDM algorithms, we used an
ensemble modeling approach (Thuiller et al. 2019). We fitted and
tested the performance of ten algorithms available in biomod2 R
package (Figure S4), using default settings (Thuiller et al.2020). In addition to species presence data, we randomly generated
10,000 background points within the study area as pseudo-absences. We
evaluated the predictive abilities of the ten algorithms by a five-fold
cross-validation approach. True skill statistic (TSS) and the area under
the receiver operating characteristic curve (AUC) were used to measure
SDM predictive performance; algorithms with TSS over 0.75 and AUC over
0.9 were considered to exhibit excellent predictive capacity (Landis &
Koch 1977; Omri Allouche et al. 2006; Swets
1988).
All distribution data and selected algorithms were used to predict
habitat suitability of S. schlegeli under present-day and LGM
climate conditions. Predictions of single algorithms were ensembled by a
TSS weighted average strategy (Thuiller et al. 2020; Thuilleret al. 2019). The continuous habitat suitability results (ranging
continuously from 0 to 1) were transformed into binary
suitable/unsuitable maps by using the 10th percentile presence
probability threshold.