Materials and methods
Datasets and intraspecific genetic
clustering
We sampled the microsatellite datasets of two recently published
regional studies, i.e., 17 populations in the East Indo-Pacific
(Hernawan et al. 2017) and 11 populations in the Western Indian
Ocean (Jahnke et al. 2019a). We used twelve microsatellites
(i.e., Thh3, Thh15, Thh34, Thh41, TH07, TH34, TH37, TH43, TH52, TH66,
TH73) for population structuring and lineage sorting of 1021 individuals
from 28 populations across the tropical Indo-Pacific (Fig. 1a). We then
estimated pairwise genetic differences among populations using the
Cavalli-Sforza and Edwards chord distance and represented them in a
network using the R package IGRAPH (Csardi & Nepusz 2006) with the
addition of a custom script by Johansson et al. (2015). To
visually inspect the relationships within and between the main genetic
clusters inferred by STRUCTURE (Pritchard et al. 2000), we pruned
the full network by sequentially removing edges (i.e., network pairwise
links among sampling sites) of decreasing genetic distance until the
point at which the main groups of tightly connected nodes still remained
connected (in order to avoid the split of any large network cluster from
the main network). We estimated the classification of sampling sites
within network communities at each step of the pruning process with the
“fastgreedy” community detection algorithm implemented in IGRAPH
(Clauset et al. 2004, Blondel et al. 2008). Network
analysis (Fig. 1b), Bayesian-based STRUCTURE (Fig. 1c), and molecular
variation (AMOVA) (Table S1 in Supporting Information) revealed strong
overall genetic differentiation among two distinct lineages occupying
the Tropical Indo-Pacific. Based on the landscape genetic analysis of
Cushman et al. (2014) and the definitions of global marine
ecoregions (Spalding et al. 2007), we classified these two
lineages as distinct genotypes encompassed within two biogeographic
regions: the Western Tropical Indo-Pacific (WTIP) and the Central
Tropical Indo-Pacific (CTIP). We then used the two lineages in
subsequent ecological niche modelling.
Distribution data and marine
predictors
We collected a total of 62,465 presence records of T. hemprichiifrom a recently assembled and cleaned dataset of global marine forests
(Assis et al. 2020) and published literature (see Data
availability). In SDM studies, it is critical to correct for sampling
bias and remove clustered records, which may over-represent
environmental conditions in better-surveyed regions (Kramer-Schadtet al. 2013). Therefore, presence records were filtered by: i)
removing duplicated records at the resolution of our environmental
predictors (i.e., keeping only one record per 5 arcmin grid cell); ii)
removing records on land or with distance to land > 370 km
(following other SDM studies for coastal species; e.g., Zhang et
al. 2020a), and iii) performing spatial thinning using a distance of 20
km using the R package spThin (Aiello-Lammens et al.2015). This distance is a reasonable approximation of the dispersal
potential for this plant traveling via floating propagules (Lacapet al. 2002), and it can also reduce potential effects of
sampling bias while retaining sufficient numbers of presence records for
our analyses. As significant clustering was present in the data
(particularly around Australia), these procedures removed a good
proportion of the presence data. Ultimately, we kept 519 records for the
species-level model (hereafter “species model”), 479 records for the
CTIP lineage-level model (hereafter “CTIP model”), and 26 records for
the WTIP lineage-level model (hereafter “WTIP model”) (Fig. 1a).
It is important to properly select the extent of the study area used to
sample background records when constructing presence-background SDMs for
target species (Barve et al. 2011; Vale et al. 2014).
Given previous marine SDMs (e.g., Zhang et al. 2020a) and the
geographical range of T. hemprichii , we restricted our study to
the areas within 370 km of land between 25°E and 180°E, and between 50°S
and 40°N (Fig. 1a). Please note that our study extent includes southern
Australia and New Zealand, where this species does not naturally occur.
It is always challenging to estimate an appropriate study extent for a
species (Barve et al. 2011), but the extent we selected should
represent the plausible accessible areas to T. hemprichii over
evolutionary time. We subsetted this main study extent to create
separate study extents for the WTIP and CTIP lineages (Fig. 1a) based on
our molecular results (see details in the Lineage genetic diversity in
the Results section).
A number of marine predictors have been demonstrated to influence the
geographical distribution of marine species (Bosch et al. 2018).
Based on previous studies (e.g., Jayathilake & Costello 2018; Zhanget al. 2020a), we initially considered twenty such predictors for
modeling, including two geographical predictors (water depth and
distance to land) from the Global Marine Environment Datasets
(http://gmed.auckland.ac.nz; Basher et al. 2018) and eighteen
environmental predictors (including annual mean, maximum, minimum,
range, average of the minimum records per year, and average of the
maximum records per year) for SST, sea surface salinity, and sea surface
current velocity from the Bio-ORACLE database v2.1
(https://www.bio-oracle.org; Assis et al. 2018b). In SDM studies,
highly collinear predictors can lead to spurious interpretations of
variable importance and unexpected predictions if correlations change in
different projection scenarios (Dormann et al. 2013). Hence, we
checked collinearity by calculating the pairwise Pearson’s correlation
coefficients (r ) among the twenty predictors (Supporting
Information Fig. S1) and selected one among highly correlated predictors
(|r | > 0.7) (Dormann et al.2013) based on present-day and future data availability, biological
importance, and previous findings on important variables for estimating
seagrass distribution (Jayathilake & Costello 2018). In the end, we
retained the two geographical predictors and six environmental
predictors: annual mean current velocity, minimum current velocity,
annual mean sea surface salinity, annual range of sea surface salinity,
annual mean SST, and annual range of SST.
To project future habitat suitability of T. hemprichii , we
considered four representative concentration pathway (RCP) scenarios
(i.e., RCP 2.6, RCP 4.5, RCP 6.0, and RCP 8.5), and two time periods
(i.e., 2050s: the average for 2040–2050s; and 2100s: the average for
2090–2100). We obtained the corresponding projections of future marine
environmental layers from the Bio-ORACLE database v2.1. We assumed that
the two geographical predictors would remain unchanged for future
projections (Zhang et al. 2020a).
Niche differentiation
estimation
To estimate whether the two lineages
of T. hemprichii occupy different niche spaces, we characterized
their realized niches using Hutchinsonian n -dimensional
hypervolumes (Hutchinson 1957) sensu Blonder et al.(2018). We quantified the realized niches of the WTIP and CTIP lineages
using the eight selected marine predictor variables (see previous
section). In short, we extracted and standardized (i.e., zero means and
unit variance) marine predictor values associated with the presence
records for the two lineages. We then determined the volumes and shapes
of the realized niches with the R package hypervolume using the
Gaussian method (Blonder 2019). We measured the extent of niche
differentiation between the two lineages with the kernel.betafunction (Mammola & Cardoso 2020) in the R package BAT (Cardosoet al. 2015, 2020). Following Carvalho & Cardoso (2020), niche
differentiation between hypervolumes was partitioned into the following
two processes: niche shift (replacement of space between hypervolumes)
and niche contraction/expansion (net difference between hypervolumes).
The niche differentiation index ranges from 0 (niches overlap entirely)
to 1 (niches are fully dissimilar) (Carvalho & Cardoso 2020; Mammola &
Cardoso 2020).
Species distribution
modelling
We built SDMs using Maxent 3.4.4, a
presence-background machine learning algorithm with two main complexity
tuning parameters: regularization multiplier, which penalizes complexity
by removing predictors with low predictive ability, and feature class,
which allows for increasing complexity of the model response (Phillipset al. 2017). For each model (species model, WTIP model, and CTIP
model), we randomly generated 10,000 background points within the
corresponding study region. As Maxent’s default settings for the main
tuning parameters can result in overfit models (Radosavljevic &
Anderson 2014), we used a version of the R package ENMeval under
expansion (1.9.0; https://github.com/jamiemkass/ENMeval) to tune our
Maxent models over ranges of each parameter and chose models with
optimal complexity based on performance metrics calculated on withheld
data (Muscarella et al. 2014). In brief, we considered a total of
32 candidate models with different combinations of regularization
multipliers (RM; ranging from 0.5 to 4.0, at 0.5 interval), which
penalize complexity more with higher values, and feature classes
(linear, quadratic, hinge), which allow responses with differing
flexibility. Rather than using conventional random cross-validation to
judge model performance, we used a spatial block cross-validation
approach, which typically results in evaluations that better reflect the
model’s ability to transfer to non-analog conditions (Roberts et
al. 2017; Valavi et al. 2019). Briefly, each study region was
divided into four spatial blocks containing an equal number of presence
records, three blocks were used for model training and the remaining
block for validation, then this procedure was repeated until every block
was used for model validation. As with previous studies (e.g.,
Radosavljevic & Anderson 2014; Kass et al. 2020), the optimal
model was selected by sequentially considering a 10% omission rate
(i.e., the percentage of validation presences with habitat suitability
predictions lower than that of the 10th quantile of training
predictions), followed by the area under the receiver operating
characteristics curve (AUC) calculated on the validation data (i.e., the
model’s ability to discriminate between presence and background records)
to break ties. We acknowledge that AUC is a poor measure for the
absolute performance of presence-background models (e.g.,
Jiménez‐Valverde 2012), but nonetheless this metric can be used to make
relative comparisons of candidate models fitted with the same data (Loboet al. 2008).
Predictive performances of the three best-performing Maxent models were
further assessed using the continuous Boyce index, a reliable evaluation
measure of presence-only algorithms (Hirzel et al. 2006). The
continuous Boyce index ranges from –1 to 1, where positive values
suggest that model predictions match well with the presence data, and
negative values suggest a poor match (Hirzel et al. 2006).
Variable importance for each model was determined using permutation
importance calculated by Maxent. For this method, presence and
background data values for each predictor variable in turn were randomly
permuted and training AUC recalculated—a large drop in AUC indicates
higher importance (Phillips 2017). In addition, we estimated the
marginal response curves of important predictors (i.e., curves
representing habitat suitability along a range of the values of one
predictor variable while keeping the other predictors constant). We
converted continuous habitat suitability predictions for T.
hemprichii to binary values using the same 10% omission thresholds
that we used for model evaluation (Radosavljevic & Anderson 2014). We
then transformed the binary habitat suitability projections to the
Lambert Cylindrical Equal Area projection at a resolution of 10 km and
calculated areas of potential distribution (Zhang et al. 2020a).
It is of great importance to consider species dispersal ability into
SDMs when estimating climate change impacts (Araújo et al. 2006;
Guisan et al. 2017). Given the relatively high dispersal ability
of T. hemprichii (Lacap et al. 2002) and a lack of
apparent dispersal barriers in marine environments (Robinson et
al. 2011), we estimated range size change under an unlimited dispersal
scenario, which assumes that species have unrestricted dispersal ability
and can disperse to any suitable area (Araújo et al. 2006; Zhanget al. 2020c). Range size change was calculated as follows:
\(\mathrm{range\ size\ change\ =\ }\frac{future\ suitable\ area\ -\ present\ suitable\ area}{\text{present\ suitable\ area}}\mathrm{\times 100\%}\),
where negative and positive values represent range contraction and
expansion, respectively.
We used the optimal species- and lineage-level models to make
projections of future potential distribution based on the different RCP
scenarios for the two future time periods. Making projections using SDMs
into novel environmental space (i.e., outside the range of training
data) results in some degree of extrapolations, which should be
quantified to determine levels of uncertainty (Elith et al.2010). Therefore, we measured the similarity between present-day and
future environmental conditions using multivariate environmental
similarity surfaces (MESS) (Elith et al. 2010). In practice, we
calculated the MESS with the R package rmaxent (Baumgartner &
Wilson 2021) for each model using the top three most important
predictors via permutation importance: positive MESS values indicate
conditions more similar to the training data, while negative values
indicate conditions more different (i.e., novel).