2.7 Species distribution modeling
Occurrence data for each species was obtained from the specimens used in
this study, “expert” identified individual occurrences from GBIF, and
research grade locality records from iNaturalist (www.inaturalist.org).
This resulted in a total of 43 T. blandingii localities and 30T. pulverulenta localities (Fig. S1). Duplicate records were
removed, and points were thinned within a distance of 10-kilometers
using the spThin package (Aiello-Lammens, Boria, Radosavljevic, Vilela,
& Anderson, 2015) in R v. 3.4.4 (R Core Team, 2018). A subset of points
from each data set was set aside for model calibration (75%) and
internal testing (25%) following Cobos, Peterson, Barve, and
Osorio-Olvera (2019).
Environmental data were obtained from the WorldClim database v. 1.4
(Hijmans, Cameron, Parra, Jones, & Jarvis, 2005). Fifteen of the 19
bioclim variables were downloaded at a 2.5-minute resolution. We
excluded bio8, bio9, bio18, and bio19 which are known to create
artifacts in distribution models (Escobar, Lira-Noriega, Medina-Vogel,
& Peterson, 2014). The same 15 variables were used for the Last Glacial
Maximum (LGM) under three general circulation models (GCMs): CCSM4,
MIROC-ESM, and MPI-ESM-P. In order to reduce spatial autocorrelation,
principal component analyses (PCAs) were performed on present bioclim
variables and projected to the LGM for the extent of sub-Saharan Africa.
Model calibration areas were defined as a 1000-kilometer buffer around
occurrence points for each species. Model calibration, creation,
projection, and evaluation were done using the R package kuenm (Cobos et
al., 2019). In order to calibrate our models, we created 1479 candidate
models for each species by combining three sets of environmental
predictors (PCAs 1–6, 1–5, 1–4), 17 possible regularization
multipliers (0.1–1.0 at intervals of 0.1, 2–6 at intervals of 1, and 8
and 10), and all combinations of five feature classes (linear = l,
quadratic = q, product = p, threshold = t, and hinge = h; Cobos et al.,
2019).
Candidate models were run in Maxent (Phillips, Anderson, & Schapire,
2006) and chosen based on significant partial ROC scores (Peterson,
Papes, & Soberón, 2008), omission rates of E ≤ 5% (Anderson, Lew, &
Peterson, 2003), and AICc scores of ≤ 2 to minimize model complexity
(Warren & Seifert, 2011). These models determined the parameter set
used for final model creation.
Final models were created for each species using the full set of
occurrence records and the parameters chosen during model calibration.
Models were run in Maxent with ten bootstrap replicates and logistic
outputs. After models were run in the present, they were projected to
the LGM and mid-Holocene for the three GCMs. The mobility-oriented
parity (MOP) index was used to test for model extrapolation (Soberón &
Peterson, 2005). Models were visualized in QGIS 3.4 and thresholded to
5% to create presence-absence maps. Models from each time period were
summed to estimate potential LGM and mid-Holocene distributions as well
as continuous stability maps (Devitt, Devitt, Hollingsworth, McGuire, &
Moritz, 2013; Yannic et al., 2014).