(b) Model structure
To compare our hypotheses, we fit generalized linear mixed-effects models
(GLMMs). This mixed-effects framework allowed us to account for other
factors that could influence genetic diversity but that were not the
focus of our study. For models with log-transformed π as the response
variable, we ran linear GLMMs with a Gaussian error term using the lme4
package v.1.1.26 (Bates et al. 2014). For models with He or Hd as the response
variable, we ran beta GLMMs using the glmmTMB package v.1.1.7 (Brooks et
al. 2017). All beta models were run specifying the ordbeta family, which
uses a logit link function and enables the incorporation of 0 and 1
values into the model (Kubinec 2022). For the mtDNA models of Hd, the length of the marker in base pairs was
included as an explanatory variable. For the microsatellite models, we
included whether or not the microsatellite primer was cross-species
amplified. Marker length and cross-species amplification, as well as
range position, were all scaled and centered to have a mean of 0 and a
standard deviation (SD) of 1. We incorporated the source (the study the
data came from) as a random intercept for all models to help account for
other study-specific methodological choices, while marker name (the
specific mtDNA marker used) was added as a random intercept for the
mtDNA models to help account for marker-specific mutation rates and
selective constraints. Marker name was included as a random intercept
because we recorded mtDNA genetic diversity from across the mitogenome
and did not limit our dataset to COI or cyt-b markers. Finally, a nested
genus/family random intercept was added to all models to account for
phylogenetic relationships.
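As an illustration of how the logit link and the random intercepts combine, the following minimal sketch maps a linear predictor to a mean on the 0 to 1 scale. It is not the fitted model: the function names and every coefficient and offset are hypothetical, and it covers only the logit mean component. The ordbeta family additionally estimates cutpoints to handle exact 0 and 1 values, which are not shown here.

```python
import math


def inv_logit(eta):
    """Map a linear predictor on the logit scale to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_mean(intercept, slope, x, random_intercepts):
    """Expected diversity (He or Hd scale) for one observation.

    random_intercepts holds the group-level offsets that apply to this
    observation (e.g. source, marker name, family, genus within family).
    All values passed in below are hypothetical, not study estimates.
    """
    eta = intercept + slope * x + sum(random_intercepts)
    return inv_logit(eta)


# Same (hypothetical) predictor value, two different source studies:
# only the first offset, standing in for the source intercept, differs.
study_a = predicted_mean(0.5, -0.2, 1.0, [0.3, 0.0, 0.1])
study_b = predicted_mean(0.5, -0.2, 1.0, [-0.3, 0.0, 0.1])
print(round(study_a, 3), round(study_b, 3))
```

Because the offsets are added on the logit scale, a study-level shift changes the predicted mean without ever pushing it outside the 0 to 1 interval.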
For each estimate of diversity (π, Hd, or He), we fit a series of six models to identify
geographic patterns: (1) a baseline model with just the terms and random
effects specified above, (2) a latitude model, (3) an absolute latitude
model, (4) a longitude model, (5) a latitude and longitude model, and
(6) an absolute latitude and longitude model. The latitude and longitude
models contained the predictor variable of interest (e.g. latitude or
longitude) in addition to the baseline model structure. Latitude,
absolute latitude, and longitude were all scaled and centered (mean 0,
SD 1). Latitude was included as a quadratic term to allow a peak in the
tropics, while longitude was incorporated as a smoothing spline using
the R package splines v.4.2.2 (R Core Team 2023) to account for its
uniquely circular nature.
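The scaling and the quadratic latitude term described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: the helper names, latitudes, and coefficients are hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption. The spline for longitude is not reproduced here.

```python
import statistics


def standardize(values):
    """Center to mean 0 and scale to SD 1 (sample SD, n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def quadratic_terms(values):
    """Return (linear, squared) predictor pairs for a quadratic effect."""
    return [(v, v * v) for v in values]


def quadratic_peak(b1, b2):
    """Location of the maximum of b1*x + b2*x^2; requires b2 < 0."""
    if b2 >= 0:
        raise ValueError("a peak requires a negative quadratic coefficient")
    return -b1 / (2.0 * b2)


# Hypothetical sampling latitudes in decimal degrees.
latitudes = [-40.0, -20.0, 0.0, 10.0, 30.0, 50.0]
lat_z = standardize(latitudes)
design = quadratic_terms(lat_z)

# Hypothetical coefficients on the standardized scale.
peak_z = quadratic_peak(b1=0.1, b2=-0.5)
# Convert the peak location back to degrees of latitude.
peak_deg = peak_z * statistics.stdev(latitudes) + statistics.mean(latitudes)
print(peak_z, peak_deg)
```

A negative coefficient on the squared term is what produces a peak at intermediate latitudes. A positive coefficient would instead describe a trough, which is why the helper rejects it.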
We used the same model structure to compare macroecological drivers of
genetic diversity. As with the latitude and longitude models, we fit a
series of models that incorporated either mean sea surface temperature
(SST) (°C), mean chlorophyll-a concentration (mg/m³),
or both. SST was scaled and centered (mean 0, SD 1) and chlorophyll-a
was log-transformed and included as a quadratic term. All environmental
data were monthly climatologies (9.2 km² resolution)
and were extracted from Bio-ORACLE (Assis et al. 2017; Tyberghein et al.
2012) using the R package sdmpredictors v.0.2.10 (Bosch & Fernandez
2021).
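One plausible way to write the linear predictor of the combined SST and chlorophyll-a model is given below. The notation is ours and is introduced only for illustration: the vector x_i collects the baseline fixed terms, γ their coefficients, and the u terms are the random intercepts, each assumed to be normally distributed with mean 0. For the mtDNA models, a marker-name intercept would be added in the same way.

```latex
\eta_i = \beta_0
       + \beta_1\,\mathrm{SST}_i
       + \beta_2 \log(\mathrm{Chl}_i)
       + \beta_3 \left[\log(\mathrm{Chl}_i)\right]^2
       + \mathbf{x}_i^{\top}\boldsymbol{\gamma}
       + u^{\mathrm{source}}_{s[i]}
       + u^{\mathrm{family}}_{f[i]}
       + u^{\mathrm{genus}}_{g[i]},
\qquad u^{(\cdot)} \sim \mathcal{N}\!\left(0, \sigma^2_{(\cdot)}\right)
```

Here SST_i is the scaled and centered mean sea surface temperature, Chl_i is the mean chlorophyll-a concentration, and genus is nested within family. Dropping the β1 term or the β2 and β3 terms gives the chlorophyll-only or SST-only model, respectively.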