(b) Model structure
To compare our hypotheses, we fit generalized linear mixed-effects models (GLMMs). The mixed-effects framework allowed us to account for other factors that could influence genetic diversity but that were not the focus of our study. For models with log-transformed π as the response variable, we ran linear mixed models with a Gaussian error term using the lme4 package v.1.1.26 (Bates et al. 2014). For models with He or Hd as the response variable, we ran beta GLMMs using the glmmTMB package v.1.1.7 (Brooks et al. 2017). All beta models were run specifying the ordbeta family, which uses a logit link function and enables the incorporation of 0 and 1 values into the model (Kubinec 2022). For the mtDNA models of Hd, the length of the marker in base pairs was included as an explanatory variable. For the microsatellite models, we included whether or not the microsatellite primer was cross-species amplified. Marker length and cross-species amplification, as well as range position, were all scaled and centered to have a mean of 0 and a standard deviation (SD) of 1. We incorporated the source (the study the data came from) as a random intercept in all models to help account for study-specific methodological choices. Marker name (the specific mtDNA marker used) was added as a random intercept in the mtDNA models to help account for marker-specific mutation rates and selective constraints; this was necessary because we recorded mtDNA genetic diversity from across the mitogenome and did not limit our dataset to COI or cyt-b markers. Finally, a nested genus/family random intercept was added to all models to account for phylogenetic relationships.
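As an illustrative sketch of the baseline structure described above, the Gaussian model for log-transformed π can be written as follows. The notation is introduced here for illustration only: r_i is the scaled range position of observation i, and s[i], m[i], f[i] and g[i] index its source, marker name, family, and genus within family. We assume independent, normally distributed random intercepts, as is standard for such models.

```latex
% Baseline Gaussian mixed model for log-transformed pi (illustrative notation)
\begin{align*}
\log(\pi_i) &= \beta_0 + \beta_1 r_i
  + a_{s[i]} + b_{m[i]} + c_{f[i]} + d_{g[i]} + \varepsilon_i, \\
a_s &\sim \mathcal{N}(0, \sigma_a^2), \quad
b_m \sim \mathcal{N}(0, \sigma_b^2), \quad
c_f \sim \mathcal{N}(0, \sigma_c^2), \quad
d_g \sim \mathcal{N}(0, \sigma_d^2), \quad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2).
\end{align*}
```

For the beta models of Hd and He, the same linear predictor would sit on the logit scale, i.e. logit(μ_i) replaces log(π_i) and the residual term ε_i is dropped. A fixed effect is added for marker length (mtDNA Hd) or cross-species amplification (microsatellite He), and the marker-name intercept b applies only to the mtDNA models.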
For each estimate of diversity (π, Hd, or He), we fit a series of six models to identify geographic patterns: (1) a baseline model with just the terms and random effects specified above, (2) a latitude model, (3) an absolute latitude model, (4) a longitude model, (5) a latitude and longitude model, and (6) an absolute latitude and longitude model. Models 2–6 contained the predictor variable(s) of interest (e.g. latitude, longitude) in addition to the baseline model structure. Latitude, absolute latitude, and longitude were all scaled and centered (mean 0, SD 1). Latitude was included as a quadratic term to allow a peak in the tropics, while longitude was incorporated as a smoothing spline using the R package splines v.4.2.2 (R Core Team 2023) to account for its uniquely circular nature.
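To make the geographic terms concrete, the linear predictor of the latitude and longitude model can be sketched as below. Here η_i^base stands for the baseline fixed effects and random intercepts described above, ℓ_i and λ_i are the scaled latitude and longitude, and B_k are spline basis functions. The basis type and the number of basis functions K are not stated in the text and are left generic here as an assumption.

```latex
% Latitude (quadratic) plus longitude (spline) added to the baseline predictor
\begin{equation*}
\eta_i = \eta_i^{\text{base}}
  + \beta_{\text{lat},1}\,\ell_i + \beta_{\text{lat},2}\,\ell_i^{2}
  + \sum_{k=1}^{K} \gamma_k\, B_k(\lambda_i)
\end{equation*}
```

A negative estimate of the quadratic coefficient β_lat,2 would correspond to a peak in diversity at intermediate latitudes, which is the pattern the quadratic term is meant to allow.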
We used the same model structure to compare macroecological drivers of genetic diversity. As with the latitude and longitude models, we fit a series of models that incorporated either mean sea surface temperature (SST) (°C), mean chlorophyll-a concentration (mg/m³), or both. SST was scaled and centered (mean 0, SD 1) and chlorophyll-a was log-transformed and included as a quadratic term. All environmental data were monthly climatologies (9.2 km² resolution) and were extracted from Bio-ORACLE (Assis et al. 2017; Tyberghein et al. 2012) using the R package sdmpredictors v.0.2.10 (Bosch & Fernandez 2021).
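By analogy, the linear predictor of the model containing both environmental drivers can be sketched as follows, where η_i^base again denotes the baseline terms and random intercepts, T_i is scaled mean SST, and C_i is mean chlorophyll-a concentration. The symbols are illustrative, and whether log chlorophyll-a was additionally scaled is not stated, so no scaling is assumed here.

```latex
% SST (linear) plus log chlorophyll-a (quadratic) added to the baseline predictor
\begin{equation*}
\eta_i = \eta_i^{\text{base}}
  + \beta_{\text{SST}}\, T_i
  + \beta_{\text{chl},1}\,\log(C_i) + \beta_{\text{chl},2}\,\bigl[\log(C_i)\bigr]^{2}
\end{equation*}
```

The SST-only and chlorophyll-only models correspond to dropping the chlorophyll-a terms or the SST term, respectively.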