(a) Data collection
We conducted a literature search in Web of Science to build a
comprehensive database of published genetic diversity observations in
marine fishes. We used two keyword search strings: "fish*
microsatellite* (marine OR ocean OR sea)" and "fish* mtDNA* (marine
OR ocean OR sea)". Only studies published prior to 5 January 2020 were
included in the dataset. This was a Class II study in the sense of Leigh
et al. (2021) and had the benefits of more easily compiling nuclear
diversity data, accounting for allele frequencies in genetic diversity
estimates, accounting for methodological covariates that may explain
substantial diversity variation, applying more precise data quality
filters, and using expert-defined populations that do not
inappropriately split or lump different geographic locations. Class II
studies also often compile data across fewer species, in contrast to
Class III studies that use existing online databases like NCBI or BOLD
to download, grid, and analyze unique DNA sequences. During the
literature search, we excluded anadromous, catadromous, and estuarine
species from the database, as well as data from populations that were
captive, farmed, or stocked. We also excluded data from studies that
either did not report latitude and longitude coordinates for the
sampling location or identified the location only vaguely (to a
precision coarser than 3°). Further exclusion criteria are detailed in
the Supplemental Materials.
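The exclusion rules above can be illustrated with a minimal Python sketch. It is not the code used in the study: the `Record` fields, the category labels, and the `keep_record` helper are hypothetical names introduced here, and the 3° rule is interpreted as a maximum positional uncertainty of 3°, which is an assumption about how the threshold was applied.

```python
from dataclasses import dataclass
from typing import Optional

# Categories excluded in the text; the string labels are illustrative.
EXCLUDED_HABITATS = {"anadromous", "catadromous", "estuarine"}
EXCLUDED_ORIGINS = {"captive", "farmed", "stocked"}
# Assumed reading of the 3-degree rule: drop records whose location is
# uncertain by more than 3 degrees.
MAX_COORD_UNCERTAINTY_DEG = 3.0


@dataclass
class Record:
    """One population-level observation (hypothetical schema)."""
    species: str
    habitat: str                  # e.g. "marine", "anadromous"
    origin: str                   # e.g. "wild", "farmed"
    lat: Optional[float]          # None if the study gave no coordinates
    lon: Optional[float]
    coord_uncertainty_deg: float  # how vaguely the location was reported


def keep_record(rec: Record) -> bool:
    """Return True if the record passes every exclusion rule."""
    if rec.habitat in EXCLUDED_HABITATS:
        return False
    if rec.origin in EXCLUDED_ORIGINS:
        return False
    if rec.lat is None or rec.lon is None:
        return False
    return rec.coord_uncertainty_deg <= MAX_COORD_UNCERTAINTY_DEG


# Placeholder records, not real data.
records = [
    Record("sp_A", "marine", "wild", 10.0, 120.0, 0.5),
    Record("sp_B", "anadromous", "wild", 45.0, -60.0, 0.5),
    Record("sp_C", "marine", "farmed", 30.0, 140.0, 0.5),
    Record("sp_D", "marine", "wild", None, None, 0.0),
    Record("sp_E", "marine", "wild", -20.0, 35.0, 5.0),
]
kept = [r for r in records if keep_record(r)]
```

In this sketch only `sp_A` is retained; each of the other placeholder records fails exactly one rule.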
We recorded expected heterozygosity (He) for
microsatellite studies, and nucleotide diversity (π) or haplotype
diversity (Hd) for mtDNA studies, as reported. The
standard errors of He, Hd, or π were also recorded (or calculated from the standard deviations),
when provided. All measures of genetic diversity were recorded at the
population level. For mtDNA, marker length (in base pairs) was recorded.
For microsatellite studies, we recorded whether or not the primers were
originally developed in a different species, because cross-species
amplification can negatively influence diversity estimates (Barbará et
al. 2007). When possible, we recorded He on a
per-marker basis, though some studies reported only average
heterozygosity across markers. For these studies, we listed each locus
separately and extrapolated per-marker diversity by adding a normally
distributed error to the average diversity estimate (Pinsky & Palumbi
2014). This error distribution had a standard deviation equal to that
reported within the study. If a within-study standard deviation was not
available, we used the average standard deviation (0.24) reported across
all studies.
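The per-marker extrapolation described above can be sketched as follows. This is a minimal illustration rather than the original analysis code: the function name `extrapolate_per_locus` and its parameters are assumptions, and only the normally distributed error and the fallback standard deviation of 0.24 come from the text.

```python
import random

# Average within-study standard deviation reported across all studies.
DEFAULT_SD = 0.24


def extrapolate_per_locus(mean_he, n_loci, sd=None, seed=None):
    """Simulate one He value per locus from a study-level average.

    Each locus receives the study mean plus a normally distributed error.
    The error's standard deviation is the within-study value when one was
    reported, and DEFAULT_SD otherwise.
    """
    rng = random.Random(seed)
    spread = DEFAULT_SD if sd is None else sd
    return [mean_he + rng.gauss(0.0, spread) for _ in range(n_loci)]


# A study reporting mean He = 0.70 across 8 loci with SD = 0.10.
with_sd = extrapolate_per_locus(mean_he=0.70, n_loci=8, sd=0.10, seed=1)
# The same study without a reported SD falls back to 0.24.
without_sd = extrapolate_per_locus(mean_he=0.70, n_loci=8, seed=1)
```

Because the error is unbounded, simulated values can fall outside the interval from 0 to 1, particularly with the fallback standard deviation. The text does not say how such values were handled, so the sketch leaves them unchanged.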
In addition to following global patterns, genetic diversity often
declines towards a species’ range margin, as populations at the edge
tend to be smaller in size relative to those at the range center (Clark
et al. 2021; Eckert et al. 2008). To help account for these cross-range
effects, which may be distinct from latitudinal effects, we used the R
package rfishbase v.3.1.6 (Boettiger et al. 2011) to download species
range data from AquaMaps (Kaschner et al. 2019). We then calculated the
latitudinal range position of each sampled population in our database.
This value ranged from 0 to 1, with 0 indicating the population was
located at the center of its species range and 1 indicating the
population was located at the very northern or southern edge of its
species range. Finally, we also recorded the order, family, and genus
for each species.
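One way to compute a range position with the properties described above is sketched below. The formula is an assumption, since the text gives only the endpoints of the scale: the range center is taken as the midpoint between the southern and northern range limits, and the absolute distance from that center is scaled by half the latitudinal extent. The function name and arguments are hypothetical.

```python
def range_position(lat, range_south, range_north):
    """Scaled latitudinal distance of a population from its range center.

    Returns 0.0 at the range center and 1.0 at the northern or southern
    range edge. Latitudes are in decimal degrees, negative south of the
    equator.
    """
    half_width = (range_north - range_south) / 2.0
    if half_width <= 0:
        raise ValueError("range_north must be greater than range_south")
    center = (range_south + range_north) / 2.0
    position = abs(lat - center) / half_width
    # Assumption: a sample just outside the mapped range is treated as
    # lying on the edge, so the value never exceeds 1.
    return min(position, 1.0)


# A species ranging from 10 degrees S to 30 degrees N has its center at 10 N.
center_pos = range_position(10.0, -10.0, 30.0)   # at the range center
edge_pos = range_position(30.0, -10.0, 30.0)     # at the northern edge
mid_pos = range_position(0.0, -10.0, 30.0)       # halfway to the southern edge
```

Taking the absolute distance treats the northern and southern margins symmetrically, which matches a scale where both edges score 1.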