(a) Data collection
We conducted a literature search in Web of Science to build a comprehensive database of published genetic diversity observations in marine fishes, using the keyword search terms fish* microsatellite* (marine OR ocean OR sea) and fish* mtDNA* (marine OR ocean OR sea). Only studies published before 5 January 2020 were included in the dataset. This was a “Class II” study in the sense of Leigh et al. (2021). Compared with “Class III” studies, which download, grid, and analyze unique mitochondrial sequences from existing online databases such as NCBI or BOLD, this approach made it easier to compile nuclear diversity data, accounted for allele frequencies in diversity estimates, and used expert-defined populations, but it covered fewer species. During the literature search, we excluded anadromous, catadromous, and estuarine species from the database, as well as data from captive, farmed, or stocked populations. We also excluded data from studies that either did not report the latitude and longitude of the sampling sites or identified the sampling location only vaguely (with a precision coarser than 3°). Further exclusion criteria are described in the Supplemental Methods.
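The inclusion rules above can be expressed as a simple record filter. This is a minimal sketch, assuming a hypothetical record layout: the `Record` fields and the `passes_screen` helper are illustrative, not the study's actual database schema. It also assumes that a location with exactly 3° of uncertainty is retained; the text does not say how boundary cases were handled, and the additional criteria in the Supplemental Methods are not represented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record layout for one population-level diversity observation.
@dataclass
class Record:
    species: str
    habitat: str                    # e.g. "marine", "anadromous", "catadromous", "estuarine"
    origin: str                     # e.g. "wild", "captive", "farmed", "stocked"
    published: date                 # publication date of the source study
    lat: Optional[float]            # None if the study did not report coordinates
    lon: Optional[float]
    location_precision_deg: float   # assumed: uncertainty of the reported location, in degrees

CUTOFF = date(2020, 1, 5)                      # only studies published before this date
EXCLUDED_HABITATS = {"anadromous", "catadromous", "estuarine"}
EXCLUDED_ORIGINS = {"captive", "farmed", "stocked"}
MAX_LOCATION_UNCERTAINTY_DEG = 3.0             # coarser locations are dropped

def passes_screen(rec: Record) -> bool:
    """Return True if a record meets the sketched inclusion criteria."""
    if rec.published >= CUTOFF:
        return False
    if rec.habitat in EXCLUDED_HABITATS or rec.origin in EXCLUDED_ORIGINS:
        return False
    if rec.lat is None or rec.lon is None:
        return False
    # Assumption: exactly 3 degrees of uncertainty is still acceptable.
    return rec.location_precision_deg <= MAX_LOCATION_UNCERTAINTY_DEG
```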
We recorded expected heterozygosity (He) for microsatellite studies, and nucleotide diversity (π) or haplotype diversity (Hd) for mtDNA studies, as reported. When provided, the standard errors of He, Hd, or π were also recorded (or calculated from the reported standard deviations). For mtDNA, marker length (in base pairs) was recorded. For microsatellite studies, we recorded whether the primers were originally developed in a different species, as cross-species amplification can negatively influence diversity estimates (Barbará et al. 2007). When possible, we recorded He on a per-marker basis, though some studies reported only the average heterozygosity across markers. For these studies, we listed each locus separately and extrapolated per-marker diversity by adding normally distributed error to the average diversity estimate (Pinsky & Palumbi 2014). This error distribution had a standard deviation equal to that reported within the study; if a within-study standard deviation was not available, we used the average standard deviation across all studies (0.24).
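The per-locus extrapolation can be sketched as follows. This is a minimal illustration, assuming Python's standard `random` module; the function name `extrapolate_per_locus_he` is hypothetical, and the sketch is not the code used by Pinsky & Palumbi (2014) or in this study.

```python
import random
from typing import List, Optional

AVERAGE_SD = 0.24  # average within-study SD across all studies (value from the text)

def extrapolate_per_locus_he(
    mean_he: float,
    n_loci: int,
    within_study_sd: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Expand a study-level mean He into n_loci per-locus values.

    Each value is mean_he plus normally distributed error with mean 0 and an
    SD equal to the within-study SD, or AVERAGE_SD when that SD is unavailable.
    """
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    sd = AVERAGE_SD if within_study_sd is None else within_study_sd
    rng = rng if rng is not None else random.Random()
    return [mean_he + rng.gauss(0.0, sd) for _ in range(n_loci)]

# Example: a study reporting mean He = 0.72 over 8 loci, with no SD given.
values = extrapolate_per_locus_he(0.72, 8, rng=random.Random(42))
```

As written, the sketch adds Gaussian error without truncating values to the [0, 1] interval that He can take. The text does not state whether such truncation was applied, so clipping is left as an explicit choice for the user.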
In addition to following global patterns, genetic diversity often declines towards a species’ range margin, because populations at the edge tend to be smaller than those at the range center (Clark et al. 2021; Eckert et al. 2008). To help account for these range-position effects, we used the R package rfishbase v.3.1.6 (Boettiger et al. 2011) to download species range data from AquaMaps (Kaschner et al. 2019). We then calculated the latitudinal range position of each sampled population in our database. This value ranged from 0 to 1, with 0 indicating that the population was located at the center of its species’ range and 1 indicating that it was located at the northern or southern edge of that range. Finally, we also recorded the family and genus of each species.
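One way to obtain a 0-to-1 latitudinal range position is to scale a population's distance from the range midpoint by half the latitudinal extent of the range. This is a sketch under stated assumptions: the text does not give the exact formula, so we assume the range is summarized by its minimum and maximum latitude (for example, derived from AquaMaps range data), and the function name `latitudinal_range_position` is hypothetical.

```python
def latitudinal_range_position(lat: float, range_min_lat: float, range_max_lat: float) -> float:
    """Return 0 at the latitudinal range center and 1 at the northern or southern edge.

    Assumption: the species range is summarized by its minimum and maximum
    latitude; populations falling outside that span are clipped to 1.
    """
    if range_max_lat <= range_min_lat:
        raise ValueError("range_max_lat must exceed range_min_lat")
    center = (range_min_lat + range_max_lat) / 2.0
    half_span = (range_max_lat - range_min_lat) / 2.0
    position = abs(lat - center) / half_span
    return min(position, 1.0)  # assumed clipping for samples just beyond the modelled range
```

For a species whose range spans 10° N to 30° N, a population at 25° N lies halfway between the center (20° N) and the northern edge, giving a position of 0.5. Using the absolute distance treats northern and southern margins symmetrically, matching the text's definition of 1 as either edge.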