Stickleback samples, DNA library preparation and sequencing
A precondition for our analysis of SGV in marine stickleback was the initial identification of genetic polymorphisms important to acidic adaptation. For this, we considered five acidic and five basic lakes on North Uist for which individual DNA was already available (Haenel et al. 2019) (Figure 1b, Table S1). We refer to the latter habitat type as ‘basic’ for terminological consistency with our previous work, but emphasize that the fish inhabiting these lakes represent the standard freshwater stickleback ecomorph widespread across the range of G. aculeatus. We chose 20 individuals from each of these freshwater populations at random and combined their DNA to equal molarity without PCR-enrichment into either an acidic or a basic pool of 100 individuals each. The goal of this pooling (and the subsequent pooled sequencing, hereafter poolSeq) was to obtain relatively precise allele frequency estimates for acidic versus basic stickleback in general, while ignoring allele frequencies within each specific population. To nevertheless have access to individual genotypes and haplotype information, we additionally chose two individuals from each acidic and basic population at random for individual sequencing (indSeq).
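As a rough illustration of the selection and pooling step described above, the following Python sketch draws a random subset of individuals and computes the volume each must contribute so that all are represented by the same DNA mass. Because all individuals share essentially the same genome size, equal mass is used here as a proxy for equal molarity; this is an assumption of the sketch. The sample names, concentrations, target mass and function names are hypothetical and are not taken from the study.

```python
import random

def choose_individuals(candidates, n, seed=None):
    """Draw n individuals at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(candidates), n)

def equal_mass_volumes(conc_ng_per_ul, ng_per_individual):
    """Volume (in microlitres) per individual so that each contributes the same DNA mass."""
    return {name: ng_per_individual / conc for name, conc in conc_ng_per_ul.items()}

# Hypothetical DNA concentrations (ng/uL) for 30 fish from one population
concentrations = {f"fish_{i:02d}": 20.0 + i for i in range(1, 31)}

# Pick 20 individuals at random, as done per freshwater population
chosen = choose_individuals(concentrations, n=20, seed=1)

# Hypothetical target of 100 ng per individual in the pool
volumes = equal_mass_volumes(
    {name: concentrations[name] for name in chosen},
    ng_per_individual=100.0,
)
```

Repeating this for the five populations of one habitat type and combining the resulting aliquots would give a pool of 100 equally represented individuals.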
To explore the extent to which adaptive genetic variation discovered in freshwater fish is present as SGV in marine stickleback, we focused on samples from six locations across the Atlantic Ocean: North Uist (NU), Ireland (IR), The Netherlands (NL), Germany (DE), Iceland (IS) and Eastern Canada (CA) (Figure 1b, Table S1; note that North Uist subsumes two nearby marine sample sites, ARDH and OBSM). From each of these marine locations, we aimed for a sample size of around 25 individuals. Except for North Uist, from which marine individual-level whole-genome sequence data were already available (Haenel et al. 2019), individual DNA was extracted using the Quick-DNA™ Miniprep Plus Kit (Zymo Research, Irvine, CA, USA). For the estimation of population allele frequencies via poolSeq, individual DNA was then combined to equal molarity without PCR-enrichment within each of the five new locations. In addition, four individuals from each of these locations were chosen at random for indSeq (Table S1).
The 47 total DNA libraries (7 pools and 40 individuals) were paired-end sequenced to 150 base pairs on an S4 flow cell of an Illumina NovaSeq 6000 instrument, producing a genome-wide median read depth per base pair of 85x on average across the pools, and of 16x across the individuals (details given in Table S1).
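The library count follows directly from the sampling design described above. The short Python tally below makes the arithmetic explicit; the variable names are illustrative only.

```python
# Freshwater: five acidic and five basic populations from North Uist
freshwater_populations = 5 + 5
freshwater_pools = 2                  # one acidic pool, one basic pool
freshwater_indseq_per_population = 2

# Marine: five newly sampled locations (North Uist marine data already existed)
new_marine_locations = 5
marine_pools = new_marine_locations   # one pool per new location
marine_indseq_per_location = 4

pools = freshwater_pools + marine_pools
individuals = (
    freshwater_populations * freshwater_indseq_per_population
    + new_marine_locations * marine_indseq_per_location
)
total_libraries = pools + individuals
```

This gives 7 pools and 40 individually sequenced fish (20 freshwater, 20 marine), matching the 47 libraries reported.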