Identifying alleles important to acidic adaptation, and quantifying their frequencies in marine stickleback
To identify alleles important to the adaptation of stickleback to acidic habitats, we performed genome-wide differentiation mapping between the acidic and basic sample pools. That is, we scanned the poolSeq SNPs for positions exhibiting extremely high global differentiation between stickleback from acidic versus basic lakes. We did not define genetic variation important for acidic adaptation simply as SNPs highly differentiated between acidic and marine fish, because this would mostly have uncovered genetic variation important to marine-freshwater divergence in general. Such variation is abundant in North Uist stickleback (Figure S3 in Haenel et al. 2019; see also Jones et al. 2012b; Roesti et al. 2014; Bassham et al. 2018; Fang et al. 2020; Terekhanova et al. 2019). Our focus, however, was specifically on genetic variation for which gene flow into marine fish must be rare and geographically restricted. Acidic-basic differentiation was expressed as the absolute allele frequency difference (AFD). Positions qualified as high-differentiation SNPs if they showed an AFD of at least 0.85, were autosomal, and were physically separated by at least 100 kb to ensure independence (tight linkage disequilibrium typically decays over much shorter distances in stickleback, e.g., Roesti et al. 2015). With these criteria, we obtained a panel of 50 ‘adaptive SNPs’, that is, positions at which one allele appears strongly and consistently selectively favored in acidic habitats. As a basis for comparison, we analogously selected a panel of 500 ‘baseline SNPs’ from the same genome scan. These latter polymorphisms were also required to be separated by at least 100 kb, but to exhibit minimal differentiation (AFD within 0.1% of the genome-wide median) between the acidic and the basic pool. The latter criterion ensured that these SNPs did not tag genome regions (consistently) involved in acidic adaptation.
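The selection of adaptive SNPs described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Snp` record, the function name `select_adaptive_snps`, the label `chrXIX` for the sex chromosome, and the greedy thinning rule (the most differentiated SNP is kept first, and any later candidate closer than 100 kb to a kept SNP is dropped) are all introduced here for illustration. For a biallelic SNP, AFD reduces to the absolute difference in the frequency of one allele between the two pools.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int
    freq_acidic: float  # frequency of one focal allele in the acidic pool
    freq_basic: float   # frequency of the same allele in the basic pool

    @property
    def afd(self) -> float:
        # Absolute allele frequency difference for a biallelic SNP.
        return abs(self.freq_acidic - self.freq_basic)


def select_adaptive_snps(
    snps: Iterable[Snp],
    min_afd: float = 0.85,
    min_spacing: int = 100_000,
    sex_chroms: frozenset = frozenset({"chrXIX"}),  # assumed chromosome label
) -> List[Snp]:
    # Keep autosomal positions that reach the AFD threshold.
    candidates = [
        s for s in snps if s.afd >= min_afd and s.chrom not in sex_chroms
    ]
    # Assumed thinning rule: visit candidates from highest to lowest AFD.
    candidates.sort(key=lambda s: s.afd, reverse=True)
    kept: List[Snp] = []
    for s in candidates:
        far_enough = all(
            s.chrom != k.chrom or abs(s.pos - k.pos) >= min_spacing
            for k in kept
        )
        if far_enough:
            kept.append(s)
    # Return the panel in genomic order.
    return sorted(kept, key=lambda s: (s.chrom, s.pos))
```

A baseline panel could be drawn with the same spacing logic by replacing the AFD threshold with a test for closeness to the genome-wide median AFD.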
At each of the adaptive SNPs, we then defined the nucleotide predominant in the acidic pool as the ‘acidic allele’, and determined and graphed the frequency of these alleles in all six marine sample pools. An analogous analysis was performed for the baseline SNPs, here defining the acidic allele as the one relatively more common in the acidic than the basic pool. Our prediction was that if genetic variation at the adaptive SNPs in marine stickleback reflects gene flow-selection balance, the frequency of the acidic alleles at these markers (but not at the baseline SNPs) should be elevated in marine stickleback sampled on North Uist. As a resource, we additionally compiled all genes located within a 100 kb window centered at each adaptive SNP.
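The orientation of alleles and the frequency lookup in the marine pools can be illustrated with a short Python sketch. The function names, the dictionary layout (nucleotide mapped to read count per pool), and the pool names in the usage example are hypothetical and serve only to make the two definitions of the ‘acidic allele’ concrete.

```python
from typing import Dict


def allele_freqs(counts: Dict[str, int]) -> Dict[str, float]:
    # Convert nucleotide counts of one pool into frequencies.
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def acidic_allele_adaptive(acidic_counts: Dict[str, int]) -> str:
    # Adaptive SNPs: the nucleotide predominant in the acidic pool.
    return max(sorted(acidic_counts), key=lambda a: acidic_counts[a])


def acidic_allele_baseline(
    acidic_counts: Dict[str, int], basic_counts: Dict[str, int]
) -> str:
    # Baseline SNPs: the nucleotide relatively more common in the acidic
    # than in the basic pool (largest frequency excess).
    fa, fb = allele_freqs(acidic_counts), allele_freqs(basic_counts)
    alleles = sorted(set(fa) | set(fb))
    return max(alleles, key=lambda a: fa.get(a, 0.0) - fb.get(a, 0.0))


def acidic_allele_freq_in_marine(
    allele: str, marine_pools: Dict[str, Dict[str, int]]
) -> Dict[str, float]:
    # Frequency of the acidic allele in each marine pool.
    return {
        pool: allele_freqs(counts).get(allele, 0.0)
        for pool, counts in marine_pools.items()
    }
```

For example, with acidic counts `{"A": 48, "G": 2}` the acidic allele is `A`, and a hypothetical marine pool with counts `{"A": 10, "G": 30}` carries it at a frequency of 0.25.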
For three exemplary adaptive SNPs, we further visualized the diversity and distribution of surrounding haplotypes among our samples using haplotype networks. The markers chosen included the adaptive SNP exhibiting the strongest acidic-basic differentiation in the present study (AFD = 0.96), the adaptive SNP tagging the genome region showing the strongest acidic-basic differentiation in a previous investigation (Figure 3A in Haenel et al. 2019), and the adaptive SNP located on a known inversion polymorphism (Jones et al. 2012b; Roesti et al. 2015; Haenel et al. 2019). Using the raw nucleotide counts derived from indSeq, we performed individual diploid genotyping for all nucleotide positions exhibiting a read depth of 10x or greater across a 5 kb window centered on the adaptive SNPs, considering positions heterozygous if their MAF was greater than 0.1. Individuals with more than 25% missing genotypes were omitted. Based on the remaining data, positions qualified as informative SNPs if they displayed at most 40% missing genotypes and a MAF of at least 0.05. The resulting genotype matrices were subjected to phasing with fastPHASE v1.4.8 (Scheet & Stephens 2006; settings provided in Supplementary codes). Haplotype genealogies were then constructed with RAxML v8 (Stamatakis 2014) and visualized as haplotype networks in FITCHI (Matschiner 2016; settings provided in Supplementary codes).
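The genotyping and filtering thresholds given above can be sketched in Python as follows. This is an illustration only: the function names and data layout are hypothetical, and the heterozygosity rule is interpreted here as the read fraction of the second most common nucleotide within an individual exceeding 0.1, which is an assumption about how the MAF criterion was applied. Phasing and genealogy construction are not covered.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks a missing genotype


def call_genotype(
    counts: Dict[str, int], min_depth: int = 10, het_maf: float = 0.1
) -> Genotype:
    # Positions below the depth threshold are treated as missing.
    depth = sum(counts.values())
    if depth < min_depth:
        return None
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    major = ranked[0][0]
    # Assumed rule: heterozygous if the minor read fraction exceeds het_maf.
    if len(ranked) > 1 and ranked[1][1] / depth > het_maf:
        return (major, ranked[1][0])
    return (major, major)


def filter_matrix(
    genotypes: Dict[str, List[Genotype]],
    max_ind_missing: float = 0.25,
    max_site_missing: float = 0.40,
    min_maf: float = 0.05,
) -> Tuple[Dict[str, List[Genotype]], List[int]]:
    # Step 1: omit individuals with too many missing genotypes.
    kept = {
        ind: calls
        for ind, calls in genotypes.items()
        if calls and calls.count(None) / len(calls) <= max_ind_missing
    }
    if not kept:
        return kept, []
    n_sites = len(next(iter(kept.values())))
    informative: List[int] = []
    # Step 2: among remaining individuals, keep informative positions.
    for i in range(n_sites):
        called = [c[i] for c in kept.values() if c[i] is not None]
        if not called or 1 - len(called) / len(kept) > max_site_missing:
            continue
        alleles = Counter(a for geno in called for a in geno)
        if len(alleles) < 2:
            continue  # monomorphic among called genotypes
        maf = sorted(alleles.values())[-2] / sum(alleles.values())
        if maf >= min_maf:
            informative.append(i)
    return kept, informative
```

The returned indices identify the columns that would be passed on to phasing; the individuals retained in the first step define the rows.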