2.4 | Demographic reconstruction and admixture analyses
We made a consensus fastq file for each caribou and the reindeer from their bam files, using the Samtools and BCFtools 1.5. This was converted into an input file and run in Pairwise Sequentially Markovian Coalescent (PSMC) model in PSMC (Li & Durbin, 2011) to investigate past effective population size changes. These were plotted using the general mammal mutation rate of 1.0E-9 (Li & Durbin, 2011) and a generation time of 7 years (COSEWIC 2014-2017).
To calculate admixture statistics, we used the R package admixr (Petr, Vernot, & Kelso, 2019) to run ADMIXTOOLS (Patterson et al., 2012). We converted our VCF file containing the Sitka deer (to use as an outgroup) into EIGENSTRAT format using a C++ script (found here: https://github.com/bodkan/vcf2eigenstrat). As the package does not work when including more than 600 scaffolds, we filtered the dataset to include SNPs found only on the 600 largest scaffolds, which encompassed over 98% of the reference genome assembly (the scaffold L90 is 285; Taylor et al., 2019). We used the EIGENSTRAT files to run f3, f4, and f4-ratio statistics. See Reich, Thangaraj, Patterson, Price, and Singh (2009) and Patterson et al., (2012) for full explanations of these tests, but briefly, the f3 statistic is a three-population test that can calculate whether population ‘C’ is a mixture of two other populations, ‘A’ and ‘B’. A negative f3 statistic indicates that population ‘C’ is a mixture of ‘A’ and ‘B’. The f4 statistic is an ABBA BABBA test and acts similarly to D statistics. It is a four-population test which requires a phylogenetic set up including two sister groups, a test group to see if introgression has occurred into one of the two sister groups, and an outgroup (which we always set as the Sitka deer). An f4 statistic which significantly differs from 0 indicates gene flow, whether it is positive or negative tells you into which of the sister populations. In the f4-ratio test, alpha is calculated, which is the proportion of the genome in population ‘X’ that originates from population ‘B’ as opposed to population ‘A’ (the proportion of population ‘A’ is calculated as 1 – alpha).
For these tests, we grouped the four barrenground genomes from Bluenose and Qamanirjuaq as they show no differentiation and testing them separately made no difference to the results. The four boreal caribou genomes from Ontario and Manitoba were run separately as these do show differentiation and grouping them did affect the outcome. We focussed on using these tests to investigate 1) the amount of barrenground introgression into eastern migratory caribou in Ontario/Manitoba and Quebec/Labrador (f3, f4 and f4-ratio tests) separately as they have non-overlapping ranges, 2) introgression between eastern migratory caribou in Ontario/Manitoba and Quebec/Labrador (f4 test), 3) introgression between boreal caribou of NAL origin and the mountain caribou (f4 and f4-ratio tests for significant populations), and 4) introgression between boreal caribou of NAL origin and the Northwest Territories boreal caribou of BEL origin (f4 and f4-ratio tests), since our goal was to investigate the potential role of adaptive introgression in leading to parallel evolution.
We first investigated introgression from barrenground into the Manitoba and Ontario boreal populations (f4 test), and due to its current geographic isolation and low levels of introgression from the barrenground lineage, we used Ignace as the representative NAL boreal population. Similarly, to investigate introgression from the NAL into the BEL, we used the Grant’s caribou as the sister group as these showed the least amount of introgression from the NAL lineage (f4 test). In the tests to investigate BEL introgression into the boreal caribou of NAL origin, we used eastern migratory Quebec/Labrador caribou as the sister group which had the lowest introgression from the BEL (f3, f4 and f4-ratio tests). For the full set up our tests, see Supporting Information.
To investigate patterns of introgression across the genome we used the programme Dsuit (Malinsky, Matschiner, & Svardal, 2019). The Dinvestigate function can be used to calculate introgression in windows across the genome, and this was used to calculate the fDand fDM statistics (Malinsky et al., 2015) for sliding windows of 1,000 SNPs incremented by 250 SNPs across the genome. We used this programme to investigate introgression between NAL boreal caribou and the mountain caribou as well as between NAL boreal caribou and BEL boreal caribou to further investigate the process of parallel evolution. Again, we used Ignace as the representative NAL boreal population and as it is the most geographically isolated and has low levels of introgression from BEL. Similarly, we found Grant’s caribou to show the lowest levels of introgression from the NAL and so we used them as the sister group into the BEL boreal and Columbia North caribou. The Sitka deer was still used as the outgroup in all tests.
To further investigate the potential role of adaptive introgression in the parallel evolution of the BEL boreal caribou (see Results) we investigated the gene composition of the most introgressed regions within the BEL boreal caribou identified as having originated from Ignace. We compared these to the most introgressed regions from Ignace into all mountain populations as adaptive introgression is unlikely to have played a role in the parallel evolution of these populations due to the uncovered patterns of introgression (see Results). To do this, we extracted the sequences for all regions across the genome with an fDM score over 0.2 (as it is the most conservative statistic) using Bedtools 2.29 (Quinlan & Hall, 2010). To make sure the sister group used in the set-up of the test didn’t bias the results, we only included regions that were flagged as highly introgressed from the NAL group when using both Grant’s and Peary caribou as the sister group. We used the command line version of BLAST 2.6 (Altschul, Gish, Miller, Myers, & Lipman, 1990) to search for the genes present in these introgressed regions and genes with mRNA or predicted mRNA hits in at least two species and with an E score of 0.