2.8 | Haplotype analysis in candidate genomic regions
The VCF files corresponding to each previously identified candidate
region were extracted from the individual sequencing data using vcftools
v.0.1.14 (Danecek et
al., 2011) and converted into a FASTA file containing all individuals
using a custom script. The software RAxML v.8.2.4
(Stamatakis, 2014) was
used with a GTRGAMMA model (-m), rapid bootstrap method (-f), 630 seeds
for the parsimony inference procedure (-p) to build maximum likelihood
trees for each region and including the 63 individuals from location 1.
Trees were visualized with the R package ggtree(Yu et al., 2017). The
LD between haplotypes in pairs of candidate regions was tested with the
R GenePop package
(Rousset, 2008) and
p-values were corrected for the number of tests by computing q-values.
The R Poppr package
(Kamvar et al., 2014)
was used to estimate the index of association
(\(\overline{r}\)D) between combinations of haplotypes
in multiple candidate regions. Redundancy analysis (RDA,
(Rao, 1964)) was
performed with the R vegan package
(Oksanen et al.,
2012). We used a RDA to identify correlations between the haplotypes
defined in the candidate regions (explanatory variables) and the
quantitative traits (response variables), i.e., the DLA-R and the DLA-S
traits. The correlations between candidate regions and traits were
statistically tested with a permutation test with 1 000 permutations.
The multilocus adaptive pattern between populations was then further
investigated using pool-sequencing data. The software harp
(Kessner et al., 2013)
was used to estimate the frequency of the haplotypes defined from
individual sequencing in the 14 study pools.