SNP calling and statistical genomic analyses
Fastp v. 0.19.7 (Chen, Zhou, Chen, & Gu, 2018) was used to trim the
RAD-seq reads, keeping reads with a minimum quality of 15 before mapping
individual sequences against the reference genome of the Great tit
(Laine et al., 2016; GenBank assembly accession: GCA_001522545.3) with
BWA v0.7.17 (H. Li & Durbin, 2010). Genotyping was conducted with
stacks v2.41 (Rochette, Rivera-Colón, & Catchen, 2019)
“gstacks ” and “population ” functions, using
“snp ” model, filtering for mapping quality > 10,
alpha = 0.05, minor allele frequency > 0.05 and observed
heterozygosity < 0.65. Loci were retained if present in at
least 90% of individuals in each population. Loci with extreme low or
high coverage were removed (5% extremes filtered out using vcftools
v0.1.15 Danecek et al., 2011). After filtering, 74,137 SNPs were
retained for subsequent population genomic analyses.
To document genomic variation among urban and rural great tits from the
three locations we used a redundancy analyses (RDA), with location
(Barcelona, Montpellier and Warsaw), environment (urban or rural) and
sex as explanatory variables. Partial RDA was also produced to test for
each variable effect (environment, location or sex) alone after
controlling for all other variables. The effect of a given factor was
considered significant with a p-value < 0.05.
To estimate genome-wide differentiation between populations we used Weir
and Cockerham’s FST (Weir & Cockerham, 1984) computed
using the StAMPP R package (Pembleton, Cogan, & Forster, 2013). Average
FST was estimated using all SNPs, and confidence
intervals were assessed using 1000 bootstrap replicates.
We used two methods to investigate outlier SNPs potentially under
divergent selection between forest and urban populations: an
FST-outlier based method (using Bayescan v2.1, Foll &
Gaggiotti, 2008), and a multivariate method (using a RDA, Forester,
Jones, Joost, Landguth, & Lasky, 2016) aiming respectively at
identifying strong outliers indicating footprints of differentiation for
each population pair and weaker outliers that represents footprints of
divergent selection typically expected in polygenic adaptations in
response to complex environmental heterogeneity across several
population pairs studied at once.
First, we ran Bayescan (with default parameter options) for each pair of
populations (Barcelona, Montpellier & Warsaw) separately to detect SNPs
with outlier values of FST. As recommended by (Foll &
Gaggiotti, 2008) we considered SNPs as outliers when they displayed a
q-value above the 0.1 threshold. Second, following a similar procedure
as described above for the RDA analysis, we used a constrained RDA to
investigate the effect of habitat (forest vs. city) and to identify
outliers SNPs that displayed more than 3 times SD from the mean score on
the constrained axis (Forester et al., 2016).