Material and Methods
Sampled colonies
We sampled 265 Western honey bee colonies from two different subspecies,
namely A. m. mellifera (MEL) and A. m. carnica (CAR)
(Figure 1). Conserved MEL colonies were sampled in Switzerland (CS_CH)
and France (CS_FR). The majority of the colonies belonged to MEL from
the selection programme (SL_CH) in Switzerland, which simultaneously
represents five different patrilines (P1-P5). The sample size,
geographic origin and location of the five different patrilines and
conserved MEL colonies are summarised in Table 1 and Supplementary
Figure 1. It should be noted that P1 is located in close proximity to
the conservation area (CS_CH) and that P4 and P5 have a common maternal
origin. The 49 sampled CAR colonies originated from Switzerland (22),
Sweden (3), Norway (3) and the United States of America (21). For each
colony, approximately 500 workers were collected with a standardized
sampling method to include all existing patrilines among workers in the
colony.
DNA extraction and pool
sequencing
DNA extraction and pool sequencing of the sampled colonies are described
in detail by Guichard et al. (2021). Briefly, the approximately 500
workers per colony were shredded in a DNA extraction solution. Pair-end
sequencing was performed on an IlluminaTM HiSeq 3000
or a NovaSeqTM 6000 platform. To significantly
increase computing time, the pool sequence analysis was restricted to an
informative marker panel including 7,023,977 genome-wide SNPs. Raw reads
from pool sequencing of the 265 colonies were aligned to the honey bee
reference genome Amel_HAV3.1, Genebank assembly accession
GCA_003254395.2 (Wallberg et al., 2019). After the alignment, the
resulting BAM files were converted into pileup files using the samtools
mpileup utility (Li et al., 2009). Files produced by mpileup were
interpreted by the PoPoolation2 utility mpileup2sync (Kofler, Pandey, &
Schlötterer, 2011) for the Sanger Fastq format, with a minimum quality
of 20. Finally, sync files were converted to a depth file containing a
sequencing depth value for each SNP and count files summarising
reference and alternative allele counts for each SNP.
Quality filtering and dosage data
conversion
Based on the aforementioned count files, we removed 99,555 SNPs with
multiple alternative alleles and 207,904 SNPs with an excessively high
and low sequencing depth. After this quality control, we calculated the
frequencies of the reference and the alternative alleles for 6,716,518
SNPs and additionally excluded 771,835 homozygous loci. The remaining
5,944,683 SNPs were summarised in PLINK dosage and map files. To detect
ROH segments of the colonies, dosage data were converted to hard-called
genotypes using the command – import dosage with ahard-call threshold of 0.4, as implemented in PLINK 2.0 (Chang et
al., 2015). Genotypes set to missing during the conversion were imputed
with the software BEAGLE 5.2 (Browning, Zhou, & Browning, 2018). For
the population structure analyses, hard-called SNP genotypes were
further pruned for minor allelic frequency above 5%, which resulted in
1,505,596 genome-wide SNPs.
High-resolution population
network
To ascertain the high-resolution population structure of honey bees, we
performed a population network visualization. The different components
involved in the so-called NetView approach are described in detail by
Neuditschko et al. (2012) and Steining et al. (2016). Briefly, we
computed genetic distances by subtracting pairwise relationships
identical-by-state (IBS) from 1 and applied the algorithm in its default
setting (number of k nearest neighbours k -NN = 10). To
illustrate the genetic relatedness between neighbouring honey bee
colonies, we associated the thickness of edges (connecting lines) with
the proportion of the genetic distance, with thicker edges corresponding
to lower genetic distances. To identify highly inbred honey bee
colonies, we scaled the node size of each colony based on the individual
FROH. The node colour was associated according to the
sampled subpopulations and the individual level of admixture at the
optimal number of clusters.
Admixture
Colony admixture levels and genetic distances (FST)
between the subspecies were determined using the program Admixture 1.23
(Alexander, Novembre, & Lange, 2009). We ran Admixture for 100
iterations increasing K from 2 to 10. Convergence between independent
runs at the same K was monitored by comparing the resulting
log-likelihood scores (LLs) following 100 iterations, and was inferred
from stabilized LLs with less than 1 LL unit of variation between runs.
Cross validation (CV) error estimation for each K was performed to
determine the optimal number of clusters. Admixture results increasing K
from 2 to 7 were visualized with the program Distruct 1.1 (Rosenberg,
2004) and integrated in the high-resolution population network, as
described above.
Runs of homozygosity
Runs of homozygosity segments were determined with an overlapping window
approach implemented in PLINK v.1.9 (Chang et al., 2015) including the
aforementioned 5,944,683 genome-wide SNPs. The following settings were
applied: a minimum SNP density of one SNP per 80 kb, a maximum gap
length of 100 kb, and a minimum length of homozygous segment of 200 kb,
while two heterozygotes were permitted in each segment. The total number
of ROH (NROH), the total length of ROH segments
(SROH) and the average length of ROH
(LROH) were summarised for the two subspecies (CAR and
MEL) and the respective subpopulations. The genomic-based inbreeding
coefficients (FROH) were calculated by dividing
SROH by the length of the autosomal genome
(LAUTO), which was set to 220.76 Mb (Wallberg et al.,
2019). Furthermore, we compared FROH of 74 SL_CH
colonies with pedigree-based inbreeding coefficients
(FPED) using a linear regression model as implemented in
the statistical computing software R (R Core Team, 2013), whereas
FPED were calculated following the method described by
Brascamp et al. (2014).
Homozygosity islands and gene
functions
Homozygosity islands of the two subspecies were determined based on
overlapping homozygous regions present in more than 50% of the colonies
with the R package detectRUNS (Biscarini, Cozzi, Gaspa, & Marras,
2019). The length and distribution of the homozygosity islands on the
chromosomes were visualised using the R package RIdeogram (Hao et al.,
2020). We used the NCBI genome data viewer
(https://www.ncbi.nlm.nih.gov/genome/gdv/), and the reference
genome assembly Amel_HAv3.1 (Wallberg et al., 2019) to identify genes
located in homozygosity islands. For each subspecies, we regrouped the
characterised genes by the function term based on the annotation chart
from the open source Database for Annotation, Visualization, and
Integrated Discovery v.6.8 package (https://david.ncifcrf.gov),
using the Apis mellifera annotation file, with a minimum of two
genes per functional group and a Bonferroni-corrected significance
threshold of p<0.05. Furthermore, we specified the known
functions of the identified genes by conducting a literature review.