Results
Preliminary delimitation of Carob Evolutionary Units (CEUs) using
microsatellite data
Based on genotypes from 17 microsatellite loci (including SSR and SNPs)
and geographical coordinates, we grouped 1,020 carob trees into CEUs
(Fig. S1 in supplementary material). The Ward cluster method, which
considers geographic and genetic distances, found seven CEUs grouped in
two clusters. The western cluster included populations from South
Morocco (SM), South-West Spain (SWS) and South-East Spain (SES). The
eastern cluster included the remaining four CEUs: North Morocco (NM),
Central Mediterranean (CM), North-East Mediterranean (NEM) and
South-East Mediterranean (SEM). Based on microsatellite markers, the
overall genetic differentiation among these seven CEUs is 16% (Gst).
Pairwise Gst values visualized by a neighbor-joining tree (Fig. S1C in
supplementary material) confirmed the Ward clustering scheme, but placed
SEM and SM in intermediate positions along the NJ tree instead of
grouping these two CEUs together.
Assembly of RADseq markers and identification of outliers
The de novo assembly conducted on 36 samples with a clustering
threshold of 0.94 and a minimum number of samples by loci of 30,
produced a mean number of 48,828 loci by sample (Tab. S1) reduced to
13,371 loci retained after ipyrad filtering. The sequences of these loci
were used as a reference for the assembly of 350 genotypes, which
resulted in 10,012 loci with an
overall missing data rate of 64%. The structure of genetic diversity
revealed by this RADseq data set is congruent with the microsatellite
dataset (Fig. S1) and with a west-east pattern of differentiation (Fig.
S2A in supplementary material), even picturing a more pronounced
longitudinal structure. However, SEM and NM exchange their positions in
the RADseq dataset: NM appears in an intermediate position between west
and east CEUs close to CM whereas SEM forms a group with NEM (Fig. S2B).
The detection of the genotypes and loci having the highest missing rate
by Matrix condenser, and their subsequent removal, produced a matrix
with 190 samples and 12,767 loci, with 14% missing rate. After
filtering to keep one SNP by locus, the dataset included 190 samples for
3,613 loci (or SNPs), with 9.5% overall missing data and 3.5% median
missing rate by genotype. The Outflank method detected 56 outlier
markers within the dataset of 190 samples by 3,613 SNPs (Fig. S3 in
supplementary material). The global differentiation computed for these
outliers was six times higher than for other SNPs (Gst 0.74 vs. 0.12).
Nucleotide BLAST searches revealed that 27 outliers were assigned to
plastid genome, 14 to the mitochondrial genome, 11 to nuclear genomes
and 4 were not assigned. Excluding the outliers detected by Outflank,
the GWN dataset included 3,557 SNPs. The nuclear matrix produced for
phylogenetic inference after filtering out samples with missing data
rates above 20%, contained 171 C. siliqua and 4 C.
oreothauma genotypes for 3,284 unlinked SNPs and 8% missing data rate.
The MH matrix contained 190 C. siliqua genotypes for 946
nucleotides, 17 variable sites and 60% missing data. The PH matrix
contained 190 C. siliqua genotypes for 1,897 nucleotides, 31
variables sites and 47 % missing data.
Phylogenetic and genetic structure of carob trees across the
Mediterranean based on RADseq data
Phylogenetic analysis of MH and PH concatenated in two alignments
revealed the same pattern with two clearly differentiated haplogroups
with branch support over 95% (Fig. S4 in supplementary material). In
the Western Mediterranean CEUs (SES, SWS, NM and SM) a single haplogroup
represented 85 % and 94.5 % of plastid and mtDNA haplotypes
respectively. These western mtDNA and pDNA haplogroups were observed in
only 11 % of the samples from central and eastern CEUs. A
central-eastern haplogroup was found across CM, NEM and SEM CEUs, and
was very rare in Western CEUs. Although strictly matching otherwise,
mtDNA and pDNA haplogroups were not congruent in the northern part of SM
(population from Imouzzer Ida Ou Tanane area, in Morocco). In this area,
most samples (8/13) belong to the western haplogroup for mtDNA, but to
the central-eastern one for pDNA, the others have matching haplogroups
belonging to the central-eastern one (4/13) or the western one (1/13).
The strong West-East differentiation observed for the 350 genotypes
dataset (Fig. S2 in supplementary material) and for MH and PH datasets
(Fig. S4) was confirmed in the analysis conducted using GWN data. The
SVDquartets reconstruction of 171 samples of C. siliqua and four
samples of C. oreothauma as an outgroup, confirmed the monophyly
of C. siliqua and resolved a strongly supported first split of
the SM clade from the remaining C. siliqua lineages (Fig. 2A). A
subsequent split separated the remaining Western CEUs (SWS, SES and NM)
from Central and Eastern CEUs (CM, NEM, SEM). This phylogenetic topology
based on GWN data is congruent with MH and PH data in supporting
southwest Morroco as the ancestral area in the carob evolutionary
history.
SVDquartets phylogenetic topology is highly congruent with the K=4
genetic structure obtained with SSR data (Fig. 2A). The four genetic
groups match the following clades: SM, SWS + SES, NEM + SEM, CM. The
relative position of NM in the SVDquartets tree, although sister to the
other three western CEUs, is congruent with its mixed assignation to
different genetic clusters for SSR and RADseq data (Fig. 2A,D).
The genetic clustering estimated with snmf based on RADseq data was
repeated from K=2 to K=7 ancestral populations. The cross-entropy
criterion suggested two optimal solutions for 5 and 7 groups (Fig. S5B
in supplementary material), of which K=7 had the highest probability.
The five groups estimated with snmf (Fig. S5A) are congruent with SSR
data and the SVDquartets reconstruction for the Western CEUs: three
genetic groups have a good match with SM, SWS + SES and NEM + SEM,
respectively, whereas CM have ancestry mainly from the two remaining
genetic groups and NM shows admixture from the five genetic groups. The
most likely genetic clustering, K=7 solution (Fig. 2A,D), resolved the
differentiation of four ancestral populations roughly corresponding to
south Morocco (SM), south Spain (SWS and SES) and the eastern CEUs (NEM
and SEM). However, departures from a good match between
microsatellite-based CEUs and RADseq genetic groups are substantial too.
NEM exhibits admixture from SEM. CM includes individuals with admixture
from three ancestral groups, two of them (in orange and yellow) almost
restricted to this CEU and the last one, coloured in turquoise, present
in all CEUs although with higher proprotions in NM, SWS and CM. NM was
resolved as admixed with the contribution of all ancestral populations.
The PCA results are consistent with the main genetic groups found with
snmf, showing a clear delimitation of SM, SWS + SES and NEM+SEM groups,
but partial overlap between the NM and CM groups (Figure 2B, C). RADseq
genetic differentiation among the seven CEUs (Tab. S2. in supplementary
material) was moderate with an overall Gst of 11%; all values above the
mean involved a west-east differentiation, the highest differentiation
being SES between and SEM (Gst = 19%).
Impact of dispersals on the genetic diversity and structure of