Sampling, sequencing and genetic analysis
We sampled a total of 327 individuals of A. hueti (27), I.
mcclellandii (52), P. pectoralis (60), L. lutea (149),
and S. torqueola (39) from 36 sites (Supporting information,
Figure 1). From each individual, blood or muscle samples were collected
and stored at -80°C. Genomic DNA was extracted using a
QIAGEN Blood/Tissue Extraction Kit
following the manufacturer’s instructions. Two mitochondrial genes,
cytochrome b (Cytb ) and cytochrome c oxidase
subunit 1 (COI ), were selected for this part of experiments. The
mitochondrial genome is known to experience higher substitution rate
than nuclear genome (Ballard and Whitlock 2004). This is more evident in
vertebrates, such as birds and mammals (Lynch et al. 2006) with
substitution rate 25 folds higher in mitochondrial than in nuclear
genome. Therefore, the mitochondrial marker serves as a suitable marker
for the intraspecies study with rich informative polymorphism.
Both mitochondrial genes were amplified from all 327 samples using
published primers (Supporting information; Amer et al. 2013). PCR was
performed using High fidelity Superstar PCR Mix (GenStar, Beijing,
China). The 30 μL PCR mixture contained 15
μL PCR Mix, 1.5 μL of each primer of
10μM, 3 μL of template DNA and 9 μL of deionized water. The PCR program
was set as follows: denaturation at 94°C for 4 min; 30 cycles of 94°C
for 10 s; annealing for 30 s (temperature in Supporting information);
extension at 72°C for 90 s; and final extension at 72°C for 8 min. The
amplicons were sent to Tian Yi Hui Yuan (Guangzhou, China) for Sanger
sequencing from both ends. To rule out sequencing errors and produce an
exact alignment, sequences were manually checked and the ends trimmed in
MEGA X (Kumar et al. 2018).
We concatenated the 2 genes (COI + Cytb ) into one sequence
of 1681-1861 bp for further analysis (Table 1). Aligned sequences were
imported to DNaSP 6.0 to calculate θ, π and Tajima’s D, and to produce a
haplotype file (Rozas et al. 2017). Single nucleotide polymorphisms
(SNPs) were retained and formatted into 5 matrices for further analysis
of the 5 species, respectively. A haplotype network was calculated for
each species with Network 10.0, using the median joining algorithm
(Polzin and Daneshmand 2003). Because the uneven sample size was not
suitable for regular estimates of diversity based on allele frequency,
such as Fst (Nei and Miller 1990, Cruickshank and Hahn 2014), we
calculated the number of segregating non-synonymous (na) and synonymous
(ns) sites based on the JC69 model (Jukes and Cantor 1969). Pairwise
distance, N, was approximated (using Python scrips) as the mean of
among-individual ns+na values. Populations with only one individual were
removed in calculations based on population genetic data. Genetic
estimate based on species and genetic clustering, as mentioned in the
following text, still involved all the sequences.
To determine population clustering, we performed genetic clustering on
the time tree using gmyc function in the R package splits(Ezard et al. 2021). The aligned sequences were tested for best
substitution model in MEGA X (Kumar et al. 2018). A time tree was built
using BEAST with substitution rate of
10-8 site/year
((Nguyen and Ho 2016), 107 MCMC chain length, best
fitted substitution model “HKY” for all species and other parameters
as default. The time tree was imported to R for clustering analysis
using gmyc .