Data Analysis of DNA Sequences
All the DNA sequences were edited and assembled with the program SeqMan software package (DNA Star), and multiple alignments were generated with Bioedit, version 7.0.4.1 (Hall, 1999). The three cpDNAs were combined by PAUP, version 4.0b10 (Swofford, 2003). Nuclear genes had heterozygous sites in some individuals, identified by overlapping peaks in chromatograms. DnaSP, version 5.0 (Librado & Rozas, 2009) was used to resolve the nuclear DNA sequences by applying the algorithms of PHASE (Stephens & Donnelly, 2003; Stephens, Smith, & Donnelly, 2001). The phased nDNAs were used in the following analyses.
Sequences were collapsed into haplotypes with the DnaSP version 5.0 (Librado & Rozas, 2009). Haplotype geographic distribution was drawn in ArcGIS, version 10.5 (http://desktop.arcgis.com). For cpDNA, genetic variation indices, such as the number of haplotypes (h), haplotype diversity (H d), variable sites (S) and nucleotide diversity per site (π) were calculated by DnaSP, version 5.0 (Librado & Rozas, 2009). For each nuclear gene, besides the above indices, we also used DnaSP (Librado & Rozas, 2009) to calculate Watterson’s parameter (θ) and the minimum number of recombination events (R m).
Genetic differentiations between the two Cycas species were estimated based on the combined cpDNA and each low copy nuclear gene. The Wright’s fixation index (F ST) was calculated for each locus using AMOVA implemented in Arlequin, version 3.11 (Excoffier, Laval, & Schneider, 2005) with significance tested using 1000 permutations as described in Excoffier et al. (1992) (Excoffier, Smouse, & Quattro, 1992). We used Network, version 4.6.1.2 (Forster, Forster, & Watson, 2007) to construct haplotype relationships following the media-joining calculation at each locus, and all variation character states were specified as unordered and equally weighted.
Effective population size, migration rate and divergence time were estimated based on the isolation with migration (IM) model using the combined cpDNA and five low copy nuclear genes in the IMa program (Hey & Nielsen, 2007). The HKY model was chosen for all loci. We used the calculate mutation rate 2.48 × 10-10 per site per year for the cpDNA data (Appendix S1). For nuclear genes, the average mutation rate 6.7 × 10-10 per site per year which was derived from the average rates of Cycadales (De La Torre, Li, Van de Peer, & Ingvarsson, 2017) to estimate divergence time between C. bifida and C. micholitizii .
Ecological niche modeling (ENM) was performed to evaluate present suitable habitats for C. bifida and C. micholitzii in MaxEnt, version 3.4.1 (Phillips & Dudik, 2008). The current distribution information for both species was obtained from the sampling sites (Table 1), a total of five presence records of C.bifida and nine records of C. micholitzii . Nineteen bioclimatic variables under current conditions (1970-2000) in 30 arc-second resolution were downloaded from the WorldClim 2.1 (www.worldclim.org). Ten replicates of MaxEnt were run, and model performance was assessed using the area under the curve (AUC) of the receiver operating characteristic (ROC) plot with 25% of the localities randomly selected to test the model. Niche similarity between C. bifida and C. micholitzii was measured through two niche overlap indices, Hellinger-derived I and Schoener’s D. These metrics are implemented in the software ENMTools, version 1.4.3 (Warren, Glor, & Turelli, 2010). Firstly, the actual niche overlap (the empirical I and D) between the two species was calculated. Then, a niche identity test was conducted to compare the overlap of the species pair’s actual niches to a distribution of niche overlap obtained from 100 pseudoreplicates niches datasets that were obtained by pooling all the occurrences of the two species.
Demographic history of C. bifida and C. micholitzii was inferred in three ways, respectively. First, the signatures of demographic change were examined with neutrality tests, including Tajima’s D , Fu and Li’s D * andF * and Fu’s F S, using DnaSP, version 5.0 (Librado & Rozas, 2009). The sum-of-squared deviations (SSD) and raggedness index as well as their P-values were calculated with the software Arlequin, version 3.11 (Excoffier et al., 2005). Second, we used a pairwise mismatch distribution to test for population expansion in DnaSP, version 5.0 (Librado & Rozas, 2009). Third, an Extended Bayesian Skyline Plot was created by the program BEAST, version 1.8.2 (Drummond & Rambaut, 2007), to further investigate the demography history of the two species. A strict molecular clock models with 6.7 × 10-10 per site per year mutation rate (De La Torre et al., 2017) for each nuclear gene and 2.48 × 10-10 per site per year mutation rate for cpDNA as stated above, and the Coalescent: Extend Bayesian Skyline Plot (EBSP) was specified as tree prior. The analysis was run for 109 generations sampling every 100,000 generations. The ESS parameter of each process was checked by TRACER, version 1.5 (Rambaut & Drummond, 2009), until a stable value exceeding 200 was reached, suggesting that there was acceptable mixing and sufficient sampling.