2.7 Species tree reconstruction and species delimitation
analyses
For SNPs and mtDNA datasets, maximum likelihood (ML) and bayesian
inference (BI) approaches were performed. ML for both datasets was
implemented in IQ-TREE v1.6.10 (Nguyen, Schmidt, Von Haeseler, & Minh,
2015) using CIPRES (Miller, Pfeiffer, & Schwartz, 2010) with 10,000
ultrafast bootstrap (UFboot) iterations. ModelFinder estimated GTR as
the best-fitted model according to Akaike Information Criterion (AIC)
for SNPs dataset and GTR+Γ+I for mtDNA dataset. BI reconstruction was
assessed using MrBayes v3.2.6 (Ronquist et al., 2012), employing the
entire matrix of 1,707 SNPs and the mtDNA haplotypes following the same
fit-model implemented in ML analyses. For each dataset, two independent
runs were performed with four Markov Chain Monte Carlo (MCMC) for each
run for 50 million generations, sampling every 1,000 generations. The
first 25% of the runs were removed as burn-in, and stability and
sufficient mixing of parameters (ESS>1,000 for SNPs and
ESS>200, for mtDNA) were checked using Tracer v1.6
(Rambaut, Suchard, Xie, & Drummond, 2014). UFboot values and Posterior
Probabilities (PP) resulted from ML and BI, respectively, were plotted
on each node of the reconstructed tree of each dataset. Additionally,
based on SNPs dataset, a species tree was estimated using the
coalescent-based methods, SNAPP (Bryant, Bouckaert, Felsenstein,
Rosenberg, & RoyChoudhury, 2012) implemented in Beast v2.5.2 (Bouckaert
et al., 2014). In this case, we could only include two individuals,
selecting individuals with the lowest numbers of missing data, from each
one of the groups defined in clustering analyses (see the previous
section) as a result of the high computational demand of the method. Due
to the reduction of the number of individuals, some sites could become
monomorphic either because of the reference or the alternative allele,
and some sites might also have had no data left for one or more of the
sampled localities. Therefore, we decided to filter out monomorphic
sites and sites with no data for one or more populations with bcftools
v1.7 (Li et al., 2009), resulting in a new matrix of 1,679 SNPs. We used
the default model parameters in SNAPP for U and V equal to one and we
ran the analysis for 10,000,000 MCMC generations, sampling every 1,000
generations. Stationarity and convergence of data was checked using
Tracer v1.6 (Rambaut et al., 2014). The complete set of trees were
visualized in Densitree v2.2.5 (Bouckaert & Heled, 2014), removing the
first 10% of the trees as burn-in. Finally, we generated a maximum
clade credibility tree using TreeAnnotator v1.7.5 (Drummond, Suchard,
Xie, & Rambaut, 2012) to access the posterior probabilities of the
resulted topology.
Bayes Factor Species Delimitation (BFD*) (Leaché, Fujita, Minin, &
Bouckaert, 2014) analysis based on the SNPs dataset, was performed using
the SNAPP and Path Sampler packages included in Beast v2.5.2
(Bouckaert et al., 2014). We estimated and compared the marginal
likelihood estimates (MLE) for two models of species limits. Model 1
considered H. pungens as a complex of the following five species
according to the results obtained with sNMF, DAPC, and the estimated
best phylogenetic tree: i) H. pungen s sensu stricto(Hypogeococcus populations collected on Amaranthaceae in
Argentina); ii) mealybugs feeding on Cactaceae from
Argentina-Paraguay-Australia; iii) mealybugs feeding on Cactaceae from
southeastern Brazil and Puerto Rico; iv) mealybugs feeding on
Amaranthaceae and Portulacaceae from northeastern Brazil, Puerto Rico
and the United States; and v) mealybugs feeding on Amaranthaceae from
southeastern Brazil. Model 2 considered H. pungens as a complex
comprising four species according to the major clades found with mtDNA:
the same groups as in Model 1 with the difference that H. pungen ssensu stricto and mealybugs feeding on Amaranthaceae and
Portulacaceae from northeastern Brazil, Puerto Rico and the United
States appear as only one group. The MLE for each model was calculated
using the same mutation rate as above, considering 12 steps with an MCMC
length of 200,000 generations, a pre-burning of 20,000 and discarding
the first 10% as burning. The MLEs for each model were used to
calculate the Bayes factor (BF) test
statistics 2MLE(model1 )-MLE (model2 ) (Kass &
Raftery, 1995).
Finally, two single locus methods of species delimitation were run based
on mtDNA, the Generalized Mixed Yule Coalescent (GYMC) (Pons et al.,
2006) and Bayesian Poisson tree processes (bPTP) (Zhang, Kapli,
Pavlidis, & Stamatakis, 2013). For the GYMC analysis we first
calculated an ultrametric tree in Beast v2.5.2 (Bouckaert et al., 2014).
We used GTR+Γ+I model with an uncorrelated relaxed clock and constant
size tree prior. MCMC was set to 10,000,000 generations, sampling every
1,000th generation. The maximum credibility tree was
generated in TreeAnnotator v1.7.5 (Drummond et al., 2012). The GYMC
analysis was performed with the SPLITS package (Ezard, Fujisawa,
& Barraclough, 2009) in R, using the ultrametric tree inferred by
Beast. The bPTP analyses were performed in the bPTP server
(http://species.h-its.org/) using the ultrametric tree in nexus
format, with default values and using 200,000 MCMC generations.