2.7 Species tree reconstruction and species delimitation analyses
For SNPs and mtDNA datasets, maximum likelihood (ML) and bayesian inference (BI) approaches were performed. ML for both datasets was implemented in IQ-TREE v1.6.10 (Nguyen, Schmidt, Von Haeseler, & Minh, 2015) using CIPRES (Miller, Pfeiffer, & Schwartz, 2010) with 10,000 ultrafast bootstrap (UFboot) iterations. ModelFinder estimated GTR as the best-fitted model according to Akaike Information Criterion (AIC) for SNPs dataset and GTR+Γ+I for mtDNA dataset. BI reconstruction was assessed using MrBayes v3.2.6 (Ronquist et al., 2012), employing the entire matrix of 1,707 SNPs and the mtDNA haplotypes following the same fit-model implemented in ML analyses. For each dataset, two independent runs were performed with four Markov Chain Monte Carlo (MCMC) for each run for 50 million generations, sampling every 1,000 generations. The first 25% of the runs were removed as burn-in, and stability and sufficient mixing of parameters (ESS>1,000 for SNPs and ESS>200, for mtDNA) were checked using Tracer v1.6 (Rambaut, Suchard, Xie, & Drummond, 2014). UFboot values and Posterior Probabilities (PP) resulted from ML and BI, respectively, were plotted on each node of the reconstructed tree of each dataset. Additionally, based on SNPs dataset, a species tree was estimated using the coalescent-based methods, SNAPP (Bryant, Bouckaert, Felsenstein, Rosenberg, & RoyChoudhury, 2012) implemented in Beast v2.5.2 (Bouckaert et al., 2014). In this case, we could only include two individuals, selecting individuals with the lowest numbers of missing data, from each one of the groups defined in clustering analyses (see the previous section) as a result of the high computational demand of the method. Due to the reduction of the number of individuals, some sites could become monomorphic either because of the reference or the alternative allele, and some sites might also have had no data left for one or more of the sampled localities. Therefore, we decided to filter out monomorphic sites and sites with no data for one or more populations with bcftools v1.7 (Li et al., 2009), resulting in a new matrix of 1,679 SNPs. We used the default model parameters in SNAPP for U and V equal to one and we ran the analysis for 10,000,000 MCMC generations, sampling every 1,000 generations. Stationarity and convergence of data was checked using Tracer v1.6 (Rambaut et al., 2014). The complete set of trees were visualized in Densitree v2.2.5 (Bouckaert & Heled, 2014), removing the first 10% of the trees as burn-in. Finally, we generated a maximum clade credibility tree using TreeAnnotator v1.7.5 (Drummond, Suchard, Xie, & Rambaut, 2012) to access the posterior probabilities of the resulted topology.
Bayes Factor Species Delimitation (BFD*) (Leaché, Fujita, Minin, & Bouckaert, 2014) analysis based on the SNPs dataset, was performed using the SNAPP and Path Sampler packages included in Beast v2.5.2 (Bouckaert et al., 2014). We estimated and compared the marginal likelihood estimates (MLE) for two models of species limits. Model 1 considered H. pungens as a complex of the following five species according to the results obtained with sNMF, DAPC, and the estimated best phylogenetic tree: i) H. pungen s sensu stricto(Hypogeococcus populations collected on Amaranthaceae in Argentina); ii) mealybugs feeding on Cactaceae from Argentina-Paraguay-Australia; iii) mealybugs feeding on Cactaceae from southeastern Brazil and Puerto Rico; iv) mealybugs feeding on Amaranthaceae and Portulacaceae from northeastern Brazil, Puerto Rico and the United States; and v) mealybugs feeding on Amaranthaceae from southeastern Brazil. Model 2 considered H. pungens as a complex comprising four species according to the major clades found with mtDNA: the same groups as in Model 1 with the difference that H. pungen ssensu stricto and mealybugs feeding on Amaranthaceae and Portulacaceae from northeastern Brazil, Puerto Rico and the United States appear as only one group. The MLE for each model was calculated using the same mutation rate as above, considering 12 steps with an MCMC length of 200,000 generations, a pre-burning of 20,000 and discarding the first 10% as burning. The MLEs for each model were used to calculate the Bayes factor (BF) test statistics 2MLE(model1 )-MLE (model2 ) (Kass & Raftery, 1995).
Finally, two single locus methods of species delimitation were run based on mtDNA, the Generalized Mixed Yule Coalescent (GYMC) (Pons et al., 2006) and Bayesian Poisson tree processes (bPTP) (Zhang, Kapli, Pavlidis, & Stamatakis, 2013). For the GYMC analysis we first calculated an ultrametric tree in Beast v2.5.2 (Bouckaert et al., 2014). We used GTR+Γ+I model with an uncorrelated relaxed clock and constant size tree prior. MCMC was set to 10,000,000 generations, sampling every 1,000th generation. The maximum credibility tree was generated in TreeAnnotator v1.7.5 (Drummond et al., 2012). The GYMC analysis was performed with the SPLITS package (Ezard, Fujisawa, & Barraclough, 2009) in R, using the ultrametric tree inferred by Beast. The bPTP analyses were performed in the bPTP server (http://species.h-its.org/) using the ultrametric tree in nexus format, with default values and using 200,000 MCMC generations.