2.5 Population genomics, clustering and phylogeographic analyses
Based on SNPs datasets, observed heterozygosity (Ho) and allele richness (Ar) were calculated with R package hierfstat (Goudet, 2005). Expected heterozygosity (He), global differentiation estimates (Gst) and pairwise Fst estimates between populations were calculated usingmmod and Adegenet (Jombart, 2008) packages available in R (www.r-project.org). Populations were defined by sampling location and host plant collection site; however, samples from Argentina, Australia and Paraguay were pooled due to low sample size after SNPs filtering, and also because Australian mealybugs have been shown to be part of the same clade with mealybugs from Argentina feeding on Cactaceae (Poveda-Martínez et al., 2019). To assess population structure, we employed two approaches. First, we used sNMF (Frichot, Mathieu, Trouillon, Bouchard, & François, 2014) as implemented in R using the LEA package (Frichot & François, 2015) to assign samples to genetic clusters. sNMF estimates individual ancestry coefficients utilizing the same likelihood model underlying structure (Pritchard, Stephens, & Donnelly, 2000) and admixture (Alexander, Novembre, & Lange, 2009), but uses nonnegative matrix factorization and least squares optimization to accommodate genome scale datasets. Ancestry coefficients were calculated by sNMF for K values ranging from 1 to 10, with 100 repetitions for each value of K , and the optimal K value was assessed using the cross-entropy criterion based on the prediction of masked genotypes to evaluate the error of ancestry estimation. The second approach used to assess population structure was a discriminant analysis of principal components DAPC (Jombart, Devillard, & Balloux, 2010) as implemented in R inadegenet package, using K means clustering to identify the optimal number of clusters for K values ranging from 1 to 10, and estimating individual admixture coefficients. We then calculated Bayesian information criterion scores (BIC) to identify the K value with the best fit. Additional pairwise Fst estimates between groups identified in prior clustering analyses were performed using the SNPs dataset (142 individuals, 1,707 SNPs).
Mitochondrial DNA sequences were aligned using Clustal W implemented in MEGA v7 (Kumar, Stecher, & Tamura, 2016) and used to build a haplotype Network using Median Joining algorithm (Bandelt, Forster, & Röhl, 1999) implemented in PopArt v1.7.1 (Leigh & Bryant, 2015). Additionally, we calculated nucleotide diversity (π), haplotype diversity (Hd) and number of haplotypes (H). Pairwise Fst estimates between populations (as defined earlier) were calculated using DNAsp v6 (Rozas et al., 2017).