2.5 Population genomics, clustering and phylogeographic
analyses
Based on SNPs datasets, observed heterozygosity (Ho) and allele richness
(Ar) were calculated with R package hierfstat (Goudet, 2005).
Expected heterozygosity (He), global differentiation estimates (Gst) and
pairwise Fst estimates between populations were calculated usingmmod and Adegenet (Jombart, 2008) packages available in R
(www.r-project.org). Populations were
defined by sampling location and host plant collection site; however,
samples from Argentina, Australia and Paraguay were pooled due to low
sample size after SNPs filtering, and also because Australian mealybugs
have been shown to be part of the same clade with mealybugs from
Argentina feeding on Cactaceae (Poveda-Martínez et al., 2019). To assess
population structure, we employed two approaches. First, we used sNMF
(Frichot, Mathieu, Trouillon, Bouchard, & François, 2014) as
implemented in R using the LEA package (Frichot & François, 2015) to
assign samples to genetic clusters. sNMF estimates individual ancestry
coefficients utilizing the same likelihood model underlying structure
(Pritchard, Stephens, & Donnelly, 2000) and admixture (Alexander,
Novembre, & Lange, 2009), but uses nonnegative matrix factorization and
least squares optimization to accommodate genome scale datasets.
Ancestry coefficients were calculated by sNMF for K values ranging from
1 to 10, with 100 repetitions for each value of K , and the
optimal K value was assessed using the cross-entropy criterion
based on the prediction of masked genotypes to evaluate the error of
ancestry estimation. The second approach used to assess population
structure was a discriminant analysis of principal components DAPC
(Jombart, Devillard, & Balloux, 2010) as implemented in R inadegenet package, using K means clustering to identify the
optimal number of clusters for K values ranging from 1 to 10, and
estimating individual admixture coefficients. We then calculated
Bayesian information criterion scores (BIC) to identify the K value with
the best fit. Additional pairwise Fst estimates between groups
identified in prior clustering analyses were performed using the SNPs
dataset (142 individuals, 1,707 SNPs).
Mitochondrial DNA sequences were aligned using Clustal W implemented in
MEGA v7 (Kumar, Stecher, & Tamura, 2016) and used to build a haplotype
Network using Median Joining algorithm (Bandelt, Forster, & Röhl, 1999)
implemented in PopArt v1.7.1 (Leigh & Bryant, 2015). Additionally, we
calculated nucleotide diversity (π), haplotype diversity (Hd) and number
of haplotypes (H). Pairwise Fst estimates between populations (as
defined earlier) were calculated using DNAsp v6 (Rozas et al., 2017).