2.4 Population structure analysis
Because linkage disequilibrium (LD) may affect the inference of population structure, the diploid SNPs were first pruned to retain sites with pairwise LD (r²) < 0.2 using PLINK 1.9 (Chang et al., 2015) with the parameters ‘--indep-pairwise 100 10 0.2’. To analyze diploids and triploids together, the pruned diploid SNPs were then compared with the triploid SNPs using the isec function in BCFtools (Danecek et al., 2021), and the intersection of the two SNP sets was used for population structure analysis. Three methods were used to infer population genetic structure: principal component analysis (PCA), structure analysis and genetic distance analysis. PCA was applied to diploids and triploids separately as well as to all samples together, whereas the other two methods were applied to diploids and triploids separately.
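The intersection step can be illustrated with a minimal Python sketch. It is not the BCFtools implementation; it only shows the idea of keeping the pruned diploid sites that are also present in the triploid call set. The function names `site_key` and `shared_sites` are hypothetical, and matching sites on chromosome, position, REF and ALT is an assumption made for this example.

```python
def site_key(vcf_line):
    """Return (CHROM, POS, REF, ALT) for one tab-separated VCF data line."""
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    return (chrom, int(pos), ref, alt)


def shared_sites(diploid_lines, triploid_lines):
    """Keep diploid sites that also occur in the triploid set, in diploid order.

    Header lines (starting with '#') are skipped. Matching on REF and ALT
    as well as position is an assumption of this sketch.
    """
    triploid_keys = {
        site_key(line) for line in triploid_lines if not line.startswith("#")
    }
    shared = []
    for line in diploid_lines:
        if line.startswith("#"):
            continue
        key = site_key(line)
        if key in triploid_keys:
            shared.append(key)
    return shared


# Toy records (invented for illustration only).
diploid = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t250\t.\tC\tT\t.\tPASS\t.",
    "chr2\t75\t.\tG\tA\t.\tPASS\t.",
]
triploid = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr2\t75\t.\tG\tC\t.\tPASS\t.",
]
common = shared_sites(diploid, triploid)
```

In this toy input only the site at chr1:100 is retained, because the chr2:75 records differ in their ALT allele.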
For PCA, the genotype data at each locus were first converted into the frequency of the reference allele, that is, 0/0.5/1 for diploids and 0/0.33/0.67/1 for triploids. The PCA was then performed using the R built-in function prcomp with default parameters. STRUCTURE version 2.3.4 (Pritchard et al., 2000) was used for genetic structure analysis, with five independent runs for each value of K from 1 to 12. The optimal K, which indicates the most likely number of genetic clusters, was determined according to the method of Evanno et al. (2005). For genetic distance analysis, identity-by-state (IBS), which describes the genetic relationship among individuals, was calculated using a custom R script. Minimum evolution phylogenetic trees were constructed from the genetic distance matrix of 1-IBS values using the FastME program (Lefort et al., 2015) and visualized using the online tool iTOL (http://itol.embl.de) (Letunic & Bork, 2019).
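The genotype encoding and the 1-IBS distance can be sketched as follows. This is an illustrative Python sketch, not the authors' custom R script. The function names are hypothetical. Defining IBS between two individuals as one minus the mean absolute difference in reference-allele frequency over shared non-missing loci is an assumption chosen because it applies to both ploidy levels.

```python
def ref_allele_frequency(ref_count, ploidy):
    """Fraction of reference alleles in one genotype call.

    Gives 0/0.5/1 for diploids (ploidy=2) and 0/0.33/0.67/1 for
    triploids (ploidy=3), up to rounding.
    """
    if not 0 <= ref_count <= ploidy:
        raise ValueError("ref_count must lie between 0 and ploidy")
    return ref_count / ploidy


def ibs_distance(freqs_a, freqs_b):
    """Return 1 - IBS for two individuals (assumed definition, see lead-in).

    Loci where either value is None (missing) are ignored.
    """
    diffs = [
        abs(a - b)
        for a, b in zip(freqs_a, freqs_b)
        if a is not None and b is not None
    ]
    if not diffs:
        raise ValueError("no shared non-missing loci")
    return sum(diffs) / len(diffs)


def distance_matrix(samples):
    """Symmetric matrix of 1 - IBS values for a list of frequency vectors."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ibs_distance(samples[i], samples[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Toy triploid genotypes given as reference-allele counts (invented data).
counts = [[3, 0, 1], [3, 0, 2], [0, 3, None]]
freqs = [
    [None if c is None else ref_allele_frequency(c, 3) for c in row]
    for row in counts
]
dist = distance_matrix(freqs)
```

A matrix of this form is the kind of input a distance-based tree builder such as FastME expects, once it is written out in the program's distance-matrix format.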
Pairwise genetic differentiation (FST) was estimated, and its significance tested, for invasive populations and regionally defined genetic clusters of native populations using the R package StAMPP (Pembleton et al., 2013). Additionally, genetic differentiation among populations was assessed by an analysis of molecular variance (AMOVA) with 100 permutations, in which the variance components were partitioned between regions (invasive and source ranges), among populations within regions and within populations.