2.3 SNP calling and genomic data filtering
The genotyping analysis used custom scripts (SNPsaurus, LLC) that trimmed the reads with bbduk (BBMap, http://sourceforge.net/projects/bbmap/) (Bushnell, 2017) to remove Nextera adapters and low-quality sequences.
A draft reference genome was built using DNA from a single male mealybug (Table S1). Because adult males do not feed, this approach yielded a clean reference, which in turn allowed us to identify reads from bacteria, parasitoids or food remains in the sequences of females. To build the reference genome, 150 bp paired-end reads were sequenced in one lane of a HiSeq 4000 (SNPsaurus at University of Oregon). These reads were also trimmed for Nextera adapters with bbduk, and the assembly was done with abyss-pe (Jackman et al., 2017). Assembled contigs shorter than 250 bp were discarded, and the remaining contigs were aligned with blastn against the NCBI nt database; contigs with hits to bacterial species were removed. The draft reference genome is available at NCBI as FuEDEI_HPun_1.1.fa under accession number JAAOIU000000000.
Cleaned reads were then mapped to the mealybug draft reference genome with bbmap (BBMap tools) using an alignment identity threshold of 0.95, and genotypes were called with callvariants (BBMap tools). The resulting Variant Call Format (VCF) file was filtered with VCFtools v1.15 (Danecek et al., 2011) to remove individuals with more than 10% missing data, sites with more than 20% missing genotypes and loci with a minor allele frequency (MAF) lower than 0.01. We also excluded indels and SNPs with more than two alleles. To minimize linkage disequilibrium (LD) and to ensure the independence of the SNPs used in subsequent analyses, we randomly selected one SNP per contig. We used BayeScan to test for outlier SNPs.
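The filtering thresholds described above can be illustrated with a minimal Python sketch. It is not the VCFtools workflow used in the study. It assumes that genotypes have already been reduced to alternate-allele counts (0, 1, 2, or `None` for a missing call) at biallelic SNPs, and that the filters are applied in the order in which they are listed; the function name `filter_genotypes`, its parameters and the fixed random seed are hypothetical.

```python
import random

MISSING = None  # assumed placeholder for a missing genotype call


def filter_genotypes(genotypes, contigs, max_ind_missing=0.10,
                     max_site_missing=0.20, min_maf=0.01, seed=0):
    """Apply the missing-data, MAF and one-SNP-per-contig filters.

    genotypes: dict mapping sample name -> list of alt-allele counts
               (0, 1, 2 or None), one entry per site.
    contigs:   list with the contig name of each site.
    Returns (kept sample names, sorted indices of retained sites).
    """
    n_sites = len(contigs)

    # 1. Drop individuals with more than 10% missing genotypes.
    kept = {
        sample: calls for sample, calls in genotypes.items()
        if sum(c is MISSING for c in calls) / n_sites <= max_ind_missing
    }

    # 2. Drop sites with more than 20% missing genotypes or MAF < 0.01.
    passing = []
    for j in range(n_sites):
        calls = [g[j] for g in kept.values()]
        observed = [c for c in calls if c is not MISSING]
        if not observed:
            continue
        if (len(calls) - len(observed)) / len(calls) > max_site_missing:
            continue
        alt_freq = sum(observed) / (2 * len(observed))
        if min(alt_freq, 1 - alt_freq) < min_maf:
            continue
        passing.append(j)

    # 3. Keep one randomly chosen SNP per contig to limit LD.
    rng = random.Random(seed)
    by_contig = {}
    for j in passing:
        by_contig.setdefault(contigs[j], []).append(j)
    chosen = sorted(rng.choice(sites) for sites in by_contig.values())

    return list(kept), chosen
```

Fixing the seed makes the random choice of one SNP per contig reproducible; whether the original analysis fixed a seed is not stated in the text.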
Within each population, the fit of each locus to Hardy-Weinberg expectations (HWE) was tested with the exact test implemented in dDocent (Puritz, Hollenbeck, & Gold, 2014).
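To illustrate what an exact test of HWE computes for one biallelic locus in one population, the sketch below conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the observed one. It is a generic textbook formulation, not the dDocent code, and the function name `hwe_exact_pvalue` is hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Two-sided exact HWE p-value from genotype counts at one biallelic locus."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het):
        # Log probability of `het` heterozygotes given n, n_a and n_b.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2)
                + lgamma(n + 1)
                - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1)
                - lgamma(2 * n + 1))

    # Possible heterozygote counts share the parity of the allele counts.
    hets = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: exp(log_prob(h)) for h in hets}
    p_obs = probs[n_ab]

    # Sum the outcomes at least as extreme as the observed one; the small
    # tolerance guards against floating-point ties.
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)
```

For example, genotype counts of 5, 0 and 5 (no heterozygotes despite equal allele frequencies) give a small p-value, whereas counts of 25, 50 and 25 match the expected proportions and give a p-value of essentially 1.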