2.3 SNP calling and genomic data filtering
The genotyping analysis used custom scripts (SNPsaurus, LLC) that
trimmed the reads using bbduk (BBMap,
http://sourceforge.net/projects/bbmap/) (Bushnell, 2017) to remove
Nextera adapters and low-quality sequences. A reference draft genome was
built using DNA from a single male mealybug (Table S1). Because adult
males do not feed, this procedure yielded a clean reference and allowed
us to identify reads from bacteria, parasitoids or food remains present
in sequences from females. To build the reference
genome, 150 bp paired-end reads were generated on one lane of a HiSeq 4000
(SNPsaurus at the University of Oregon). Illumina paired-end sequences were
then trimmed for Nextera adapters using bbduk (BBMap,
sourceforge.net/projects/bbmap) (Bushnell, 2017). The assembly was done
using abyss-pe (Jackman et al., 2017). Contigs shorter than 250 bp were
removed, and the remaining contigs were aligned to the NCBI nt database
using blastn. Contigs with BLAST hits to bacterial species were removed.
The reference draft genome is available as FuEDEI_HPun_1.1.fa at NCBI
under accession number JAAOIU000000000. Cleaned reads were then mapped to the reference
mealybug draft genome with an alignment identity threshold of 0.95 using
bbmap (BBMap tools). Genotypes were called using callvariants (BBMap
tools). The resulting Variant Call Format (VCF) file was filtered using
VCFtools v1.15 (Danecek et al., 2011) to remove individuals with more
than 10% missing data, sites with more than 20% missing genotypes and
loci with a minor allele frequency (MAF) lower than 0.01. We randomly
selected one SNP per contig to minimize linkage disequilibrium (LD) and
to ensure the independence of the SNPs used in subsequent analyses. We
also excluded indels and SNPs with more than two alleles from the
dataset. We used BayeScan to test for outlier SNPs. Fit of variant
frequencies to Hardy-Weinberg expectations (HWE) was tested for each
locus within populations using the exact test implemented in
dDocent (Puritz, Hollenbeck, & Gold, 2014).
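
The site-level filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which relied on VCFtools; it only restates the thresholds (20% missing genotypes per site, MAF of 0.01, biallelic SNPs only, one random SNP per contig) on toy data. The record layout, the function names, the coding of genotypes as alternate-allele counts, the random seed and the order in which the filters are applied are all assumptions made for illustration. The individual-level filter (10% missing data) is omitted for brevity.

```python
import random
from collections import defaultdict

MISSING = None  # hypothetical marker for a missing genotype call


def site_missing_fraction(genotypes):
    """Fraction of individuals without a genotype call at one site."""
    return sum(g is MISSING for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF at a biallelic site; genotypes are alt-allele counts (0, 1, 2)."""
    called = [g for g in genotypes if g is not MISSING]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_sites(sites, max_site_missing=0.20, min_maf=0.01):
    """Keep biallelic SNPs that pass the missingness and MAF thresholds."""
    kept = []
    for site in sites:
        if site["is_indel"] or site["n_alleles"] != 2:
            continue  # drop indels and multiallelic SNPs
        if site_missing_fraction(site["genotypes"]) > max_site_missing:
            continue  # more than 20% missing genotypes
        if minor_allele_frequency(site["genotypes"]) < min_maf:
            continue  # MAF lower than 0.01
        kept.append(site)
    return kept


def one_snp_per_contig(sites, seed=0):
    """Randomly retain a single SNP from each contig."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    by_contig = defaultdict(list)
    for site in sites:
        by_contig[site["contig"]].append(site)
    return [rng.choice(group) for group in by_contig.values()]


def make_site(contig, pos, genotypes, n_alleles=2, is_indel=False):
    """Build one toy variant record (hypothetical layout)."""
    return {"contig": contig, "pos": pos, "genotypes": genotypes,
            "n_alleles": n_alleles, "is_indel": is_indel}


# Toy data: five individuals, seven candidate variants on four contigs.
sites = [
    make_site("c1", 10, [0, 1, 2, 0, 1]),
    make_site("c1", 50, [0, 0, MISSING, MISSING, 1]),    # 40% missing
    make_site("c1", 80, [1, 1, 0, 2, MISSING]),          # exactly 20% missing
    make_site("c2", 5, [0, 0, 0, 0, 0]),                 # monomorphic
    make_site("c2", 9, [0, 1, 1, 0, 2], is_indel=True),  # indel
    make_site("c3", 7, [0, 1, 1, 0, 2], n_alleles=3),    # multiallelic
    make_site("c4", 3, [0, 1, 0, 0, 0]),
]

filtered = filter_sites(sites)
picked = one_snp_per_contig(filtered)
```

In this sketch the random selection is applied after the quality filters, so that a contig is represented only by a SNP that already passed them; whether the original scripts used the same order is not stated in the text.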