2.3 | Mapping, variant calling and filtering
Genomic reads obtained from individuals and pools were mapped against
the P. fijiensis reference genome
(https://genome.jgi.doe.gov/Mycfi2/Mycfi2.home.html; Arango Isaza et al.,
2016). Pool-sequencing data were processed using the same pipeline and
filtering parameters as in Carlier et al. (2021b). Data available from the
2011 samples were reprocessed alongside the 2013 samples so that the same
software versions were used for both. SNP calling was performed separately for
the samples from the two years because some analyses were only possible
using samples from 2011, for which some phenotypic data were available
(see explanation below). After filtration (mapping quality > 30, minimum
read count = 3, minimum allelic frequency = 0.03), 981 001 and 1 792 219
biallelic SNPs were detected in the six and eight populations collected in
2011 and 2013, respectively. For the
sequencing of individuals, the genomic reads of 63 isolates were mapped
separately using bwa v0.7.15 (Li & Durbin, 2010) with the bwa mem command
and default parameters. Duplicates were tagged and removed using Picard
Toolkit v2.7.0 (Picard Toolkit, 2019, Broad Institute, GitHub Repository:
http://broadinstitute.github.io/picard/) with the MarkDuplicates command.
Genome Analysis Toolkit (GATK) v4.1.4.0
(McKenna et al., 2010)
was used for SNP calling with the HaplotypeCaller command, and all
individuals were merged into a single variant call format (VCF) file
with the GenotypeGVCFs_merge command. The VCF file was filtered to keep only
SNPs using GATK’s SelectVariants command, and variants were filtered for
quality with the VariantFiltration command using the same parameters as
in Derbyshire et al. (2019). A second filter was then applied to each
genotype from the VCF file using vcftools v.0.1.14
(Danecek et al., 2011)
with the following parameters: maf 0.01, minDP 4, maxDP 100, minGQ 20,
max-missing 0.7. After filtration, 758 407 SNPs were identified among
the 63 isolates. The VCF file was converted with a custom script into
FASTA files containing all individuals, the format required for some of
the analyses below.
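
The custom conversion script is not described here, so the following Python sketch only illustrates how a VCF-to-FASTA conversion of this kind might work. Everything in it is an assumption rather than the authors' method: the function names vcf_to_fasta and format_fasta are hypothetical, each FASTA sequence is taken to be the concatenation of the retained SNP positions only, indels and other non-SNP records are skipped, and missing or heterozygous genotype calls are written as N.

```python
# Hypothetical sketch of a VCF-to-FASTA conversion (not the authors' script).
# Assumptions: one concatenated SNP sequence per sample; missing or
# heterozygous calls become "N"; non-SNP records are skipped.

def vcf_to_fasta(vcf_lines):
    """Return {sample: sequence} built from the SNP records of a VCF."""
    samples = []
    seqs = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            seqs = {s: [] for s in samples}
            continue
        alleles = [fields[3]] + fields[4].split(",")  # REF, then ALT alleles
        # Keep only SNPs: every allele must be a single nucleotide.
        if any(len(a) != 1 or a not in "ACGT" for a in alleles):
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            idx = set(gt.replace("|", "/").split("/"))
            if len(idx) == 1 and "." not in idx:
                base = alleles[int(idx.pop())]  # haploid or homozygous call
            else:
                base = "N"  # missing or heterozygous call
            seqs[sample].append(base)
    return {s: "".join(bases) for s, bases in seqs.items()}


def format_fasta(seqs, width=60):
    """Render {sample: sequence} as FASTA text wrapped at `width` columns."""
    out = []
    for name, seq in seqs.items():
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Tiny made-up example: two SNPs and one indel across three isolates.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "iso1", "iso2", "iso3"]),
    "\t".join(["chr1", "10", ".", "A", "G", "50", "PASS", ".", "GT:DP",
               "0:10", "1:12", ".:0"]),
    "\t".join(["chr1", "20", ".", "C", "T", "50", "PASS", ".", "GT:DP",
               "1/1:8", "0/1:9", "0/0:7"]),
    "\t".join(["chr1", "30", ".", "AT", "A", "50", "PASS", ".", "GT",
               "0", "1", "0"]),
]

if __name__ == "__main__":
    print(format_fasta(vcf_to_fasta(EXAMPLE_VCF)), end="")
```

Coding heterozygous calls as N is one possible convention for a haploid organism; a real script might instead drop such sites or reconstruct full-length sequences from the reference genome.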