Whole-Genome Sequencing and Variant Calling.
Genomic DNA from L. cidri colonies was prepared for whole-genome
sequencing using a Qiagen Genomic-tip 20/G kit (Qiagen, Hilden, Germany)
as previously described (Nespolo et al., 2020b) and sent for sequencing
using DNBseq technology (BGISEQ-G400 platform) (Liu et al., 2018). Read
quality was checked using FastQC 0.11.8 (Andrews, 2010). Reads were
processed with fastp 0.19.4 (low-quality 3’ end trimming, 40 bp minimum
read size) (Brickwedde et al., 2018; Chen et al., 2018). We also
obtained publicly available sequencing reads of L. cidri CBS2950
(Agier et al., 2018), which were processed identically. Reads were
aligned against the L. cidri CBS2950 reference genome (Vakirlis
et al., 2016) using BWA-MEM (options: -M -R) (Li, 2013). Mapping quality
and overall statistics were collected and examined with Qualimap
(García-Alcalde et al., 2012). Sorting and indexing of output BAM files
were performed using SAMtools 1.9 (Li et al., 2009). An L.
fermentati isolate (CBS770) was also mapped against the L. cidri CBS2950
genome for phylogenetic analysis (Bellut et al., 2020). Mapping
files were tagged for duplicates using MarkDuplicates of Picard tools
2.18.14 (http://broadinstitute.github.io/picard/). Variant calling and
filtering were done with GATK version 4.0.10.1 (DePristo et al., 2011).
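
The read trimming, mapping, sorting and duplicate-marking steps described above can be sketched as a short list of command lines. This is a minimal illustration, not the authors' actual script: the paired-end layout, all file names, the read-group string and the `picard.jar` path are hypothetical, and only the tools and the quoted options (fastp 3’ trimming with a 40 bp minimum length, `bwa mem -M -R`) come from the text.

```python
import subprocess


def mapping_steps(sample, ref, r1, r2, picard_jar="picard.jar"):
    """Return (argv, stdout_file) pairs for trimming, mapping and duplicate marking.

    File names, the read-group string and the paired-end input are
    hypothetical; they are placeholders for illustration only.
    """
    t1, t2 = f"{sample}_R1.trim.fq.gz", f"{sample}_R2.trim.fq.gz"
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    dedup = f"{sample}.dedup.bam"
    # bwa converts the literal "\t" sequences into tabs in the SAM header
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        # fastp: -3 trims low-quality bases from the 3' end, -l sets the minimum read length
        (["fastp", "-i", r1, "-I", r2, "-o", t1, "-O", t2, "-3", "-l", "40"], None),
        # bwa mem writes SAM to standard output, so the output is redirected to a file
        (["bwa", "mem", "-M", "-R", read_group, ref, t1, t2], sam),
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        (["java", "-jar", picard_jar, "MarkDuplicates",
          f"I={bam}", f"O={dedup}", f"M={sample}.dup_metrics.txt"], None),
    ]


def run(steps):
    """Execute each step in order, redirecting stdout to a file where requested."""
    for argv, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Calling `run(mapping_steps("isolate1", "CBS2950.fa", "isolate1_R1.fq.gz", "isolate1_R2.fq.gz"))` would execute the steps in order, provided the tools are installed; building the commands as lists first makes them easy to inspect or log before anything is run.
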
More specifically, variants were called per sample and chromosome using
HaplotypeCaller (default settings), after which variant databases were
built using GenomicsDBImport. Genotypes for each chromosome were called
using GenotypeGVCFs (-G StandardAnnotation). Variant files were merged
into one genome-wide file using MergeVcfs. This file was divided into
SNP calls and INDEL calls using SelectVariants. We applied recommended
filters for coverage (more than 10 mapped reads; “FORMAT/DP>10”) and
quality (--minQ 30) (Van der Auwera
et al., 2013b). This VCF file was further filtered, depending on the
requirements of the given analysis, using vcftools (Van der Auwera et
al., 2013a, b). For all datasets, we only considered SNPs with no
missing data (vcftools option --max-missing 1).
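
To make the combined effect of the three filters concrete, the following minimal sketch applies them to individual VCF records in plain Python. It reflects one possible reading of the text, namely that a site is kept only if its QUAL is at least 30 and every sample has a called genotype supported by more than 10 reads. The function names are hypothetical, the inclusive quality boundary is an assumption, and the sketch does not reproduce the internals of vcftools.

```python
def passes_filters(vcf_line, min_depth=10, min_qual=30.0):
    """Return True if a VCF data line passes the quality, depth and missingness checks.

    Assumed interpretation: QUAL >= min_qual, and every sample has a
    non-missing genotype with FORMAT/DP strictly greater than min_depth.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]
    format_keys = fields[8].split(":")
    # Site-level quality filter (the inclusive boundary is an assumption)
    if qual == "." or float(qual) < min_qual:
        return False
    if "GT" not in format_keys or "DP" not in format_keys:
        return False
    gt_idx, dp_idx = format_keys.index("GT"), format_keys.index("DP")
    for sample in fields[9:]:
        values = sample.split(":")
        # Any missing allele makes the site incomplete (no missing data allowed)
        if "." in values[gt_idx]:
            return False
        # Trailing FORMAT fields may be dropped, and DP may be "."
        if dp_idx >= len(values) or not values[dp_idx].isdigit():
            return False
        if int(values[dp_idx]) <= min_depth:
            return False
    return True


def filter_vcf_lines(lines):
    """Yield header lines unchanged and only those data lines that pass the filters."""
    for line in lines:
        if line.startswith("#") or passes_filters(line):
            yield line
```

For example, a record with QUAL 50 and two samples at depths 15 and 22 is kept, whereas the same record with one sample at depth 10, or with one genotype reported as `./.`, is dropped.
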