Genomic analysis
Leaf material from 2 A1(c+) individuals (F0-tolerant pool), 2 T6(c-) individuals (F0-sensitive pool) and 19 F3 individuals from 7 tolerant families (2 T6xA1 and 5 A1xT6) (F3-tolerant pool) were used for DNA extraction (Dataset SD5). DNA was isolated with a Qiagen DNeasy kit and libraries were prepared using a TruSeq DNA PCR-Free Sample Preparation Kit from Illumina. Whole-genome sequencing was performed on Illumina HiSeq 2000 by Novogene (UK). Read depth was measured after processing and alignment to the TAIR10 reference (below) at 15x coverage for each individual sample and 180x for the pooled sample (19 F3 individuals).
Raw sequence data was processed as follows: (1) read trimming by default parameters with Sickle [23]; (2) alignment to the TAIR10 reference with bwa mem 0.7.12 (processing with samtools 1.3); (3) removal of duplicate reads using Picard (MarkDuplicates); (4) application to read group name correction to the bam files using Picard (AddOrReplaceReadGroups); (5) realigning Indels using the GATK ‘GenomeAnalysis’ Toolkit [24].
Bi-allelic SNPs were identified using ‘HaplotypeCaller’ and genotyped using ‘GenotypeGVCF’ (both tools in GATK). Data were filtered using GATK SelectVariants using these ‘GATK best practices’ guidelines: QD < 2.0 || MQ < 40.00 || FS > 60.0 || SOR > 4.0 || MQRankSum < -8.0 || ReadPosRankSum < -8.0 and a minimum coverage of 10 per sample. Genotyping of the pooled sample (19A1xT6 individuals) was performed with LoFreq [25], as GATK failed with ploidy levels required (ploidy-38 for 19 diploid samples). LoFreq was run with default settings. VCFs were then merged with BCFtools [26]. The allele frequency of all variants in the F3-tolerant pool versus the F0-sensitive pool was determined using the Multipool program [27]. Several intervals with bi-polar peaks were identified across the genome. Intervals where the allele frequency between the two pools differed considerably and had a positive LOD score were further explored for candidate genes.
The whole genome of 4 A1(c+) and 4 T6(c-) individuals was sequenced in a previous study [28] and were merged with the 2 parental samples sequenced for the BSA-seq experiment. To obtain a consensus sequence for each deme, private SNPs of T6 and A1 (shared for the 6 samples) were selected using GATK and Col-0 TAIR10 sequence as a reference. Once we obtained the consensus sequence, we selected the sites that differ between demes using GATK ‘concordance’ command obtaining one vcf file with T16 AF<0.1 and A1 AF > 0.9 variants and a second vcf file with T16 AF>0.9 and A1 AF< 0.1 variants. We merged the files using VCFtools [29] and the amino acid changes between A1 and T6 were obtained and quantified using SNPeff [30]. Genes with 3 to 10 genetic modifiers were selected as candidates (Dataset SD3).