DNA extraction, pooling, sequencing
For all the population of our crossing scheme, Genomic DNA was extracted from young leaves previously sampled, snap frozen and stored at -80C. DNA was extracted following a modified CTAB method from Maloof lab (https://openwetware.org/wiki/Maloof_Lab:96well_CTAB). DNA concentration and purity was estimated with a NanoDrop ND-1000 spectrophotometer (Thermo Scientific, MA, USA). The DNA integrity was confirmed was assessed using a 1% agarose gel with ethidium bromide. Prior to sequencing, DNA concentration was measured with a Qubit Fluorometer (Invitrogen, MA, USA).
For the bulk segregant analysis (BSA) resistant (R, plants showing HR) and susceptible (S, plants without HR) bulks (n = 10) were prepared by pooling equal amounts of DNA from each individual plant. Further, the three accessions that originated the initial crossing were also used for DNA isolation (the S parent DG1-S1, the R parent F1_#130 and the donor of HR SF48-O1, “R donor” (Fig. 2). For the WGS experiment, 1 ug of genomic DNA of each sample (three accessions and two bulks) was used for library preparation. Library preparation and whole genome sequencing were carried out by Novogene (Cambridge, UK). Sequencing libraries were generated using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, UK) following the manufactures’ protocol. The genomic DNA was randomly fragmented to a size of 350bp by Bioruptor, then DNA fragments were size-selected with sample purification beads. The selected fragments were then end-polished, A-tailed, and ligated with the full-length adapter. After these treatments, these fragments were filtered with beads. At last, the library was analysed for size distribution by Agilent 2100 Bioanalyzer (Agilent technologies, CA, USA) and quantified using real-time PCR. Libraries were sequenced on an Illumina NovaSeq 6000 platform using 150 bp paired end (PE) reads.
K-mer based genetic mapping
K -mer based genetic mapping was performed following the recommendations of Comparative subsequence sets analysis (CoSSA) workflows (Prodhomme et al., 2019). First, sequencing read quality was assessed with FastQC (Andrews, 2010). Then, k- mer tables were built with a k -mer size of 31 nucleotides using GlistMaker of the GenomeTester4 v4.0 suite (Kaplinski, Lepamets, & Remm, 2015).k- mer that occurred only once were removed as likely resulting from sequencing errors. To identify resistant (R ) haplotype-specific k -mers, GlistCompare of GenomeTester4 was used to perform basic set operations such as unions, intersections of differences between k -mer tables of different samples (Supporting information: Fig. S1). An additional filtering step was carried out to retain R haplotype-specific k -mers. The sequencing yielded 14.4 Gb for the R bulk and an approximate 24x depth considering a haploid B. nigra genome of ~550 Mb. Given that B. nigra genome is diploid (2n = 2x = 16) and assuming uniform sequencing coverage, k -mers originated from the Rhaplotype should have sequencing depth of ~12x. Thus, we decided to retain k -mers with a depth between 10x and 20x which represented k -mers derived from R-specific haplotype. Retainedk -mers were aligned to B. nigra reference genomes NI100 v2.0, C2 v1.0 and Sangam v1.0 using BWA aln (v0.7.17) allowing 3 mismatches (Li & Durbin, 2009). The number of mapped k -mers mapped per 1 Mb bins were counted using bedtools v2.25 (Quinlan & Hall, 2010).
KASP markers genotyping
Kompetitive Allele Specific Polymorphisms (KASP) PCR markers were used to validate the results of ­k -mer based genetic mapping. For each sample, DNA concentration was adjusted to 5-50 ng/µl. Primers were designed on the sequences flanking the SNPs identified by k -mers of the R-specific haplotype. KASP genotyping assays were performed according to the manufacturer’s instructions (LGC Genomics, UK). In brief, 2 µL DNA at a concentration of 5–50 ng/µL was added to 96-well plate with KASP PCR mix (5 µL 2× KASP Master mix, 0.6 µL 10mM primer mix, 2.4 Milli-Q water). The PCR was performed in a CFX96 Touch Real-Time PCR Detection System combined with CFX Maestro Software for data reading (Bio-Rad, Hercules, CA, USA).
Variant calling within PEK locus
Single nucleotide polymorphisms (SNPs) and Insertion/Deletion (InDels) variants were called using a workflow based on GATK Best Practices ( et al., . Reads were aligned to the B. nigra reference genome C2 v1.0, to which the mitochondrial sequence (Genbank accession no. NC_029182) and chloroplast sequence (Genbank accession no. NC_030450) were added, using BWA mem v0.7.17 with default parameters (Li, 2013). The resulting alignment files were sorted and indexed using SAMtools v.1.11 (Li et al., 2009). Alignment files were filtered to restric variant calling to the PEK locus. Duplicate read pairs were marked using the MarkDuplicates tool of the GATK suite v.4.1.9.0. Variants (SNPs and InDels) were called in each sample on a window of x Mb around the PEK locus using GATK HaplotypeCaller. Samples were then jointly genotyped using GATK GenomicsDBImport and GATK GenotypeGVCFs, with default parameters. SNPs filtration was performed with the following parameters: QD < 2, QUAL < 30, SOR > 3, FS > 60, MQ < 40, MQRankSum < -12.5, ReadPosRankSum < -8. InDels filtration was performed with the following parameters: QD < 2, QUAL < 30, SOR > 3, FS > 200, ReadPosRankSum < -20. Finally, only variants that in agreement with our genetic model were retained: i) heterozygous in resistant (HR+) material; ii) homozygous DG1-S1 allele in susceptible (HR-) material. The functional effect of the retained variants was predicted using SnpEff with default parameters (Cingolani et al., 2012).
Copy number variations (CNVs) at PEK locus between three B. nigra reference genomes was performed with the tool GEvo on the CoGe web platform (Lyons et al., 2008). All genomes were accessed in September 2021 from the following databases: B. nigra genomes NI100 v2.0 and C2 v1.0 were downloaded from http://cruciferseq.ca/, genome Sangam v1.0 from http://brassicadb.cn/. Pairwise alignment of genomes was performed with default settings: Alignment algorithm “(B)LastZ: Large regions”; Word size: 8; Gap start penalty: 400; Gap extend penalty: 30; Chaining: NO; Score threshold: 3000; Max threshold: 0; Minimum HSP length: 50. CNVs were identified by visualizing the region between markers flankingPEK locus (M27 and M28) and identifying conserved regions/genes that were highlighted as High Scoring Pairs (HSP).