Processing of genomic data and SNP calling for P. oryzae isolates
As for the rice genomic data, we used Toggle to implement a
pipeline for raw read processing, mapping and SNP calling. Raw reads
were trimmed to remove barcodes, adapters and ambiguous base calls.
Trimmed reads were mapped against reference genome 70-15 version 8
(R. A. Dean et al.,
2005) using BWA with option -n 5 for the sub-command aln and option -a 500
for the paired-end sub-command sampe. The alignments were
sorted with Picard Tools SortSam and SAMtools view
(http://broadinstitute.github.io/picard/, Li 2011). Intervals to target
for local realignment were defined using
RealignerTargetCreator, and local realignment of reads around
indels was performed with IndelRealigner. Duplicates were
removed with MarkDuplicates. SNPs were then called using the
UnifiedGenotyper tool in GATK, while keeping all sites of the
reference genome using the output mode EMIT_ALL_SITES.
High-confidence SNPs were identified using GATK’s
VariantFiltration tool with the following parameters:
MQ0 < 3.0 (total number of reads with mapping quality zero), depth ≥ 15.0
(number of reference alleles + number of alternative alleles, computed
as the sum of allelic depths for the reference and alternative alleles
in the order listed), and RA ≤ 0.1 (number of reference alleles / number
of alternative alleles).
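
The three filtering criteria above can be illustrated with a minimal Python sketch. It is not the GATK VariantFiltration implementation; it only restates the thresholds from the text as a predicate on a single site. The names `SiteCall` and `is_high_confidence` are hypothetical, and treating sites with no alternative-allele reads as failing (RA is undefined there) is an assumption made for this illustration.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
MAX_MQ0 = 3.0     # a site passes only if MQ0 < 3.0
MIN_DEPTH = 15.0  # a site passes only if ref + alt depth >= 15.0
MAX_RA = 0.1      # a site passes only if ref / alt <= 0.1


@dataclass
class SiteCall:
    """Hypothetical per-site summary; field names are illustrative."""
    mq0: int        # number of reads with mapping quality zero
    ref_depth: int  # allelic depth of the reference allele
    alt_depth: int  # allelic depth of the alternative allele


def is_high_confidence(site: SiteCall) -> bool:
    """Return True if the site satisfies all three criteria."""
    depth = site.ref_depth + site.alt_depth
    if site.mq0 >= MAX_MQ0 or depth < MIN_DEPTH:
        return False
    if site.alt_depth == 0:
        # RA is undefined without alternative-allele reads.
        # Assumption: such sites are not retained as SNPs.
        return False
    ra = site.ref_depth / site.alt_depth
    return ra <= MAX_RA
```

For example, a site with 1 reference read, 20 alternative reads and no mapping-quality-zero reads has depth 21 and RA 0.05, so it is retained; the same site with 5 reference reads has RA 0.25 and is rejected.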