Introduction
The prodigious throughput of short-read sequencing technology has revolutionized quantitative genetics by allowing multiplexed genome-wide genotyping of large numbers of individuals with minimal ascertainment bias (Davey et al., 2011; Andrews et al., 2016). A major technical challenge to this approach is accurate calling of heterozygous genotypes at low sequencing depth. To circumvent this problem, reduced-representation libraries are often generated using restriction enzymes (Baird et al., 2008; Elshire et al., 2011) or sequence capture (Gnirke et al., 2009; Ali et al., 2016), increasing sequencing depth across a subset of the genome. However, haploid or inbred individuals can generally be genotyped and imputed much more accurately and inexpensively than heterozygous individuals (Swarts et al., 2014). A second challenge to genotyping with low-depth short-read data is the possibility of “homeo-SNPs” arising from alignment of reads from homeologous regions of the genome (Tinker et al., 2014; Hulse-Kemp et al., 2015). These false polymorphisms can often be identified by their excess heterozygosity relative to Hardy-Weinberg equilibrium, but homeo-SNPs that escape filtering may interfere with imputation and estimation of relatedness between individuals. Homeo-SNPs are particularly problematic in polyploids and interspecific hybrids.
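The excess-heterozygosity filter mentioned above can be sketched as follows. This is a minimal illustration rather than the filter used in any cited study: the function names, the genotype-count inputs, and the `max_excess` cutoff of 0.5 are all assumptions chosen for the example.

```python
def heterozygosity_excess(n_ref_hom, n_het, n_alt_hom):
    """Relative excess of observed over Hardy-Weinberg expected heterozygosity.

    Returns H_obs / H_exp - 1 (the negative of the inbreeding coefficient F),
    or None when the site is monomorphic or has no called genotypes.
    """
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        return None
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    h_exp = 2 * p * (1 - p)                # expected heterozygosity under HWE
    if h_exp == 0:
        return None
    h_obs = n_het / n                      # observed heterozygote fraction
    return h_obs / h_exp - 1


def is_candidate_homeo_snp(n_ref_hom, n_het, n_alt_hom, max_excess=0.5):
    """Flag a site whose heterozygosity excess is above an assumed cutoff."""
    excess = heterozygosity_excess(n_ref_hom, n_het, n_alt_hom)
    return excess is not None and excess > max_excess
```

A site at which every individual is called heterozygous, the pattern expected when reads from two homeologous regions collapse onto one locus, gives an excess of 1.0 and is flagged. A site with genotype counts in Hardy-Weinberg proportions gives an excess of 0 and is retained.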
Polyploidy and interspecific hybridization are common features of plant evolution that are exploited in plant breeding to generate novelty, increase vigor, and stack desirable alleles from different species (Alix et al., 2017). Tree and vine crops often rely on interspecific hybrid rootstocks to increase vigor and resilience to biotic or abiotic stresses without affecting fruit or nut quality in the grafted scion (Warschefsky et al., 2015). In California, for example, production of almonds (Prunus dulcis) (Ledbetter and Sisterson, 2008), walnuts (Juglans regia) (Ramasamy et al., 2021), and pistachios (Pistacia vera) (Ferguson et al., 2002) relies on rootstocks that are interspecific hybrids. Each of these nut crops has a mating system that can be exploited to generate large numbers of hybrid progeny (self-incompatibility, monoecy, and dioecy, respectively), and superior hybrid genotypes can be propagated clonally. However, genetic gain in tree breeding programs is generally slow due to the time and space required for evaluation, as well as the difficulty of genotyping highly heterozygous material.
This study evaluates different methods for generating genotype data from elite populations of interspecific hybrid pistachio (P. atlantica × P. integerrima; n = 725) and walnut (J. microcarpa × J. regia; n = 228) rootstocks. Short-read sequencing was performed on reduced-representation libraries for each population. A typical workflow would align the resulting reads against either the maternal (P1) or the paternal (P2) genome (Figure 1). Because each interspecific hybrid is composed of one haploid gamete from each parent, we expected that alignment to both parental genomes simultaneously (P1+P2) would yield haploid data, avoiding depth thresholding and greatly increasing genotyping efficiency for heterozygous germplasm.
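One way to prepare a combined P1+P2 reference for simultaneous alignment is to concatenate the two parental assemblies after tagging each sequence name with its parent of origin, so that identically named chromosomes remain distinguishable. The sketch below assumes this approach; the `P1_`/`P2_` prefixes and the function names are hypothetical and are not taken from the workflow in Figure 1.

```python
def prefix_fasta_records(lines, prefix):
    """Yield FASTA lines with `prefix` added to every sequence name."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def combine_references(p1_lines, p2_lines, p1_tag="P1_", p2_tag="P2_"):
    """Concatenate two parental references into one tagged list of FASTA lines.

    The tags are assumed labels for the maternal (P1) and paternal (P2) genomes.
    """
    combined = list(prefix_fasta_records(p1_lines, p1_tag))
    combined.extend(prefix_fasta_records(p2_lines, p2_tag))
    return combined
```

With a reference built this way, each read is assigned to a contig from one parent, and the prefix of that contig records which parental haplotype the read matched.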