2.5 Genetic variation in S-locus genes and their paralogs
To quantify genetic variation in the five S-locus genes (CCMT , CYPT ,GLOT , KFBT ,PUMT ) and their four paralogs (CCM1 ,CYP734A51 , GLO1 , KFB1 ), we calculated nucleotide diversity (π) at synonymous (πS) and non-synonymous (πN) sites. For this analysis, we mapped the WGR reads of P. vulgaristo the chromosome-scale reference genome of P. veris (Potenteet al., 2022) because it provides a better annotation of all S-locus genes and their paralogs outside the S-locus than the P. vulgaris reference genome. Prior to mapping, we annotated sites at 4- and 0-fold degenerate sites as synonymous and non-synonymous sites, respectively, for all nine genes using the script NewAnnotateRef.py (https://github.com/fabbyrob/science/tree/master/pileup_analyzers[last accessed on March 25, 2022]) (Williamson et al. , 2014). Mapping of sequencing reads, variant calling, and filtering were performed with BWA-mem v7.17 (Li & Durbin, 2009) and GATK v4.1.2.0 (McKenna et al., 2010) as described in section 2.3. Subsequently, we split the VCF files of each of the nine genes into synonymous and non-synonymous sites using BEDtools v2.29.2 (Quilan & Hall, 2010). Using these VCF files, we estimated πSand πNwith pixy v.1.0.0 (Korunes & Samuk, 2021). Specifically, we estimated πSand πNof S-locus paralogs in heterostyles (i.e., pins and thrums combined) and homostyles, while the estimates of πSand πNfor S-locus genes included thrums (since pins lack the S-locus) and homostyles. Finally, we estimated the strength of purifying selection in all S-locus genes and their paralogs by calculating the ratio of nucleotide diversity at non-synonymous vs. synonymous sites (πNS).
2.6 S- locus genotypes in natural populations of P. vulgaris
To determine whether homostyles and thrums have either a haploid (i.e., S*/0 or S/0, respectively) or diploid (i.e., S*/S* or S/S, respectively) S-locus, we calculated the relative sequencing depth of the S-locus (RelS-locus depth ), as follows. We first estimated the average site depth of the S-locus coding regions (depthS-locus) and of the genome-wide coding regions (depthGenome-wide) from filtered VCF files using BCFtools v1.9, as indicated in section 2.5, then we calculatedRelS-locus depth as depthS-locus/depthGenome-wide. This normalization allowed us to account for differences of sequencing depth among individuals. A RelS-locus depth value of approximately 0.5 ± 0.25 indicates a haploid S-locus (S*/0 or S/0 for homostyles and thrums, respectively), while a value close to 1 ± 0.25 indicates a diploid S-locus (S*/S* or S/S for homostyles and thrums, respectively).
To determine whether the proportions of individuals carrying 0/0-, S/0-, S*/S*-, and S*/0-genotypes in natural populations support the model with lower or the one with equal viability of S*/S*-homostyles (Figure 1C and D), we compared the observed frequencies of the four genotypes in EN4-T, EN5-T, and EN6-M with the expected genotype frequencies predicted by Crosby’s model (1949; Figure 1C and D). First, using Crosby’s equations (1949), we calculated the expected frequencies of all four genotypes (0/0-, S/0-, S*/S*-, and S*/0) at generations 10, 20, 30, and 40 after the onset of homostyly, specifying either lower (v = 0.65) or equal (v = 1) viability of S*/S*-homostyles. Since natural frequencies of the four genotypes might be compatible with levels of viability not included in Crosby’s (1949) model, we additionally estimated expected genotype frequencies under v = 0.9, 0.8, 0.7, 0.6, and 0.5 . Viability values below 0.5 were not used because phenotypic frequencies reflecting these conditions (roughly equal frequencies of pins and homostyles and absence of thrums) have never been reported in natural populations.
Secondly, we estimated the observed frequencies of all four genotypes as follows. We tallied the raw counts of pins (0/0-genotypes) and thrums (S/0-genotypes) in EN4-T and EN5-T based on previous population surveys (T04 and T07, respectively, in Mora-Carrera et al., 2021; Material and Methods section 2.2, here). Furthermore, we calculated the number of homostyles with S*/0- and S*/S*-genotypes in EN4-T, EN5-T, and EN6-M by multiplying the proportion of S*/0- and S*/S*-genotypes reported here (see Results section 3.3) by the raw number of homostyles in each of the three populations. Finally, to determine if observed and expected genotype frequencies are compatible with Crosby’s model under lower or equal viability of S*/S*-homostyles at 10, 20, 30, and 40 generations, we used chi-squared tests with Bonferroni corrections. A significant difference between observed and expected frequencies indicates that observed genotype frequencies are not compatible with Crosby’s model, while non-significant results indicate that they are compatible with it.