Mitochondrial Cytochrome b
Following amplification of either the entire cytochrome b gene (tissue) or the first 300 base pairs (museum specimens), all tested samples yielded sequences that mapped to the G. sabrinus reference, although amplification success as assessed by gel electrophoresis ranged from 0 to 100% (details in Appendix 1). Average coverage of the museum specimens ranged from 22.7x to 1933x. Interestingly, both the highest coverage (MVZ 5211) and the lowest (MVZ 2088) came from poor-quality museum specimens. Regardless, all samples yielded reliable cytochrome b sequences, and only the lowest-coverage sample (MVZ 2088) had an error rate above 0.0001% (0.064%; Appendix 2). All samples yielded mitochondrial sequences despite poor confirmation by gel visualization, even when many PCRs had been deemed failures (see all poor-quality sample results in Appendix 1). Average coverage was 624x across all sample types, 612x across the high-quality museum specimens, and 760x across the poor-quality museum specimens. The last figure was strongly inflated by the very high coverage recovered in MVZ 5211; without this sample the average was 174x. Additional quality metrics for each sample are detailed in Appendix 2.
The quality scores did not vary substantially across samples. At Q20, the lowest-scoring sample was HSU 1836 with 95.8% and the highest was MVZ 2088 with 98.6%. At Q30, the lowest was again HSU 1836 and the highest was UMMZ 79755 with 98%. The tissue sample scored 98.2% at Q20 and 94% at Q30, which is likely artificially low because the entire cytochrome b gene was amplified for this sample and then fragmented prior to library preparation (Yuan, 2020). Regardless of sample type, the reads appear to be of high quality as judged by the quality metrics, error rates, and Q scores. It is also noteworthy that the cytochrome b data were extracted solely from the pooled data, in which all microsatellite and cytochrome b fragments were pooled prior to library preparation.
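For readers unfamiliar with the Q20 and Q30 metrics reported above, the following is a minimal sketch of how such percentages can be computed from FASTQ quality strings. It assumes Phred+33 encoding, which is not stated in this study. The function names and the toy read are hypothetical and do not reproduce the software used to generate Appendix 2. A Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so Q20 means 1 error in 100 and Q30 means 1 error in 1000.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to integer Phred scores.

    Assumes Phred+33 encoding (an assumption; the source does not state it).
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Base-call error probability implied by Phred score q."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quality_strings, threshold):
    """Fraction of all bases whose Phred score is >= threshold."""
    total = 0
    passing = 0
    for qs in quality_strings:
        scores = phred_scores(qs)
        total += len(scores)
        passing += sum(1 for s in scores if s >= threshold)
    return passing / total if total else 0.0


# Hypothetical toy read: 'I' = Q40, '5' = Q20, '+' = Q10
toy_reads = ["IIII5+"]
q20_fraction = fraction_at_or_above(toy_reads, 20)  # 5 of 6 bases
q30_fraction = fraction_at_or_above(toy_reads, 30)  # 4 of 6 bases
print(f"Q20: {q20_fraction:.1%}, Q30: {q30_fraction:.1%}")
```

Because the thresholds are nested, the Q30 percentage of a sample can never exceed its Q20 percentage, which matches the pattern in the values reported here.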
Three closely related haplotypes, separated by three to six substitutions, were recovered from the samples included here. All of the G. o. californicus samples shared one haplotype, the single G. o. lascivus sample (HSU 8180) carried a second, and the G. o. stephensi sample (HSU 1836) carried the third (Appendix 3). These haplotypes were common across a wider study of G. oregonensis (Yuan, 2020), and the fact that none of the LQMS recovered a unique haplotype supports the authenticity of the data. Importantly, the samples included in this study represent three subspecies from four geographic locations, so numerous haplotypes were expected.
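The number of substitutions separating two haplotypes can be counted directly from aligned sequences. The sketch below is illustrative only: the sequences are short hypothetical placeholders rather than the cytochrome b haplotypes in Appendix 3, the function names are assumptions, and the approach presumes equal-length, pre-aligned sequences with no indels.

```python
from itertools import combinations


def count_substitutions(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences.

    Positions where either sequence has an ambiguous base (anything other
    than A, C, G, T) are skipped rather than counted as substitutions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def pairwise_substitutions(haplotypes):
    """Return {(name_1, name_2): substitution count} for every pair."""
    return {
        (n1, n2): count_substitutions(haplotypes[n1], haplotypes[n2])
        for n1, n2 in combinations(haplotypes, 2)
    }


# Hypothetical placeholder sequences, not real data
toy_haplotypes = {
    "hap_A": "ACGTACGTAC",
    "hap_B": "ACGTACGTAA",
    "hap_C": "TCGTACGTAA",
}
distances = pairwise_substitutions(toy_haplotypes)
for pair, n in distances.items():
    print(pair, n)
```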