Microsatellites: effects of sample quality
When prinseq was run on all reads from individually library-prepared
replicates, a general pattern emerged in which PCR success was
predictive of sequence quality. The mean quality across all sample types
was 85.6%, with a range of 57.43% to 99.48% (MVZ 5211 GS-2 replicate two
and HSU 1836 GLSA-52 replicate one were the lowest and highest,
respectively). The median was 92.6% and the mode was 96.82%. When sample
types were separated, the mean quality was 95.99% (SE ±0.99%), 95.06%
(SE ±0.68%), and 73.84% (SE ±2.10%) for tissue, HQMS, and LQMS,
respectively (Table 5).
All aforementioned and additional descriptive metrics are shown in
Appendix 2. The ANOVA was significant (P < 0.001), and the regression
resulted in an R² of 0.44, which was also highly significant
(P < 0.001), indicating that our assessment of quality from gel
electrophoresis was predictive of genotyping success.
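The per-group means, standard errors, and ANOVA reported above could be
reproduced along the following lines. This is a minimal sketch only: it
assumes the ANOVA compared the percentage of reads passing prinseq
filtration among the three sample types, the function names are
hypothetical, and the input values are placeholders rather than data
from this study. It returns the F statistic without a P value.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(values):
    """Return (mean, standard error of the mean) for a list of percentages."""
    return mean(values), stdev(values) / sqrt(len(values))


def one_way_anova_f(groups):
    """Return (F statistic, (df_between, df_within)) for a one-way ANOVA."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


# Placeholder percentages of reads passing filtration (NOT the study's data)
passing = {
    "tissue": [96.0, 97.5, 94.5],
    "HQMS": [95.0, 93.0, 97.0, 95.0],
    "LQMS": [60.0, 75.0, 90.0, 70.0],
}

for sample_type, values in passing.items():
    m, se = mean_and_se(values)
    print(f"{sample_type}: mean {m:.2f}% (SE ±{se:.2f}%)")

f_stat, dfs = one_way_anova_f(list(passing.values()))
print(f"F = {f_stat:.2f}, df = {dfs}")
```

A P value would additionally require the F distribution, which is
available in a statistics package but is omitted here to keep the sketch
free of dependencies.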
The CHIIMP genotypes were accurate for the single tissue sample,
especially if the PCR replicates were pooled and genotyped together. The
reads per replicate and percentage of good reads (as determined by
standard prinseq quality filtration) are reported in Table 3. All
recovered genotypes are summarized in Table 4. At the GLSA-52 locus, the
pooled run recovered a second allele (251) from HSU 8180, although none
of the other genotypes recovered that allele. In the pooled run, HSU 8180
recovered a total of 79,417 reads across all microsatellite loci and
complete cytochrome b. The 251 allele was recovered in the pooled
dataset at a frequency of 5.3%, whereas the 257 allele was recovered
at 17.6%, more than three times the frequency of the 251 allele. When
all other replicates of this sample were evaluated, only the 257 allele
was recovered, at frequencies ranging from 12.9% to 17.6% (see details
in Appendix 4).
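The comparison of the two GLSA-52 alleles in the pooled run can be
checked with simple arithmetic. The sketch below uses only the two
frequencies reported above; the function name is hypothetical and is not
part of CHIIMP.

```python
def frequency_ratio(major_pct, minor_pct):
    """Return how many times more frequent the major allele is than the minor."""
    return major_pct / minor_pct


# Frequencies reported in the text for the pooled HSU 8180 run
allele_257_pct = 17.6
allele_251_pct = 5.3

ratio = frequency_ratio(allele_257_pct, allele_251_pct)
print(f"257:251 read-frequency ratio = {ratio:.2f}")  # 3.32
```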
Mismatched alleles were recovered most frequently in low-quality
samples, which routinely appeared to fail PCR across numerous
replicates. Mismatches were often associated with one or more of the
following: PCR stutter sequences, PCR artifacts, and more than two
prominent sequences, as identified by the CHIIMP pipeline (Table 4).
Neither individual samples nor specific microsatellites appeared to
recover specific CHIIMP flags across all replicates; however, the locus
GS-2 frequently recovered flags for all three metrics (Table 4).
The HQMS samples had routinely high-quality sequences as determined by
prinseq metrics. Only a single PCR replicate from UMMZ 79755 had less
than 85% of sequences pass quality metrics. All other replicates were
over 85%, and most had over 95% of sequences passing quality
filtration. Interestingly, the LQMS samples showed high variation
between PCR replicates (Figure 1). The averages for the LQMS samples
were 81.3%, 69.6%, and 70.6%, with a high degree of variation between
replicates. For example, LACM 95619 ranged from 67.7% to 92.05% of
sequences passing quality filters for GS-4. In this instance, the three
replicates recovered three completely different sets of alleles,
providing no confidence in those genotypes despite one replicate
recovering 92.05% high-quality sequences.
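The LACM 95619 case shows why agreement among replicates, rather than
read quality alone, determines confidence in a genotype. A replicate
concordance check could be sketched as follows. The majority rule, the
function name, and the allele sizes are all illustrative assumptions;
they are neither the criterion used in this study nor part of CHIIMP.

```python
from collections import Counter


def consensus_genotype(replicate_genotypes, min_agreeing=2):
    """Return the genotype shared by at least `min_agreeing` replicates.

    Each genotype is a pair of allele sizes; allele order is ignored.
    Returns None when no genotype reaches the threshold or when the two
    most common genotypes are tied.
    """
    normalized = [tuple(sorted(g)) for g in replicate_genotypes if g]
    if not normalized:
        return None
    ranked = Counter(normalized).most_common(2)
    top_genotype, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie (including all replicates disagreeing)
    return top_genotype if top_count >= min_agreeing else None


# Hypothetical allele sizes: two of three replicates agree
print(consensus_genotype([(251, 257), (257, 251), (257, 257)]))

# Hypothetical allele sizes: three replicates, three different genotypes
print(consensus_genotype([(180, 184), (176, 188), (184, 192)]))
```

Under this rule the second call returns no consensus, mirroring the
GS-4 result for LACM 95619, regardless of how many reads in any one
replicate passed quality filtration.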