Identifiability of genotypic and genetic signals using
machine learning
Supervised machine learning, as expected, delivers estimates consistent
with the description of parameter evolution with increasing c when all genotypes in the population are known (Figures 3 and S3).
Genotypic indices allow a reasonable estimate of c throughout its
range, while genetic parameters allow such precision only for very high
values. Genotypic parameters evolve gradually with high accuracy of the
estimated c based on R and a slightly wider but still
rather precise distribution when based on an intermediate Pareto \(\beta\). In contrast, and as expected given that the mean values of genetic
parameters are nearly unaffected by increasing clonality until extreme
rates are reached (Figures 1 and S1), machine learning produced a wide
distribution of estimates around simulated values of c up to c = 0.6 for \(F_{IS}\) and c = 0.9 for \(\overline{r}_{d}\). This distribution, however, is not
entirely flat, and although c estimates are poor at modest rates
of clonality, they become precise near values of c between 0.7
and 0.99.
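The contrast described above can be illustrated with a minimal sketch. It is not the learning algorithm or the simulation used in this study: the two response curves (`gradual_stat`, `late_stat`), the noise level, and the nearest-neighbour estimator are all assumptions chosen only to show why a statistic that changes steadily with c remains informative over the whole range, whereas one that stays flat until extreme rates is informative only near c = 1.

```python
import heapq
import random


def gradual_stat(c, rng):
    """Hypothetical statistic declining steadily with c (toy stand-in for a genotypic index)."""
    return 1.0 - c + rng.gauss(0.0, 0.02)


def late_stat(c, rng):
    """Hypothetical statistic that stays flat until c exceeds 0.9 (toy stand-in for a genetic parameter)."""
    return -10.0 * max(0.0, c - 0.9) + rng.gauss(0.0, 0.02)


def knn_estimate(train, x, k=5):
    """Average the c values of the k training points whose statistic is closest to x."""
    nearest = heapq.nsmallest(k, train, key=lambda pair: abs(pair[0] - x))
    return sum(c for _, c in nearest) / len(nearest)


def mean_abs_error(stat, c_low, c_high, n_train=2000, n_test=200, seed=1):
    """Train on c drawn uniformly from [0, 1], then test on c drawn from [c_low, c_high]."""
    rng = random.Random(seed)
    train = []
    for _ in range(n_train):
        c = rng.random()
        train.append((stat(c, rng), c))
    total = 0.0
    for _ in range(n_test):
        c_true = rng.uniform(c_low, c_high)
        total += abs(knn_estimate(train, stat(c_true, rng)) - c_true)
    return total / n_test


if __name__ == "__main__":
    for name, stat in [("gradual", gradual_stat), ("late", late_stat)]:
        low = mean_abs_error(stat, 0.0, 0.5)
        high = mean_abs_error(stat, 0.95, 1.0)
        print(f"{name:8s} error for c in [0, 0.5]: {low:.3f}   for c in [0.95, 1]: {high:.3f}")
```

Under these toy assumptions, the flat statistic carries almost no information about c below 0.9, so its estimates there scatter around the middle of the training range, while both statistics give tight estimates at very high c.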
Based on supervised machine learning, the variance in \(F_{IS}\) was the most identifiable signal among the
studied genetic parameters (Figure S4). The mean and variance of \(F_{IS}\) contain more identifiable signal than \(\overline{r}_{d}\) in the range \(0 < c < 0.9\). The
mean and variance of \(F_{IS}\) even yield rather
accurate inferred rates of clonality from c = 0.7 to c = 1.
The variance in \(F_{IS}\) showed the best ability to
quantitatively infer c < 0.5 but produced an error of \(\pm 0.3\). Using all moment values of \(F_{IS}\) and \(\overline{r}_{d}\), the supervised learning algorithm
combines the strengths of these parameters, increasing the precision with
which c is quantitatively inferred. Rates of clonality from c = 0.7 to c = 1 were inferred with no error; from c = 0.4 to c = 0.7, with low error (\(\pm 0.1\)); and from c = 0 to c = 0.4, with larger errors (\(\pm 0.3\)).
Taken together, the genotypic and genetic parameters thus complement
each other to properly estimate c, with the former allowing very
precise estimates of c up to 0.95, the point at which the latter become useful
and precise. The combination of genotypic and genetic parameters should
thus be considered to precisely estimate the whole range of possible
rates of clonality in natural populations, although genotypic parameters a priori appear to be the most important to retain across the
widest range of possible c values. Theoretically, a combination
of R and (variance in) \(F_{IS}\) would be best
for obtaining a good estimate of c for any natural population
when no a priori information on its extent is available.