Identifiability of genotypic and genetic signals using machine learning
Supervised machine learning, as expected, delivers estimates consistent with the description of parameter evolution with increasing c when all genotypes in the population are known (Figures 3 and S3). Genotypic indices allow a reasonable estimate of c throughout its range, while genetic parameters allow such precision only for very high values. Genotypic parameters evolve gradually with c, giving highly accurate estimates of c based on R and a slightly wider but still rather precise distribution of estimates based on an intermediate Pareto \(\beta\). In contrast, but logically (as the mean values of genetic parameters are nearly unaffected by increasing clonality until extreme rates are reached; Figures 1 and S1), machine learning produced a wide distribution of estimates around simulated values of c up to c = 0.6 for \(F_{IS}\) and c = 0.9 for \(\overline{r}_{d}\). This distribution, however, is not entirely flat, and although c estimates are poor at modest rates of clonality, they become precise for values of c between 0.7 and 0.99.
Based on supervised machine learning, the variance in \(F_{IS}\) was the most identifiable signal among the studied genetic parameters (Figure S4). The mean and variance of \(F_{IS}\) contain more identifiable signals than \(\overline{r}_{d}\) in the range \(0 < c < 0.9\). The mean and variance of \(F_{IS}\) even yield rather accurate inferred rates of clonality from c = 0.7 to c = 1. The variance in \(F_{IS}\) showed the best ability to quantitatively infer c < 0.5 but produced an error of \(\pm 0.3\). Using all moment values of \(F_{IS}\) and \(\overline{r}_{d}\), the supervised learning algorithm combines the strengths of these parameters, increasing the precision with which c is quantitatively inferred. Rates of clonality from c = 0.7 to c = 1 were inferred with no error; from c = 0.4 to c = 0.7, with low error (\(\pm 0.1\)); and from c = 0 to c = 0.4, with larger errors (\(\pm 0.3\)).
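To make the idea of inferring c from summary statistics concrete, the following is a minimal sketch of a supervised estimator. It is not the algorithm used in this study, which this section does not specify; a plain k-nearest-neighbour regressor is swapped in as a stand-in. The function name `knn_estimate_c`, the feature layout, and the default `k` are illustrative assumptions. The training pairs are assumed to come from simulations in which c is known.

```python
import math


def knn_estimate_c(training, query, k=5):
    """Estimate c for `query` as the mean c of its k nearest training points.

    training: list of (features, c) pairs, where `features` is a tuple of
              summary statistics (for example mean and variance of F_IS)
              and c is the simulated rate of clonality (assumed known).
    query:    tuple of the same summary statistics for the observed sample.
    Hypothetical helper for illustration; features are assumed to be on
    comparable scales, since raw Euclidean distance is used.
    """
    if not training:
        raise ValueError("training set is empty")
    k = min(k, len(training))
    # Rank simulated populations by distance in summary-statistic space.
    nearest = sorted(training, key=lambda pair: math.dist(pair[0], query))[:k]
    # Average the known c values of the k closest simulations.
    return sum(c for _, c in nearest) / k
```

In such a scheme, the spread of c among the nearest simulations gives a rough sense of how identifiable c is from the chosen statistics: a flat relationship between a statistic and c yields neighbours with widely differing c.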
Taken together, the genotypic and genetic parameters thus complement each other to properly estimate c, with the former allowing very precise estimates of c up to 0.95, the point at which the latter become useful and precise. The combination of genotypic and genetic parameters should thus be considered to precisely estimate the whole range of possible rates of clonality in natural populations, although genotypic parameters a priori appear to be the most important to retain across the widest range of possible c values. Theoretically, a combination of R and (variance in) \(F_{IS}\) would be best for obtaining a good estimate of c for any natural population when no a priori information on its extent is available.
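As an illustration of the two recommended summaries, the sketch below computes genotypic richness and the mean and variance of \(F_{IS}\) from a sample of diploid multilocus genotypes. It assumes the standard definitions R = (G - 1)/(N - 1), with G distinct genotypes among N individuals, and \(F_{IS} = 1 - H_O/H_E\) per locus, with expected heterozygosity taken without small-sample correction and the variance taken across loci. The function names and the input layout are hypothetical and are not taken from this study.

```python
from collections import Counter
from statistics import mean, pvariance


def genotypic_richness(genotypes):
    """R = (G - 1) / (N - 1) for N individuals carrying G distinct genotypes."""
    n = len(genotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    # Sort alleles within each locus so that (1, 2) and (2, 1) are identical.
    distinct = {tuple(tuple(sorted(pair)) for pair in ind) for ind in genotypes}
    return (len(distinct) - 1) / (n - 1)


def f_is_per_locus(genotypes):
    """Per-locus F_IS = 1 - H_O / H_E; monomorphic loci are skipped."""
    n = len(genotypes)
    values = []
    for locus in range(len(genotypes[0])):
        pairs = [ind[locus] for ind in genotypes]
        # Observed heterozygosity: share of individuals with two alleles.
        h_obs = sum(a != b for a, b in pairs) / n
        counts = Counter(allele for pair in pairs for allele in pair)
        # Expected heterozygosity from allele frequencies (uncorrected).
        h_exp = 1 - sum((k / (2 * n)) ** 2 for k in counts.values())
        if h_exp > 0:
            values.append(1 - h_obs / h_exp)
    return values


def summarise(genotypes):
    """Return (R, mean F_IS, variance of F_IS across loci)."""
    f_values = f_is_per_locus(genotypes)
    if not f_values:
        raise ValueError("no polymorphic locus in the sample")
    return genotypic_richness(genotypes), mean(f_values), pvariance(f_values)
```

Each individual is given as a tuple of allele pairs, one pair per locus, for example `((1, 2), (1, 1))` for two loci. The population variance is used here so that a single polymorphic locus still returns a value; a sample variance may be preferable when many loci are available.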