Consistency and accuracy tests
To test the consistency and accuracy of our software, we used both
simulated and empirical datasets as control data. Simulated data were
generated with the simulator embedded in our software, which can also be
used for testing and teaching purposes (Simulation tab; see Documentation).
First, we tested the consistency of GenAPoPop against the output of the
reference software SPAGeDi (Hardy & Vekemans 2001) for four basic
population genetic indices (Ae, He, Ho and ρST). We simulated four test
datasets simple enough to be checked by hand calculations for unit
testing. To support further unit testing, the raw datasets were
deposited in Zenodo, the European general-purpose open repository
(Barloy et al. 2022). The four scenarios correspond respectively to
panmictic (A), highly selfed (B), highly clonal (C) and half-clonal,
half-selfed (D) reproductive modes; the exact rates are given in the
first two lines of each dataset. In each scenario, we simulated two
connected populations of 100 individuals each, exchanging migrants at a
rate of 0.01 and mutating at a rate of 0.01, genotyped at 10 SNPs,
1000 generations after a randomly drawn initial population. For each
scenario, we recorded the genotypes of both populations over two
consecutive generations (generations 1000 and 1001). In addition, we
tested GenAPoPop on two field datasets genotyped with confident allele
dosage: one SNP set from the autotetraploid part of the genome of
Ludwigia grandiflora subsp. hexapetala (hereafter Lgh; Genitoni et al.
2020) and one microsatellite set from the autotetraploid Arctic sea
anemone Aulactinia stella (hereafter As; Bocharova et al. 2018). Both
datasets are samples of larger metapopulations and include missing
alleles, missing genotypes and some loci fixed in one of the
populations. We draw users' attention to the fact that different
software packages handle fixed loci, missing alleles and missing
genotypes in different ways.
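Because the test datasets are meant to be checked by hand, it may help to see one possible computation of three of the basic indices on a single locus. The following Python sketch is our illustration, not the GenAPoPop or SPAGeDi code. It assumes genotypes stored as tuples of allele copies (known dosage), He = 1 − Σpᵢ² without small-sample correction, Ae = 1/Σpᵢ², and Ho defined as the mean within-individual probability that two allele copies drawn without replacement differ. The function names and the missing-data policy (missing genotypes and missing allele copies are simply skipped) are assumptions, and ρST is not covered.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

# One genotype = allele copies of one individual at one locus (length = ploidy,
# allele dosage known). None = missing genotype; None inside = missing copy.
# (Assumed data layout, for illustration only.)
Genotype = Optional[Sequence[Optional[str]]]


def allele_frequencies(genotypes: Sequence[Genotype]) -> Dict[str, float]:
    """Allele frequencies from all observed allele copies at one locus."""
    counts = Counter(a for g in genotypes if g is not None for a in g if a is not None)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()} if total else {}


def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """He = 1 - sum(p_i^2), without small-sample correction."""
    if not freqs:
        return float("nan")
    return 1.0 - sum(p * p for p in freqs.values())


def effective_alleles(freqs: Dict[str, float]) -> float:
    """Ae = 1 / sum(p_i^2)."""
    if not freqs:
        return float("nan")
    return 1.0 / sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Mean, over individuals, of the probability that two allele copies drawn
    without replacement within the individual differ (0 or 1 for diploids)."""
    values = []
    for g in genotypes:
        if g is None:
            continue  # missing genotype: individual skipped (assumed policy)
        alleles = [a for a in g if a is not None]  # drop missing copies
        k = len(alleles)
        if k < 2:
            continue  # fewer than two observed copies: no pair to compare
        same = sum(n * (n - 1) for n in Counter(alleles).values())
        values.append(1.0 - same / (k * (k - 1)))
    return sum(values) / len(values) if values else float("nan")


# Hand-checkable tetraploid locus with one missing genotype.
locus = [("A", "A", "A", "B"), ("A", "A", "B", "B"), ("A", "A", "A", "A"), None]
freqs = allele_frequencies(locus)  # p(A) = 9/12, p(B) = 3/12
print(expected_heterozygosity(freqs), effective_alleles(freqs), observed_heterozygosity(locus))
```

By hand, p(A) = 0.75 and p(B) = 0.25, so He = 1 − (0.5625 + 0.0625) = 0.375 and Ae = 1/0.625 = 1.6. The three non-missing individuals have Ho values of 1 − 6/12 = 0.5, 1 − 4/12 ≈ 0.667 and 0, giving Ho ≈ 0.389. Changing the missing-data policy changes these numbers, which is one reason different software can disagree on the same dataset.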
Second, to analyse how population genetic indices computed from a single
snapshot of genotyped populations behave in autopolyploids, and how they
compare with those of diploids as reported in Stoeckel et al. (2021), we
simulated 6300 datasets covering 21 reproductive mode scenarios and three
ploidy levels (2, 4 and 6). Each reproductive scenario at a given ploidy
level was independently simulated 100 times to obtain a reliable picture
of the range of possible genetic trajectories. Each of the 21
reproductive mode scenarios consists of a triplet of values: a rate of
clonality, a rate of selfing and a complementary rate of allogamy, the
three rates necessarily summing to one. The three rates took all
combinations of values in the set {0, 0.2, 0.4, 0.6, 0.8, 1} that sum
to one. For example, one scenario was (rate of clonality = 0.2, rate of
selfing = 0.4, rate of allogamy = 0.4). Hereafter, for easier
representation, we report pairs of rates of clonality and selfing in
figures and text, the rate of allogamy being implicitly one minus the
sum of the rates of clonality and selfing.
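The 21 scenarios follow directly from the grid: they are all the ways of splitting one into three non-negative rates in steps of 0.2. The short Python sketch below is our own illustration, not code from GenAPoPop. It enumerates these triplets, derives the (clonality, selfing) pairs used in figures, and recovers the total of 6300 datasets (21 modes × 3 ploidies × 100 replicates).

```python
from itertools import product
from typing import List, Tuple

STEPS = 5  # the grid {0, 0.2, ..., 1} has steps of 1/5


def reproductive_modes(steps: int = STEPS) -> List[Tuple[float, float, float]]:
    """All (clonality, selfing, allogamy) triplets on the grid summing to one.
    Integer counts of grid steps avoid floating-point round-off in the sum."""
    modes = []
    for c, s in product(range(steps + 1), repeat=2):
        a = steps - c - s
        if a >= 0:
            modes.append((c / steps, s / steps, a / steps))
    return modes


modes = reproductive_modes()
pairs = [(c, s) for c, s, _ in modes]  # (clonality, selfing) as reported in figures
n_datasets = len(modes) * 3 * 100      # 21 modes x 3 ploidies x 100 replicates
print(len(modes), n_datasets)          # 21 6300
```

Working in integer grid steps rather than summing floats like 0.2 + 0.4 + 0.4 keeps the "sum to one" test exact.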
In each scenario, we simulated two connected populations of 100
individuals each, exchanging migrants at a rate of 0.01 and mutating at
a rate of 0.01, genotyped at 30 markers, each with four possible alleles
randomly assigned to individuals in the first generation with equal
frequencies. Analysed datasets were recorded 1000 generations after the
initial generation, i.e. five times the total population size (N = 200).
Distributions of population genetic indices obtained for the 21
reproductive modes and 3 ploidy levels were reported as violin plots,
each built from 100 independent simulations.
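To make this simulation design concrete, the following Python sketch simulates two connected autopolyploid populations under one (clonality, selfing, allogamy) triplet. It is not the simulator embedded in GenAPoPop. The function names, the K-allele mutation model, random chromosome segregation without double reduction, unlinked loci, even ploidy, and the way migrant parents are drawn (each parent comes from the other population with probability equal to the migration rate) are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Locus = Tuple[int, ...]          # allele states at one locus (length = ploidy)
Individual = Tuple[Locus, ...]   # one entry per unlinked locus
Population = List[Individual]


def gamete(ind: Individual, rng: random.Random) -> Individual:
    """Random chromosome segregation at each unlinked locus: half of the
    allele copies drawn without replacement (no double reduction)."""
    return tuple(tuple(rng.sample(locus, len(locus) // 2)) for locus in ind)


def fuse(g1: Individual, g2: Individual) -> Individual:
    """Zygote formed by two gametes, locus by locus."""
    return tuple(a + b for a, b in zip(g1, g2))


def mutate(ind: Individual, mu: float, n_alleles: int, rng: random.Random) -> Individual:
    """K-allele model (assumed): each allele copy mutates with probability mu
    to one of the n_alleles - 1 other states, chosen uniformly."""
    def m(a: int) -> int:
        if rng.random() < mu:
            return rng.choice([b for b in range(n_alleles) if b != a])
        return a
    return tuple(tuple(m(a) for a in locus) for locus in ind)


def pick_parent(pops: List[Population], i: int, mig: float, rng: random.Random) -> Tuple[int, int]:
    """(population, index) of a parent for an offspring of population i;
    with probability mig the parent comes from the other population."""
    src = 1 - i if rng.random() < mig else i
    return src, rng.randrange(len(pops[src]))


def next_generation(pops, clon, self_, mig, mu, n_alleles, rng):
    """One non-overlapping generation: each offspring is clonal with
    probability clon, selfed with probability self_, outcrossed otherwise."""
    new_pops = []
    for i, pop in enumerate(pops):
        offspring = []
        for _ in range(len(pop)):
            k1 = pick_parent(pops, i, mig, rng)
            p1 = pops[k1[0]][k1[1]]
            u = rng.random()
            if u < clon:                    # clonal copy of the parent
                child = p1
            elif u < clon + self_:          # selfing: two gametes of one parent
                child = fuse(gamete(p1, rng), gamete(p1, rng))
            else:                           # allogamy: a second, distinct parent
                k2 = pick_parent(pops, i, mig, rng)
                while k2 == k1:
                    k2 = pick_parent(pops, i, mig, rng)
                child = fuse(gamete(p1, rng), gamete(pops[k2[0]][k2[1]], rng))
            offspring.append(mutate(child, mu, n_alleles, rng))
        new_pops.append(offspring)
    return new_pops


def simulate(ploidy=4, n=100, n_loci=30, n_alleles=4, clon=0.2, self_=0.4,
             mig=0.01, mu=0.01, generations=1000, seed=0):
    """Two connected populations started from alleles drawn with equal
    frequencies; returns the last two generations (parents, offspring)."""
    rng = random.Random(seed)
    pops = [[tuple(tuple(rng.randrange(n_alleles) for _ in range(ploidy))
                   for _ in range(n_loci)) for _ in range(n)] for _ in range(2)]
    previous = pops
    for _ in range(generations):
        previous, pops = pops, next_generation(pops, clon, self_, mig, mu, n_alleles, rng)
    return previous, pops


# Small, fast run of the example triplet (0.2, 0.4, 0.4) for a tetraploid.
parents, offspring = simulate(ploidy=4, n=20, n_loci=5, clon=0.2, self_=0.4, generations=50)
```

Returning both the parental and the offspring generation mirrors the two-time-step sampling used below. Pure Python is slow at the full settings (2 × 100 individuals, 30 loci, 1000 generations), so smaller values are convenient for quick checks.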
Third, to test the accuracy and precision of the ClonEstiMatePoly
method for jointly inferring rates of clonality, selfing and allogamy in
autopolyploid populations genotyped at two time steps, we again
simulated 6300 datasets following the same 21 reproductive mode
scenarios, for diploid, tetraploid and hexaploid populations. For each
quantitative reproductive mode (i.e., a given pair of values of the rate
of clonality and the rate of selfing), we simulated 100 pairs of
populations, each of size N = 100, mutating at a rate of 1/N and
exchanging migrants at a rate of 1/N, over 1000 generations. We
submitted the genotypes of the parents (generation 999) and of their
descendants (generation 1000) to ClonEstiMatePoly with flat priors to
obtain the posterior distribution of the joint rates of clonality and
selfing. The 100 posterior distributions obtained for each pair of rates
of clonality and selfing were summed and reported as a confusion matrix
for ploidy levels 2, 4 and 6.
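The aggregation into a confusion matrix can be sketched as follows. This Python sketch assumes that each ClonEstiMatePoly posterior is available as a vector of probabilities over a fixed discrete grid of (clonality, selfing) pairs. The data layout and the function name are hypothetical, and this is not the ClonEstiMatePoly code. Each row corresponds to a true pair of rates. Dividing the summed posteriors by the number of replicates (100 here) only rescales each row so that it sums to one.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[float, float]  # (rate of clonality, rate of selfing)


def confusion_matrix(
    results: Sequence[Tuple[Pair, Sequence[float]]],
    grid: Sequence[Pair],
) -> Dict[Pair, List[float]]:
    """Sum posterior distributions per true (clonality, selfing) pair.

    results: one (true pair, posterior probabilities over `grid`) per replicate.
    Returns, for each true pair, the summed posterior divided by the number of
    replicates, so that each row sums to one.
    """
    sums: Dict[Pair, List[float]] = defaultdict(lambda: [0.0] * len(grid))
    counts: Dict[Pair, int] = defaultdict(int)
    for true_pair, posterior in results:
        if len(posterior) != len(grid):
            raise ValueError("posterior length must match the grid")
        row = sums[true_pair]
        for j, p in enumerate(posterior):
            row[j] += p
        counts[true_pair] += 1
    return {t: [v / counts[t] for v in row] for t, row in sums.items()}


# Toy example: two replicates for (0, 0) and one for (0.2, 0) on a 2-cell grid.
grid = [(0.0, 0.0), (0.2, 0.0)]
results = [((0.0, 0.0), [0.9, 0.1]), ((0.0, 0.0), [0.7, 0.3]), ((0.2, 0.0), [0.25, 0.75])]
cm = confusion_matrix(results, grid)  # row (0, 0) is about [0.8, 0.2]
```

The posterior mass on the diagonal (inferred pair equal to the true pair) then summarises accuracy, while its spread over neighbouring cells reflects precision.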
All the results were aggregated and deposited in Barloy et al. (2022).