Consistency and accuracy tests
To test the consistency and accuracy of our software, we used simulated data and empirical datasets as control data. To obtain simulated data, we used the embedded simulator, which can also be used for testing and teaching purposes (Simulation tab; see Documentation).
First, we tested the consistency of GenAPoPop against the output of the reference software Spagedi (Hardy & Vekemans 2001) on four basic population genetic indices (Ae, He, Ho, Rhost). For unit testing, we simulated four test datasets simple enough to be checked by hand calculation. To support future unit testing, the raw datasets were deposited in the European general-purpose open repository Zenodo (Barloy et al. 2022). The four scenarios correspond respectively to panmictic (A), highly selfed (B), highly clonal (C) and half-clonal-half-selfed (D) reproductive modes. Quantitative values are explicitly indicated in the first two lines of the datasets. In each scenario, we simulated two connected populations of 100 individuals each, with a migration rate of 0.01 and a mutation rate of 0.01, genotyped at 10 SNPs, 1000 generations after an initial randomly drawn population. For each scenario, we recorded the genotypes of the populations over two consecutive generations (generations 1000 and 1001). In addition, we tested GenAPoPop on two field datasets genotyped with confident allele dosage: one SNP set from the autotetraploid part of the genome of Ludwigia grandiflora subsp. hexapetala (hereafter Lgh; Genitoni et al. 2020) and one microsatellite set from the autotetraploid Arctic sea anemone Aulactinia stella (hereafter As; Bocharova et al. 2018). These two datasets are genetic samples of larger metapopulations; they include missing alleles and genotypes, as well as some loci fixed in one of the populations. We draw users' attention to the fact that software packages differ in how they handle fixed loci, missing alleles and missing genotypes.
Second, to analyse how population genetic indices computed from a snapshot of genotyped populations behave in autopolyploids, and how they compare with those of diploids as reported in Stoeckel et al. (2021), we simulated 6300 different datasets following 21 reproductive mode scenarios and three ploidy levels (2, 4 and 6). Each reproductive scenario at each ploidy level was independently simulated one hundred times to capture the range of possible genetic trajectories. Each of the 21 reproductive mode scenarios consists of a triplet of values: one rate of clonality, one rate of selfing and one complementary rate of allogamy, the three rates summing to one. The triplets cover all possible combinations of values within the set {0, 0.2, 0.4, 0.6, 0.8, 1} that satisfy this constraint. For example, one scenario was (rate of clonality=0.2, rate of selfing=0.4, rate of allogamy=0.4). Hereafter, for easier representation, we report pairs of rates of clonality and selfing in figures and text, the rate of allogamy being implicitly one minus the sum of the rates of clonality and selfing. In each scenario, we simulated two connected populations of 100 individuals each, with a migration rate of 0.01 and a mutation rate of 0.01, genotyped at 30 markers with four possible alleles introduced at random, with equal frequencies, in the individuals of the first generation. Analysed datasets were recorded 1000 generations after the initial generation, corresponding to 5 times the overall instantaneous population size (N=200). Distributions of population genetic indices obtained for the 21 reproductive modes and 3 ploidy levels were reported as violin plots, each built from one hundred independent simulations.
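The count of 21 scenarios and 6300 datasets follows directly from the grid of rates described above. The minimal Python sketch below enumerates the triplets under the stated constraint; the function and variable names are illustrative assumptions and are not taken from GenAPoPop or its simulator.

```python
from itertools import product

STEPS = 5             # rates are multiples of 1/5: 0, 0.2, ..., 1
PLOIDIES = (2, 4, 6)  # ploidy levels simulated
REPLICATES = 100      # independent simulations per scenario and ploidy


def reproductive_scenarios(steps=STEPS):
    """Return every (clonality, selfing, allogamy) triplet on the grid that sums to one.

    Rates are handled as integer numbers of fifths so that the
    sum-to-one constraint is checked without floating-point error.
    """
    scenarios = []
    for clonal, selfed in product(range(steps + 1), repeat=2):
        allogamous = steps - clonal - selfed  # complementary rate
        if allogamous >= 0:
            scenarios.append((clonal / steps, selfed / steps, allogamous / steps))
    return scenarios


scenarios = reproductive_scenarios()
n_datasets = len(scenarios) * len(PLOIDIES) * REPLICATES
print(len(scenarios), n_datasets)  # 21 6300
```

The example scenario given in the text, (0.2, 0.4, 0.4), is one element of the returned list.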
Third, to test the accuracy and precision of the ClonEstiMatePoly method for jointly inferring rates of clonality, selfing and allogamy in autopolyploid populations genotyped at two time steps, we again simulated 6300 different datasets following the same 21 reproductive mode scenarios, for diploid, tetraploid and hexaploid populations. For each quantitative reproductive mode (i.e., a precise pair of values, one rate of clonality and one rate of selfing), we simulated 100 pairs of populations, each of size N=100, mutating at a rate of 1/N and exchanging migrants at a rate of 1/N, over 1000 generations. We submitted the genotypes found in parents (generation 999) and in their descendants (generation 1000) to ClonEstiMatePoly with flat priors to obtain the inferred posterior distribution of the joint rates of clonality and selfing. The 100 posterior distributions per pair of rates of clonality and selfing were summed and reported as confusion matrices for ploidies 2, 4 and 6.
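As an illustration of the aggregation step, the sketch below sums replicate posteriors into the rows of a confusion matrix. It assumes, hypothetically, that each posterior is discretised as a vector of probability masses over the same list of reproductive scenarios used for simulation; the function name and input layout are illustrative assumptions and are not part of ClonEstiMatePoly.

```python
def confusion_matrix(posteriors_by_truth, n_scenarios):
    """Sum replicate posteriors per true scenario into a square matrix.

    posteriors_by_truth: dict mapping the index of the simulated (true)
        scenario to a list of posterior vectors, one per replicate, each of
        length n_scenarios (hypothetical layout).
    Returns a list of rows: row = true scenario, column = inferred scenario.
    """
    matrix = [[0.0] * n_scenarios for _ in range(n_scenarios)]
    for truth, posteriors in posteriors_by_truth.items():
        for posterior in posteriors:
            for inferred, mass in enumerate(posterior):
                matrix[truth][inferred] += mass
    return matrix


# Toy input: 2 scenarios, 2 replicates each (made-up values, for illustration only).
toy = {
    0: [[0.75, 0.25], [0.5, 0.5]],
    1: [[0.25, 0.75], [0.0, 1.0]],
}
toy_matrix = confusion_matrix(toy, n_scenarios=2)
print(toy_matrix)  # [[1.25, 0.75], [0.25, 1.75]]
```

Under this layout, mass concentrated on the diagonal indicates that the inferred rates match the simulated ones; one such matrix would be built per ploidy level.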
All the results were aggregated and deposited in Barloy et al. (2022).