Introduction
Population genetics is a robust, cost- and time-efficient framework to
predict, understand and infer the ecology and evolution of species
(Ewens 2004, Ellegren & Galtier 2016). This paradigm, central to the
theory of biological evolution, has stood the test of time in predicting
and tracking the ancestral relatedness between individuals at the scale
of the studied populations (Wakeley 2005). Using changes in genetic
variation over time and space, population genetic models make it possible
to quantify evolutionary forces in populations and to interpret them as
hypothesized biological and environmental influences on lineages (Ellegren & Galtier
2016). Among all the possible biological features driving evolution,
reproductive mode is one of the most significant evolutionary forces
impacting the dynamics of genetic diversity and its structure among
populations, as it determines the transmission of the hereditary DNA
signal over generations (Duminil et al. 2007). In turn, analysing the
genetic diversity within populations makes it possible to infer their
reproductive modes, providing valuable knowledge to predict and
understand their ecological and biological evolution. It also helps to
better target ecological scenarios and to make more robust inferences
about other evolutionary forces (Fehrer 2010, Yu et al. 2016, Stoeckel et
al. 2021). However, to date and despite nearly one century of research,
population genetic models and tools have mostly been developed for
sexual, diploid species (Orive & Krueger-Hadfield 2021, Dufresne et al. 2014).
Eukaryotes with more than two sets of homologous chromosomes
(autopolyploids) or duplicated genomic segments are very common in
ferns, flowering plants and fungi (Barker et al. 2015, Albertin
& Marullo 2012, Wood et al. 2009). Polyploidy seems less frequent in
animals, albeit significant in a handful of clades such as fishes,
cnidarians, amphibians and reptiles (Gregory & Mable 2005, Mable et al.
2011, Boots et al. 2023). It also occurs in some species
only for some chromosomes (aneuploidy), as commonly observed in
partially clonal parasitic protozoa (Tibayrenc & Ayala 2013, Rougeron
et al. 2015).
Polyploidization influences genetic and phenotypic diversity including
potential ecological adaptations and radiations, with a long-term
dynamic from whole genome duplication to re-diploidization (Baduel et
al. 2018, Wu et al. 2019). Interestingly, polyploidy strongly co-occurs
with reproductive modes involving partial clonality, both in natural and
experimental populations (Herben et al. 2017; Van Drunen & Husband
2019). It also seems to be an influential complementary factor to the
more classical Baker’s hypothesis of the advantage of uniparental
reproductive modes, including selfing and clonality, when peripatric
populations establish in new areas (Pandit et al. 2011, Barrett 2018,
Rutland et al. 2021). Studying the reciprocal influences of reproductive
modes on the ecology and evolution of populations from their genetic
diversity is now usual in diploid populations, helped by a wide range of
dedicated tools such as GenClone (Arnaud-Haond & Belkhir 2007), RMES
(David et al. 2007) and RClone (Bailleul et al. 2015). However, it is
less common in polyploid populations. The lack of adapted and easily
accessible analysis solutions has led previous studies to treat such
datasets as haplotypes or to analyse them as diploid.
Indeed, population genetic studies of polyploid organisms have long been
limited by two main difficulties (Dufresne et al. 2014, Jighly et al.
2018). First, obtaining robust genotypes in such populations has long
been a true challenge because allele dosage is difficult to determine in
individuals. For example, it was methodologically impractical to
distinguish between AABB, ABBB and AAAB individuals
at a tetraploid genetic marker with two alleles, A and B, without
making assumptions that are difficult to verify (Dufresne et al. 2014,
Bourke et al. 2019). Allele dosage difficulties intensify with increasing
ploidy and number of possible alleles at the considered genetic marker,
as the number of allele combinations, and thus the number of possible
genotypes, increases. However, recent advances in genotyping
methods exploiting deep sequencing with low error rates, combined with
individual and marker tags, have unlocked the possibility to genotype
polyploid individuals with confident allele dosage, even in species with
large sets of chromosomes (Delord et al. 2018). These methods benefit
both from advances in the sequencing process itself, which
decrease sequencing errors, and from the development of upstream
molecular processing of genetic samples to tag and target very specific
genomic regions. These processing steps increase the sequencing depth of
the genotyped markers and allow reproducible replicates. It is now
possible to obtain, at limited cost, from more than 20 to hundreds of
replicated sequences per SNP or microsatellite allele within each
individual in a pool of individuals using genotyping-by-sequencing
methods. For example, the Hiplex genotyping method allows genotyping ~500
individuals at 100 SNPs in one sequencing run (e.g., MiSeq
2x150 Heflin), with a sequencing depth of ~50 sequences
per allele in tetraploids and ~33 sequences per allele
in hexaploids, resulting in genotype assignments with a confidence
above 99% (Delord et al. 2018, Besnard et al. 2023).
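The growth in the number of possible genotypes with ploidy and allele
number, and the per-allele depths quoted above, can be illustrated with a
short sketch. It assumes that a genotype with known allele dosage is an
unordered multiset of alleles, so that a marker with k alleles at ploidy n
has C(n + k - 1, n) possible genotypes, and that reads are shared evenly
among the allele copies of an individual. The function names and the
figure of 200 reads per marker per individual are illustrative
assumptions, not values taken from the cited studies.

```python
from itertools import combinations_with_replacement
from math import comb


def count_genotypes(ploidy: int, n_alleles: int) -> int:
    """Number of distinct genotypes (unordered allele multisets) at one marker."""
    return comb(ploidy + n_alleles - 1, ploidy)


def list_genotypes(ploidy: int, alleles: str) -> list[str]:
    """Enumerate every genotype with explicit allele dosage, e.g. 'AABB'."""
    return ["".join(g) for g in combinations_with_replacement(alleles, ploidy)]


def depth_per_allele_copy(reads_per_marker: float, ploidy: int) -> float:
    """Mean read depth per allele copy, assuming reads are shared evenly."""
    return reads_per_marker / ploidy


if __name__ == "__main__":
    # Tetraploid marker with two alleles A and B.
    print(list_genotypes(4, "AB"))
    # Genotype counts for 2, 4 and 8 alleles at increasing ploidy.
    for ploidy in (2, 4, 6):
        print(ploidy, [count_genotypes(ploidy, k) for k in (2, 4, 8)])
    # Assumed budget of 200 reads per marker per individual (illustrative).
    print(depth_per_allele_copy(200, 4), round(depth_per_allele_copy(200, 6), 1))
```

With two alleles, a tetraploid marker has five possible genotypes, three
of which (AAAB, AABB and ABBB) carry the same two alleles and cannot be
told apart when dosage is unknown.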
Second, we have also long lacked adapted models and analysis methods to
compute population genetic indices and quantify evolutionary forces in
polyploid populations (Dufresne et al. 2014), especially considering
that partially clonal and selfed populations can result in repeated
genotypes (i.e., the same multi-locus genotype found in different
samples, Arnaud-Haond et al. 2007) or patterns of high probabilities of
identity between genotypes (David et al. 2007; Jullien et al. 2019).
Due to challenges introduced by data formats and difficulties in
generalizing the mathematical formulae of population genetic indices
(Ewens 2004), common population genetics software, such as Genalex
(Peakall & Smouse 2012) and GenClone (Arnaud-Haond & Belkhir 2007), is
not designed to work with partially clonal populations with more than two
allelic copies per gene (Excoffier & Heckel 2006). A handful of libraries
and programs have emerged in recent years, such as the command-line
Spagedi (Hardy & Vekemans 2002), the more recent, user-friendly and
multiplatform Polygene (Huang et al. 2020) or Genodive (Meirmans &
Tienderen 2004), a program restricted to the MacOS X operating system.
However, none of these programs computes all the population genetic
indices used to understand and interpret all reproductive modes,
including selfing and clonality in populations, such as indices based on
genotypic diversity.
Polygene, for example, cannot handle repeated genotypes, which are
commonly observed in partially clonal populations. Polysat
(Clark & Jasieniuk 2011) cannot currently deal with data with
confident allele dosage, which is becoming standard with massive
sequencing and tagging methods. Some R libraries like
Poppr (Kamvar et al. 2014), RClone and
Polysat, and command-line solutions like Spagedi, may
help analyse genotypes of polyploid populations with different modes
of reproduction, but they require an exhaustive exploration of their
documentation and some training in scripting languages to use them.
In practical courses, they require a preliminary introduction to
scripting or to the reasons for choosing some options over others,
which complicates the teaching of population genetics for polyploid
species by diluting the topic in technical considerations.
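As an illustration of the repeated genotypes discussed above, the
following sketch groups samples by multi-locus genotype when allele dosage
is known. It treats each single-locus genotype as an unordered multiset of
alleles and reports genotypic richness as R = (G - 1)/(N - 1), a commonly
used index. The data layout, function names and toy tetraploid samples are
hypothetical and do not reproduce any of the programs cited here.

```python
from collections import Counter

# A sample maps each locus name to its alleles; order within a locus is irrelevant.
Sample = dict[str, str]


def multilocus_genotype(sample: Sample) -> tuple:
    """Canonical, hashable form of a genotype: loci sorted, alleles sorted."""
    return tuple(
        (locus, "".join(sorted(alleles))) for locus, alleles in sorted(sample.items())
    )


def count_mlgs(samples: list[Sample]) -> Counter:
    """Count how many samples carry each distinct multi-locus genotype."""
    return Counter(multilocus_genotype(s) for s in samples)


def genotypic_richness(samples: list[Sample]) -> float:
    """R = (G - 1) / (N - 1), with G distinct genotypes among N samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed")
    return (len(count_mlgs(samples)) - 1) / (n - 1)


# Hypothetical tetraploid samples genotyped at two biallelic markers.
samples = [
    {"snp1": "AABB", "snp2": "AAAB"},
    {"snp1": "BABA", "snp2": "ABAA"},  # same genotype, alleles listed in another order
    {"snp1": "ABBB", "snp2": "AAAB"},  # differs from the first only by dosage at snp1
    {"snp1": "AAAA", "snp2": "BBBB"},
]

if __name__ == "__main__":
    for genotype, n_copies in count_mlgs(samples).items():
        print(n_copies, genotype)
    print(genotypic_richness(samples))
```

Here the first two samples collapse into one repeated genotype, while the
third would be indistinguishable from them if only the presence of alleles
A and B were recorded at snp1.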