2.3 Impact of filtering sex-linked loci on genetic diversity, individual heterozygosity, genetic structure and parentage analyses
Population genetic diversity. Six measures of population genetic diversity were calculated for the ‘before’ and ‘after’ datasets: observed heterozygosity (Ho), expected heterozygosity (He), Wright’s fixation index (FIS), polymorphism (P), number of private alleles not present in any other population (PA), and allelic richness (AR). Ho, He, FIS and PA were calculated with the dartR package v2.0.4 (function gl.report.heterozygosity with method = ‘pop’, and function gl.report.pa with method = ‘one2rest’). AR was calculated with the hierfstat package v0.5-11 (function allelic.richness; Goudet 2004). P was calculated as the proportion of loci that were polymorphic in a given population.
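To illustrate how these summary statistics relate to the genotype data, the following minimal Python sketch computes Ho, He, FIS and P for a single population. It is not the dartR implementation. The 0/1/2 genotype coding, the function name diversity_stats, the uncorrected per-locus estimator He = 2pq averaged across loci, and FIS = 1 - Ho/He are assumptions made for the example; dartR may apply different corrections.

```python
def diversity_stats(genotypes):
    """Summarise diversity for one population.

    genotypes: list of individuals, each a list of per-locus codes
    (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate,
    None = missing). This coding is an assumption of the sketch.
    """
    n_loci = len(genotypes[0])
    ho_per_locus, he_per_locus = [], []
    n_polymorphic = 0
    for j in range(n_loci):
        # Keep only non-missing calls at this locus.
        calls = [ind[j] for ind in genotypes if ind[j] is not None]
        if not calls:
            continue  # locus not scored in this population
        n = len(calls)
        p = sum(calls) / (2 * n)  # alternate-allele frequency
        ho_per_locus.append(sum(1 for g in calls if g == 1) / n)
        he_per_locus.append(2 * p * (1 - p))  # uncorrected expected heterozygosity
        if 0 < p < 1:
            n_polymorphic += 1
    n_scored = len(ho_per_locus)
    ho = sum(ho_per_locus) / n_scored
    he = sum(he_per_locus) / n_scored
    fis = 1 - ho / he if he > 0 else float("nan")
    return {"Ho": ho, "He": he, "FIS": fis, "P": n_polymorphic / n_scored}


# Toy data: four individuals genotyped at three loci (hypothetical values).
toy_population = [
    [0, 1, 0],
    [1, 1, 0],
    [2, 1, 0],
    [1, 1, 0],
]
toy_stats = diversity_stats(toy_population)
```

Running the same function on the ‘before’ and ‘after’ genotype matrices of a population would give the paired values that are compared in this section.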
Individual observed heterozygosity (Ho). Individual Ho was calculated with the dartR function gl.report.heterozygosity (method = ‘ind’). To test whether individual Ho changed when sex-linked loci were removed, we compared ‘before’ and ‘after’ individual Ho with a paired t-test (α = 0.05) for each sex separately. We also tested for significant differences in individual Ho between males and females (independent-samples t-test) in both the ‘before’ and ‘after’ datasets. Cohen’s d was used to measure effect sizes.
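The paired comparison and the effect size can be sketched with the Python standard library as follows. The function names and the toy Ho values are hypothetical, and the pooled-standard-deviation form of Cohen’s d is an assumption, since the text does not specify which variant was used. The sketch returns only the t statistic and its degrees of freedom; the p-value would be obtained from a t distribution in a statistics package.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic for after - before, with its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se_diff, n - 1


def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


# Hypothetical individual Ho values for four individuals of one sex.
ho_before = [0.10, 0.12, 0.11, 0.13]
ho_after = [0.09, 0.10, 0.10, 0.11]
t_stat, dof = paired_t(ho_before, ho_after)
```

A negative t statistic here indicates that individual Ho is lower after removing sex-linked loci than before.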
Genetic structure. Genetic structure between populations was qualitatively assessed with Pearson Principal Component Analyses (PCA, dartR function gl.pcoa). To reduce computation time, loci with a Minor Allele Count (MAC) below 3 were removed from all datasets (dartR function gl.filter.maf, threshold = 3). We report results for the first two PCs, but the first six PCs were explored.
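The MAC filter can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the dartR code: genotypes are assumed to be coded 0/1/2 (copies of the alternate allele) with None for missing calls, and filter_by_mac is a hypothetical helper name.

```python
def filter_by_mac(genotypes, min_mac=3):
    """Return the indices of loci whose minor allele count is at least min_mac.

    genotypes: list of individuals, each a list of per-locus codes
    (0, 1 or 2 copies of the alternate allele; None = missing).
    """
    n_loci = len(genotypes[0])
    kept = []
    for j in range(n_loci):
        calls = [ind[j] for ind in genotypes if ind[j] is not None]
        alt_count = sum(calls)                  # copies of the alternate allele
        ref_count = 2 * len(calls) - alt_count  # copies of the reference allele
        if min(alt_count, ref_count) >= min_mac:
            kept.append(j)
    return kept


# Toy data: four individuals, four loci (hypothetical values).
toy_genotypes = [
    [1, 1, 2, 2],
    [0, 1, 2, 1],
    [0, 1, 2, 0],
    [0, 0, 1, None],
]
kept_loci = filter_by_mac(toy_genotypes, min_mac=3)
```

In the toy data, the first and third loci are dropped because their rarer allele is observed only once.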
Parentage analyses. Given the potential for sex-linked loci to affect the inference of parentage relationships, we performed separate parentage analyses using the ‘before’ and ‘after’ datasets. We analysed 677 EYR individuals and 527 YTH individuals (cassidix only). In both cases, a MAC filter of 3 was applied to keep only loci shared between at least two individuals and thereby reduce computation time. The genetic datasets for EYR consisted of 13,685 and 12,618 SNPs for the ‘before’ and ‘after’ datasets, respectively. For cassidix, the ‘before’ dataset comprised 11,477 SNPs and the ‘after’ dataset 10,848 SNPs (Table 2).
Parentage analyses were run in COLONY v2.0.6.8 (Jones & Wang 2010). The function gl2colony was used to convert the genetic datasets into COLONY input files. We assigned all individuals as candidate offspring, all females as candidate mothers (EYR: n = 308, cassidix: n = 255), and all males as candidate fathers (EYR: n = 369, cassidix: n = 272). In the case of EYR, candidate parents for 203 offspring were excluded based on year of birth, year of death (when known) and excessive geographical distance (Austin et al. unpublished manuscript). For both species, we used a full-likelihood approach (‘likelihood = 1’) with medium runs (‘length_run = 2’) at medium precision (‘precision_fl = 1’). We assumed polygamy (‘polygamy_male = 0’, ‘polygamy_female = 0’) and a prior probability of 0.5 that the true parent is present in the sample (‘probability_mother’, ‘probability_father’). Allele frequencies were not updated, to minimize computation time (‘update_allele_freq = 0’). For cassidix, we indicated the presence of inbreeding (‘inbreed = 1’) and set genotyping error to 0.05 (‘other_typ_err = 0.05@’) after Robledo-Ruiz et al. (2022). Genotyping error for EYR was set to an empirically determined value of 0.03, following Austin et al. (unpublished manuscript). Because the method implemented in COLONY is stochastic (Jones & Wang 2010), we performed five independent runs per dataset (each with a different seed) to better explore the space of potential pedigree configurations.
Parentage assignments per run were compared to a set of known parentage relationships: 119 social EYR mothers observed consistently attending the nest and incubating (Austin et al. unpublished manuscript), and 45 known YTH parent-offspring relationships from cassidix captive breeding (Robledo-Ruiz et al. 2022). The accuracy of parentage assignments was measured in two ways: (i) by counting how many runs out of five correctly identified the parent for each known parentage relationship, and comparing the ‘before’ and ‘after’ averages with a paired t-test, and (ii) by assigning as final parents those that were identified in at least three out of five runs (following Robledo-Ruiz et al. 2022) and testing with a χ2-test whether the number of correct final assignments was positively associated with the removal of sex-linked loci.
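The two accuracy measures can be sketched in Python as follows. The data structures (one dictionary per run mapping each offspring to its assigned parent), the function names and the toy identifiers are all hypothetical; parsing of actual COLONY output files is not shown.

```python
from collections import Counter


def correct_runs(runs, known):
    """Measure (i): for each known relationship, count the runs that recovered it."""
    return {
        offspring: sum(1 for run in runs if run.get(offspring) == parent)
        for offspring, parent in known.items()
    }


def final_assignments(runs, min_support=3):
    """Measure (ii): keep a parent only if it was assigned in at least min_support runs."""
    all_offspring = set().union(*(run.keys() for run in runs))
    final = {}
    for offspring in all_offspring:
        # Tally non-missing assignments for this offspring across runs.
        votes = Counter(
            run[offspring] for run in runs if run.get(offspring) is not None
        )
        final[offspring] = None
        if votes:
            parent, count = votes.most_common(1)[0]
            if count >= min_support:
                final[offspring] = parent
    return final


# Toy example: five runs, three offspring, two known mothers (hypothetical IDs).
toy_runs = [
    {"o1": "m1", "o2": "m2", "o3": "a"},
    {"o1": "m1", "o2": "m4", "o3": "b"},
    {"o1": "m1", "o2": "m4", "o3": "c"},
    {"o1": "m3", "o2": "m4", "o3": "a"},
    {"o1": None, "o2": "m2", "o3": None},
]
toy_known = {"o1": "m1", "o2": "m2"}

runs_correct = correct_runs(toy_runs, toy_known)
final = final_assignments(toy_runs)
n_correct_final = sum(1 for o, p in toy_known.items() if final.get(o) == p)
```

With five runs and a threshold of three, at most one candidate can reach the threshold for a given offspring, so the final assignment is unambiguous.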
Minimum number of known-sex individuals for the filter.sex.linked function
We used both the EYR and YTH datasets to estimate the number of sex-linked loci that are identified with subsets of known-sex individuals of variable size. We created eight subsets of 20, 24, 30, 40, 50, 100, 200 and 400 individuals chosen at random, all with a 1:1 sex ratio, and applied the function filter.sex.linked to each. We then identified the smallest subset of known-sex individuals with which it was still possible to identify sex-linked loci, and tested whether those loci were useful to sex the rest of the individuals and, in turn, whether the new sex assignments could be used to identify all sex-linked loci. For this, we created five random subsets of known-sex individuals of the smallest size (24 and 30 known-sex individuals for EYR and YTH, respectively; see Results 3), applied the function filter.sex.linked followed by the function infer.sex, and used the new sex assignments to re-run filter.sex.linked.
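The subsampling step can be illustrated with a minimal Python sketch that draws random subsets with a 1:1 sex ratio. The helper name balanced_subset, the synthetic individual identifiers and the seed are hypothetical; only the subset sizes and the EYR counts of females and males are taken from the text. The subsequent calls to filter.sex.linked and infer.sex are not reproduced here.

```python
import random

# Subset sizes stated in the text.
SUBSET_SIZES = [20, 24, 30, 40, 50, 100, 200, 400]


def balanced_subset(sex_by_id, size, rng):
    """Draw `size` individuals at random, half females ("F") and half males ("M")."""
    if size % 2 != 0:
        raise ValueError("size must be even for a 1:1 sex ratio")
    # Sort identifiers so that results are reproducible for a given seed.
    females = sorted(i for i, sex in sex_by_id.items() if sex == "F")
    males = sorted(i for i, sex in sex_by_id.items() if sex == "M")
    half = size // 2
    return rng.sample(females, half) + rng.sample(males, half)


# Synthetic identifiers matching the EYR counts (308 females, 369 males).
sex_by_id = {f"F{i:03d}": "F" for i in range(308)}
sex_by_id.update({f"M{i:03d}": "M" for i in range(369)})

rng = random.Random(1)  # hypothetical seed, for reproducibility only
subsets = {size: balanced_subset(sex_by_id, size, rng) for size in SUBSET_SIZES}

# Five replicate subsets of the smallest informative size reported for EYR.
replicates = [balanced_subset(sex_by_id, 24, rng) for _ in range(5)]
```

Each subset would then be passed to the sex-linked loci filter, and the replicates of the smallest size used to check that the result does not depend on a particular random draw.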