Searching for sex-linked loci in whole genome sequencing (WGS) data for Macquarie perch
To explore sex-linkage across the draft genome, we obtained whole genome resequencing data (WGS) for 100 Macquarie perch from Dartmouth and Yarra; 25 individuals of each sex per population (all of these were also DArT-sequenced). DNA was extracted as above. For each individual, an Illumina Novaseq library was constructed, as above. Libraries were sequenced at Deakin Genome Centre to at least 10X depth (>7 Gb of data per library; list of samples and individual read-depth coverage in Supplementary Material S1).
A total of 25 million paired-end reads were randomly subsampled from each individual’s WGS data and combined into four pools (25 individuals of each sex per population): Dartmouth females, DF, Dartmouth males, DM, Yarra females, YF, and Yarra males, YM. Reads from the four pools were mapped onto the Macquarie perch reference genome (NCBI WGS Project accessions: SEMN01) using bwa-mem v0.7.17 (Li & Durbin, 2009). A Cochran-Mantel-Haenszel (CMH) test implemented in PoPoolation2(Kofler, Pandey, & Schlötterer, 2011) was used to detect significant and consistent allele frequency differences between sexes in two biological replicates. SNP loci with allele frequencies highly significantly (p<1e-20) differentiated between sexes were examined for being clustered in the same genomic region. Because one scaffold (scaffold 633, 274905 bp length, GenBank accession SEMN01000633) contained a set of the most differentiated loci, we also examined whether it had other loci with lower CMH test significance levels (p<1e-5) and whether some loci were polymorphic in one sex but monomorphic in the other.
Individual genotyping using all WGS read data was also performed, by aligning the trimmed paired-end reads for each individual to the reference genome using bwa-mem v0.7.17-r1188, followed by variant calling on the merged BAM files from all individuals, using strelka v2.9.10 (Kim et al., 2018). Individual genotypes for loci inferred to be sex-linked by CMH tests were examined for consistency with being XY-gametologs, old-X-linked loci or bearing recent-Y-specific polymorphism (ii-iv above). For the set of most strongly sex-differentiated SNP loci on scaffold 633, we visualized individual alignments for males in Tablet (Milne et al., 2012) to investigate whether the mate-pair architecture of the sequenced libraries resolved female- and male-specific haplotypes via manual inspection (similar approach in Kijas et al., 2018b). We also explored ploidy of these loci by investigating read-depth coverage for each base of the scaffold 633 region enriched with sex-linked loci. Read depth was collected using theCollectAllelicCounts function of GATK v4.1.0 (DePristo et al., 2011) from individual alignment files, output files were merged and analysed in R. Read depth for each individual was normalized by its genome-wide average read depth, and averaged across the four sex-by-population samples. The genome-wide average for each individual was calculated as the number of all mapped reads multiplied by 151 bp (read length) and divided by genome size. We calculated read depth for reference and alternate alleles separately.
Finally, we investigated whether scaffold 633 enriched with sex-linked loci has more indels in males than females, and whether these indels are clustered, as would be expected from accumulation of Y-specific polymorphisms due to recent reduction in recombination. Manta (Chen et al., 2016) was used to score structural variants (insertions, deletions, inversions and repeats) based on supporting paired and split-read evidence, using default parameters. Fisher and beta-binomial tests implemented in R libraries R-core and ibb (Pham & Jimenez, 2012) respectively were used to compare counts of different classes of structural variants between sexes, and sex-by-population samples.