Searching for sex-linked loci in whole genome sequencing (WGS)
data for Macquarie perch
To explore sex-linkage across the draft genome, we obtained whole genome
resequencing data (WGS) for 100 Macquarie perch from Dartmouth and
Yarra; 25 individuals of each sex per population (all of these were also
DArT-sequenced). DNA was extracted as above. For each individual, an
Illumina Novaseq library was constructed, as above. Libraries were
sequenced at Deakin Genome Centre to at least 10X depth (>7
Gb of data per library; list of samples and individual read-depth
coverage in Supplementary Material S1).
A total of 25 million paired-end reads were randomly subsampled from
each individual’s WGS data and combined into four pools (25 individuals
of each sex per population): Dartmouth females, DF, Dartmouth males, DM,
Yarra females, YF, and Yarra males, YM. Reads from the four pools were
mapped onto the Macquarie perch reference genome (NCBI WGS Project
accessions: SEMN01) using bwa-mem v0.7.17 (Li & Durbin, 2009). A
Cochran-Mantel-Haenszel (CMH) test implemented in PoPoolation2(Kofler, Pandey, & Schlötterer, 2011) was used to detect significant
and consistent allele frequency differences between sexes in two
biological replicates. SNP loci with allele frequencies highly
significantly (p<1e-20) differentiated between sexes were
examined for being clustered in the same genomic region. Because one
scaffold (scaffold 633, 274905 bp length, GenBank accession
SEMN01000633) contained a set of the most differentiated loci, we also
examined whether it had other loci with lower CMH test significance
levels (p<1e-5) and whether some loci were polymorphic in one
sex but monomorphic in the other.
Individual genotyping using all WGS read data was also performed, by
aligning the trimmed paired-end reads for each individual to the
reference genome using bwa-mem v0.7.17-r1188, followed by variant
calling on the merged BAM files from all individuals, using strelka
v2.9.10 (Kim et al., 2018). Individual genotypes for loci inferred to be
sex-linked by CMH tests were examined for consistency with being
XY-gametologs, old-X-linked loci or bearing recent-Y-specific
polymorphism (ii-iv above). For the set of most strongly
sex-differentiated SNP loci on scaffold 633, we visualized individual
alignments for males in Tablet (Milne et al., 2012) to investigate
whether the mate-pair architecture of the sequenced libraries resolved
female- and male-specific haplotypes via manual inspection (similar
approach in Kijas et al., 2018b). We also explored ploidy of these loci
by investigating read-depth coverage for each base of the scaffold 633
region enriched with sex-linked loci. Read depth was collected using theCollectAllelicCounts function of GATK v4.1.0 (DePristo et al.,
2011) from individual alignment files, output files were merged and
analysed in R. Read depth for each individual was normalized by its
genome-wide average read depth, and averaged across the four
sex-by-population samples. The genome-wide average for each individual
was calculated as the number of all mapped reads multiplied by 151 bp
(read length) and divided by genome size. We calculated read depth for
reference and alternate alleles separately.
Finally, we investigated whether scaffold 633 enriched with sex-linked
loci has more indels in males than females, and whether these indels are
clustered, as would be expected from accumulation of Y-specific
polymorphisms due to recent reduction in recombination. Manta (Chen et
al., 2016) was used to score structural variants (insertions, deletions,
inversions and repeats) based on supporting paired and split-read
evidence, using default parameters. Fisher and beta-binomial tests
implemented in R libraries R-core and ibb (Pham &
Jimenez, 2012) respectively were used to compare counts of different
classes of structural variants between sexes, and sex-by-population
samples.