Assembly and annotation of genomes for Macquarie perch and golden perch
Illumina reads were adapter-trimmed using fastp v0.19.5 (Chen, Zhou, Chen, & Gu, 2018) and then assembled de novo together with Nanopore long reads in a hybrid approach using MaSuRCA v3.2.4 (Zimin et al., 2017). The short Illumina reads were first error-corrected and assembled into contigs using the de Bruijn graph approach; these contigs were then used to error-correct the Nanopore long reads, generating “mega-read” contigs for Overlap-Layout-Consensus assembly. Genome completeness was assessed using BUSCO v4 (Seppey, Manni, & Zdobnov, 2019) with default settings, based on the actinopterygii_odb10 database.
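As a conceptual illustration of the de Bruijn graph step, the following minimal Python sketch builds a graph from toy reads and walks a single unbranched path to recover a contig. It is not MaSuRCA's implementation: the function names, the toy reads and the k-mer size are all hypothetical, and practical details such as error correction, reverse complements and coverage thresholds are deliberately omitted.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unbranched(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor.

    Simplification: in-degree is not checked, and the walk stops if a node repeats.
    """
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:
            break
        contig += nxt[-1]  # each step adds one new base
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical, error-free toy reads that overlap by four bases
toy_reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(toy_reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)  # ATGGCGTGCA
```

With real sequencing data the graph contains branches caused by repeats and residual errors, which is one reason the long reads are needed to resolve the assembly further.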
A repeat library was constructed de novo for the assembled genome with RepeatModeler2 (Flynn et al., 2020) and used to soft-mask repeats in the genome with RepeatMasker v4.0.9 (Smit, Hubley, & Green, 2013-2015). Transcriptome reads were aligned to the repeat-masked genome using STAR v2.7.1a (Dobin et al., 2013). The transcriptome alignment (single-species BAM file) and the repeat-masked genome were used as input for protein-coding gene prediction in BRAKER v2.1.2 (Bruna, Hoff, Stanke, Lomsadze, & Borodovsky, 2020). Functional annotation of the predicted proteomes was completed using InterProScan 5 (Jones et al., 2014).
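To make the soft-masking convention concrete, the short Python sketch below lowercases the bases that fall inside repeat intervals and leaves the rest in uppercase, so that the sequence remains available to downstream aligners and gene predictors. It only illustrates the convention and is not how RepeatMasker works internally; the function name, the example sequence and the interval coordinates (0-based, half-open) are hypothetical.

```python
def soft_mask(sequence, repeat_intervals):
    """Lowercase bases inside 0-based, half-open repeat intervals; uppercase the rest."""
    bases = list(sequence.upper())
    for start, end in repeat_intervals:
        # Clamp each interval to the sequence bounds before masking
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)


# Hypothetical sequence with two annotated repeat intervals
masked = soft_mask("ACGTACGTACGT", [(2, 5), (8, 10)])
print(masked)  # ACgtaCGTacGT
```

Hard-masking, by contrast, would replace the same positions with N and discard the underlying sequence.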