Assembly and annotation of genomes for Macquarie perch and golden
perch
Illumina short reads were adapter-trimmed using fastp v0.19.5 (Chen, Zhou,
Chen, & Gu, 2018) and combined with Nanopore long reads in a de novo
hybrid assembly using MaSuRCA v3.2.4 (Zimin et al., 2017). In this
approach, the Illumina reads were first error-corrected and used to
construct contigs by the de Bruijn graph approach. These contigs were
then used to error-correct the Nanopore long reads, generating
“mega-read” contigs for Overlap-Layout-Consensus assembly. Genome
completeness was assessed using BUSCO v4 (Seppey, Manni, & Zdobnov, 2019)
with default settings and the actinopterygii_odb10 lineage dataset.
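To make the de Bruijn graph step concrete, the following is a minimal toy sketch in Python. It is not MaSuRCA's implementation, which is considerably more elaborate; the reads, the value of k, and the function names `build_de_bruijn` and `walk_unambiguous` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor.

    Simplification: only out-degree is checked, so this is not a full
    unitig construction (which would also check in-degree).
    """
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle rather than looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical error-free short reads sampled from one underlying sequence.
reads = ["ACGTTGCA", "GTTGCATC", "TGCATCGG"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ACG")
print(contig)
```

With these three overlapping reads every node has a single successor, so the walk recovers one contig spanning all of them. Real data contain sequencing errors and repeats that create branches in the graph, which is why the reads are error-corrected before graph construction.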
A de novo repeat library was constructed for the assembled genome
with RepeatModeler2 (Flynn et al., 2020) and used to soft-mask
the genome with RepeatMasker v4.0.9 (Smit, Hubley, &
Green, 2013-2015). Transcriptome reads were aligned to the repeat-masked
genome using STAR v2.7.1a (Dobin et al., 2013). The transcriptome
alignment (single-species BAM file) and the repeat-masked genome were used
as input for protein-coding gene prediction in BRAKER v2.1.2 (Bruna,
Hoff, Stanke, Lomsadze, & Borodovsky, 2020). Functional annotation of
the predicted proteomes was completed using InterProScan 5 (Jones et
al., 2014).
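As an illustration of what soft-masking means in practice, the sketch below lowercases bases inside repeat intervals while leaving the sequence itself intact, so that downstream tools can still read through masked regions. It is a toy example, not RepeatMasker's code; the sequence, the interval coordinates, and the function names `soft_mask` and `masked_fraction` are hypothetical.

```python
def soft_mask(sequence, repeat_intervals):
    """Lowercase bases inside 0-based, half-open (start, end) repeat intervals."""
    bases = list(sequence.upper())
    for start, end in repeat_intervals:
        # Clamp each interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)


def masked_fraction(sequence):
    """Return the proportion of bases that are lowercase (soft-masked)."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence if base.islower()) / len(sequence)


# Hypothetical 20 bp sequence with two annotated repeat intervals.
masked = soft_mask("ACGTACGTACGTACGTACGT", [(4, 10), (15, 18)])
print(masked)
print(masked_fraction(masked))
```

Hard-masking would instead replace the same positions with N and discard the underlying bases. Soft-masking keeps them available, which matters when alignments or gene models extend into repeat regions.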