Protein-coding gene annotation
Protein-coding genes of A. flavipes and Didelphis virginiana were annotated using homology-based prediction, de novo prediction, and RNA-seq-assisted prediction methods. Sequences of homologous proteins from five mammals [human (Homo sapiens), M. domestica, P. cinereus, S. harrisii, and V. ursinus] were downloaded from NCBI. These protein sequences were aligned to the repeat-masked genome using BLAT v0.36 (Kent, 2002). Genewise v2.4.1 (Birney, Clamp, & Durbin, 2004) was employed to generate gene structures based on the alignments of proteins to the genome assembly. De novo gene prediction was performed using AUGUSTUS v3.2.3 (Stanke et al., 2006), GENSCAN v1.0 (Burge & Karlin, 1997), and GlimmerHMM v3.0.1 (Majoros, Pertea, & Salzberg, 2004) with a human training set. Transcriptome data were mapped to the assembled genome using HISAT2 v2.1.0 (Kim, Paggi, Park, Bennett, & Salzberg, 2019) and SAMtools v1.9 (Li et al., 2009), and coding regions were predicted using TransDecoder v5.5.0 (Grabherr et al., 2011; Haas et al., 2013). A final non-redundant reference gene set was generated by merging the three annotated gene sets using EVidenceModeler v1.1.1 (EVM) (Haas et al., 2008) and excluding EVM gene models with only ab initio support. The gene models were translated into amino acid sequences and used in local BLASTp (Camacho et al., 2009) searches against the public databases Kyoto Encyclopedia of Genes and Genomes (KEGG; v89.1) (Kanehisa & Goto, 2000), Clusters of Orthologous Groups (COG) (Tatusov, Galperin, Natale, & Koonin, 2000), NCBI non-redundant protein sequences (NR; v20170924) (O’Leary et al., 2016), Swiss-Prot (release-2018_07) (UniProt, 2012), TrEMBL (TRanslation of EMBL [nucleotide sequences that are not in Swiss-Prot]; release-2018_07) (O’Donovan et al., 2002), and InterPro (v69.0) (A. L. Mitchell et al., 2019). A total of 18,068 A. flavipes genes (93.5%) could be functionally annotated.
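The exclusion of EVM gene models with only ab initio support can be illustrated with a minimal Python sketch. It is not the script used in this study: the function name `drop_ab_initio_only`, the dictionary layout (gene model ID mapped to the set of supporting sources), and the toy gene IDs are assumptions made for illustration, and the way per-model support was actually recorded is not described here.

```python
from typing import Dict, Set

# Ab initio predictors named in the text. Using the tool names as
# evidence labels is an assumption made for this illustration.
AB_INITIO: Set[str] = {"AUGUSTUS", "GENSCAN", "GlimmerHMM"}


def drop_ab_initio_only(models: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep gene models backed by at least one non-ab-initio source.

    `models` maps a gene model ID to the set of evidence sources that
    support it (hypothetical layout). Models with no recorded sources
    are dropped as well, since nothing outside AB_INITIO supports them.
    """
    return {
        gene: sources
        for gene, sources in models.items()
        if sources - AB_INITIO  # non-empty when other evidence exists
    }


# Toy input with made-up gene IDs.
models = {
    "gene1": {"AUGUSTUS", "Genewise"},     # ab initio + homology: kept
    "gene2": {"GENSCAN", "GlimmerHMM"},    # ab initio only: dropped
    "gene3": {"TransDecoder"},             # transcript evidence: kept
}

kept = drop_ab_initio_only(models)
print(sorted(kept))
```

Under these assumptions, a model is retained whenever homology-based or transcript-based evidence contributes to it, regardless of how many ab initio predictors also agree.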
Where specific genes are named in this manuscript, human nomenclature assignments are used unless otherwise noted.