2.3 | Detection and classification of repetitive elements
For de novo identification of repeat elements, we constructed a transposable element (TE) library of the B. schroederi genome to identified types of repeat elements by using Tandem Repeat Finder (TRF) (Benson & G.), LTR_FINDER (X. Zhao & Hao, 2007) and RepeatModeler (v1.0.8) (Smit, Hubley, & Green, 2015). RepeatMasker (Tarailo-Graovac & Chen, 2009) and RepeatProteinMask (Tempel, 2012) were used to search the genome sequences for known repeat elements against the Repbase database (Jurka et al.).
2.4 | Prediction of protein-coding genes and functional annotation
A combined strategy of de novo gene prediction, homology-based search and RNA sequencing-aided annotation was used to perform gene prediction. For homology-based annotation, we selected the protein-coding sequences of five homologous species (Brugia malayi , C. elegans ,Pristionchus pacificus , Steinernema carpocapsae andT. canis ) from NCBI (https://www.ncbi.nlm.nih.gov/). For RNA-based prediction, a male and a female transcriptome sequences were aligned to the genome for assembly using TopHat (v2.1.0) (Trapnell, Pachter, & Salzberg) plus Trinity (v2.0.6) (Haas et al.) strategy. PASApipeline (v.2.1.0) was applied to predict gene structure after which the inferred gene structures were used in AUGUSTUS (v.3.2.3) (Mario et al., 2006) to train gene models based on transcript evidence. In addition, the genome sequence was analyzed by the program GeneMark (v1.0) (John & Mark, 2005) utilizing unsupervised training to build a hidden Markov model. The consistent gene sets were generated by combining all above evidence using MAKER (v.2.31.8) (Campbell, Law, Holt, Stein, & Yandell, 2013). All gene evidence was merged to form a comprehensive and non-redundant gene set using EvidenceModeler (v1.1.1, EVM) (Haas et al., 2008).
In order to perform gene functional annotation, we aligned the gene sets against several known databases, including SwissProt (Amos & Rolf, 2000), TrEMBL (Amos & Rolf, 2000), Kyoto encyclopedia of genes and genomes (KEGG) (Pitk, 2006), clusters of orthologous groups of proteins (COG) (Tatusov et al., 2003) and NR (Jian, 2015). Gene ontology (GO) information was obtained through Blast2go (v.2.5.0) (Conesa et al., 2005). tRNAscan-SE (v1.3.1) (Lowe & Eddy, 1997) was used to predict tRNAs. We aligned the B. schroederi genome against Rfam (v12.0) (Kalvari et al., 2018) database and invertebrate rRNA database to predict snRNA, miRNA and rRNA, respectively. In addition, proteases, protease inhibitors (PIs), and excretory/secretory proteins (EPs) were annotated and identified (for details, see supplementary text).
2.5 | Reconstruction of phylogeny and evolutional analysis using genomes from six nematodes
Genomes and annotation files of three parasitic roundworms (A. suum , T. canis and P. univalens ), one free-living nematode (C. elegans ) and one parasitic plant root nematode (M. hapla ) were download from NCBI or the WormBase database (Table S10) (Michelle, Dubaj, Price, Daryl, & Hurd, 2019). The syntenic analyses both at whole-genome nucleotide-level and protein-level were performed by aligning the five nematodes genomes to our assembledB. schroederi genome by using MCScanX (Y. Wang et al., 2012) software. Orthogroups among the six nematodes were defined using TreeFam (v4.0) (H. Li et al., 2006). Next, the protein sequences from each family were aligned using MUSCLE (v3.8.31) (Edgar, 2004) with default parameters. The conserved CDS alignments were extracted by Gblocks (Gerard & Jose, 2007). We selected single-copy gene families to construct phylogenetic trees based on maximum likelihood using RAxML (v8.2.4) (Alexandros, 2014) with PROTCATGTR nucleotide substitution model with 500 bootstrap replicates.
Using the divergence time between C. elegans and A. suum,which was calculated based on fossil evidence, as the reference time points (Mcgill, Fitzpatrick, Pisani, & Burnell, 2017), we estimated the divergence time between each species by MCMCtree from the PAML (v4.8) (Bo & Yang, 2013) package with default parameters.
2.6 | E xpanded and contracted gene families
Based on the phylogenetic tree we constructed using the 2451 single-copy genes, we explored significantly expanded and contracted gene families in the six nematodes (A. suum , B. schroederi , T. canis , P. univalens , C. elegans andM. hapla ). Gene family expansion and contraction analyses were performed using CAFÉ (v4.2) (Bie, Cristianini, Demuth, & Hahn, 2006). We used a median gene number to estimate the changes in gene family size. The overall P-value of each gene family were calculated and the details of each branch and node with the exact P-values of each significant overall P-value (≤ 0.01) gene family were also calculated. Here we performed analysis in two levels: 1) Comparing gene families of roundworms to those of other nematodes to determine the expanded and contracted genes in roundworms; 2) Comparing the B. schroederiand other three roundworms (A. suum , P. univalens andT. canis ) to identify gain and loss of genes in B. schroederi , to further determine their differentiation within the roundworm branch. In addition, seven detoxification-related gene families of four roundworms were identified by HMMER3 (for details, see supplementary text) (Jaina, Finn, Eddy, Alex, & Marco, 2013).