2.3 | Detection and classification of repetitive
elements
For de novo identification of repeat elements, we constructed a
transposable element (TE) library of the B. schroederi genome to
identified types of repeat elements by using Tandem Repeat Finder (TRF)
(Benson & G.), LTR_FINDER (X. Zhao & Hao, 2007) and RepeatModeler
(v1.0.8) (Smit, Hubley, & Green, 2015). RepeatMasker (Tarailo-Graovac
& Chen, 2009) and RepeatProteinMask (Tempel, 2012) were used to search
the genome sequences for known repeat elements against the Repbase
database (Jurka et al.).
2.4 | Prediction of protein-coding genes and
functional annotation
A combined strategy of de novo gene prediction, homology-based search
and RNA sequencing-aided annotation was used to perform gene prediction.
For homology-based annotation, we selected the protein-coding sequences
of five homologous species (Brugia malayi , C. elegans ,Pristionchus pacificus , Steinernema carpocapsae andT. canis ) from NCBI (https://www.ncbi.nlm.nih.gov/). For
RNA-based prediction, a male and a female transcriptome sequences were
aligned to the genome for assembly using TopHat (v2.1.0) (Trapnell,
Pachter, & Salzberg) plus Trinity (v2.0.6) (Haas et al.) strategy.
PASApipeline (v.2.1.0) was applied to predict gene structure after which
the inferred gene structures were used in AUGUSTUS (v.3.2.3) (Mario et
al., 2006) to train gene models based on transcript evidence. In
addition, the genome sequence was analyzed by the program GeneMark
(v1.0) (John & Mark, 2005) utilizing unsupervised training to build a
hidden Markov model. The consistent gene sets were generated by
combining all above evidence using MAKER (v.2.31.8) (Campbell, Law,
Holt, Stein, & Yandell, 2013). All gene evidence was merged to form a
comprehensive and non-redundant gene set using EvidenceModeler (v1.1.1,
EVM) (Haas et al., 2008).
In order to perform gene functional annotation, we aligned the gene sets
against several known databases, including SwissProt (Amos & Rolf,
2000), TrEMBL (Amos & Rolf, 2000), Kyoto encyclopedia of genes and
genomes (KEGG) (Pitk, 2006), clusters of orthologous groups of proteins
(COG) (Tatusov et al., 2003) and NR (Jian, 2015). Gene ontology (GO)
information was obtained through Blast2go (v.2.5.0) (Conesa et al.,
2005). tRNAscan-SE (v1.3.1) (Lowe
& Eddy, 1997) was used to predict tRNAs. We aligned the B.
schroederi genome against Rfam (v12.0) (Kalvari et al., 2018) database
and invertebrate rRNA database to predict snRNA, miRNA and rRNA,
respectively. In addition,
proteases, protease inhibitors
(PIs), and excretory/secretory proteins (EPs) were annotated and
identified (for details, see supplementary text).
2.5 | Reconstruction of phylogeny and evolutional
analysis using genomes from six nematodes
Genomes and annotation files of three parasitic roundworms (A.
suum , T. canis and P. univalens ), one free-living
nematode (C. elegans ) and one parasitic plant root nematode
(M. hapla ) were download from NCBI or the WormBase database
(Table S10) (Michelle, Dubaj, Price, Daryl, & Hurd, 2019). The syntenic
analyses both at whole-genome nucleotide-level and protein-level were
performed by aligning the five nematodes genomes to our assembledB. schroederi genome by using MCScanX (Y. Wang et al.,
2012) software. Orthogroups among the six nematodes were defined using
TreeFam (v4.0) (H. Li et al., 2006). Next, the protein sequences from
each family were aligned using MUSCLE (v3.8.31) (Edgar, 2004) with
default parameters. The conserved CDS alignments were extracted by
Gblocks (Gerard & Jose, 2007). We selected single-copy gene families to
construct phylogenetic trees based on maximum likelihood using RAxML
(v8.2.4) (Alexandros, 2014) with PROTCATGTR nucleotide substitution
model with 500 bootstrap replicates.
Using the divergence time between C. elegans and A. suum,which was calculated based on fossil evidence, as the reference time
points (Mcgill, Fitzpatrick, Pisani, & Burnell, 2017), we estimated the
divergence time between each species by MCMCtree from the PAML (v4.8)
(Bo & Yang, 2013) package with default parameters.
2.6 |
E xpanded and contracted
gene families
Based on the phylogenetic tree we constructed using the 2451 single-copy
genes, we explored significantly expanded and contracted gene families
in the six nematodes (A. suum , B. schroederi , T.
canis , P. univalens , C. elegans andM. hapla ). Gene family
expansion and contraction analyses
were performed using CAFÉ (v4.2) (Bie, Cristianini, Demuth, & Hahn,
2006). We used a median gene number to estimate the changes in gene
family size. The overall P-value of each gene family were calculated and
the details of each branch and node with the exact P-values of each
significant overall P-value (≤ 0.01) gene family were also calculated.
Here we performed analysis in two levels: 1) Comparing gene families of
roundworms to those of other nematodes to determine the expanded and
contracted genes in roundworms; 2) Comparing the B. schroederiand other three roundworms (A. suum , P. univalens andT. canis ) to identify gain and loss of genes in B.
schroederi , to further determine their differentiation within the
roundworm branch. In addition, seven detoxification-related gene
families of four roundworms were identified by HMMER3 (for details, see
supplementary text) (Jaina, Finn, Eddy, Alex, & Marco, 2013).