Running title: De novo assembly of Takin (B.
taxicolor)
Anning Li1#, Qimeng Yang1,2#, Ran
Li1,2, Xuelei Dai1,2, Keli
Cai1, Yinghu Lei3, Kangsheng
Jia3, Yu Jiang1,2*, Linsen
Zan1,3*
1 College of Animal Science and Technology, Northwest
A&F University, Yangling 712100,
Shaanxi, P. R. China
2 Center for Ruminant Genetic and Evolution, Northwest
A&F University, Yangling 712100, Shaanxi, P. R. China
3 Research Center for the Qinling Giant Panda (Shaanxi
Rare Wildlife Rescue Base), Shaanxi Academy of Forestry Sciences,
Zhouzhi 710402, Shaanxi, P. R. China
# These authors contributed equally to this study.
* Corresponding E-mail: yu.jiang@nwafu.edu.cn ;zanlinsen@163.com .
Abstract: The takin (Budorcas taxicolor ) is one of the
largest bovid herbivores across caprinae subfamily. The takin is at high
risk of extinction, however, its taxonomic status is still unclear. In
this study, we constructed the first reference genome of B.
taxicolor using PacBio long High-Fidelity reads and Hi-C technology.
The assembled genome is ~2.95 Gb with a contig N50 of
68.05 Mb and a scaffold N50 of 101.27 Mb, which were anchored onto 25+XY
chromosomes. Compared to the common ancestral karyotype of bovidae
(2n=60), we found the takin (2n=52) experienced four chromosome fusions
and one large translocation. We also found that the takin was most
closely related to muskox, not other caprinae species. Further, we
re-sequenced nine golden takins from the main distribution area, Qinling
Mountains, and identified 3.3 million SNPs. The genetic diversity of
takin was very low (θπ=0.00028 and heterozygosity=0.00038), which was
among the lowest detected in the domestic and wild mammals. We also
found takin genomes showed high inbreeding coefficient (FROH=0.217)
suggesting severe inbreeding depression. The genome analysis show that
the effective population size of takins declined significantly from
~100,000 years ago. Our results provide valuable
information for protection of takins and insights into its evolution.
Keywords : Takin; PacBio HiFi; Hi-C; Chromosomal evolution;
Inbreeding depression
Introduction
The takin (Budorcas taxicolor ) is a large bovid herbivore,
belonging to the Caprinae subfamily. The number of takins was estimated
only about 7,000 ~ 12,000 in the world (Cheng et al.
2020). Takin has been listed as vulnerable by the International Union
for Conservation of Nature (IUCN) (Zeng and Song 2002). It was divided
into four subspecies according to the physiological characteristics and
geographical location (Wu 1986). Qinling takin (B. t. bedfordi )
and Sichuan takin (B. t. tibetana ) are only confined in China,
while the Mishmi takin (B. t. taxicolor ) has a distributed area
from Gaoligong Mountains in northwestern Yunnan province, China to India
and Myanmar (Wu 1986). In addition, Bhutan takin (B. t. whieti )
is found in Bhutan and the Yarlung Zangbo River in Tibet, China (Li et
al. 2003). Qinling takin (B. t. bedfordi ), also known as golden
takin, is mainly distributed in the Qinling Mountains of China
(Figure 1a ) (Li et al. 2021).
Compared to other Caprinae animals, the morphology, ecological traits
and G-banded karyotype of takin were found to be similar to muskox
(Ovibos moschatus ) (Pasitschniak-Arts et al. 1994). However, the
analysis of mitochondrial cytochrome b genes (Cytb ) sequences
showed that muskox and takin were not close with each other but under
convergent evolution (Groves and Shields 1997, Ren et al. 2012).
Recently, based on the complete mitochondrial genome, takin was found to
be closely related to goat (Capra hircus ) rather than sheep
(Ovis aries ) (Feng et al. 2016, Kumar et al. 2019, Zhou et al.
2019). In our previous study, takin was found to be closely related to
sheep rather than goat according to the transcriptome analysis (Qiu et
al. 2021). Thus, to clarify on its taxonomic status, further analysis is
needed at the whole genomic level.
Recently, highly accurate long high-fidelity (HiFi) reads was generated
by PacBio single-molecule real-time (SMRT) sequencing with circular
consensus sequencing (CCS) (Wenger et al. 2019). HiFi reads combined
with Hi-C technology have been used to construct
chromosome-level reference genome
of various animals (Wu et al. 2021, Zhou et al. 2021) and plants (Chen
et al. 2021, Huang et al. 2021, Ma et al. 2021, Sharma et al. 2021, Wang
et al. 2021). In this study, we performed the PacBio long HiFi reads and
Hi-C technology to construct a high-quality chromosome-level reference
genome of takin. The phylogenetic relationship, chromosomal evolution,
genetic diversity, demographic history and inbreeding depression of
takin were analyzed. These information will be helpful for understanding
the evolution of the takin and working towards its conservation.
Materials and Methods
2.1 Sample collection and ethics statement
The liver sample from a dead adult male golden takin was used forde novo genome sequencing. Seven Blood samples (alive) and two
muscle samples (dead) from nine golden takins were used for genome
resequencing. All of the samples were collected from the Rare Wildlife
Rescue and Breeding Research Center in Qinling Mountains. All the
experimental procedures were carried out according to the guidelines of
the China Council on Animal Care and approved by the Experimental Animal
Management Committee (EAMC) of Northwest A&F University.
Genome sequencing and assembly
A 15 Kb DNA SMRTbell library was constructed for SMRT sequencing
according to a standard protocol (Ardui et al. 2018) using PacBio Sequel
II platform with circular consensus sequencing (CCS). HiFi reads were
assembled in de novo using Hifiasm (v0.13) (Cheng et al. 2021).
Then the restriction enzyme Mbo I was used to digest cross-linked
chromatin to construct the Hi-C library, which was sequenced on an
Illumina NovaSeq6000 platform. The Hi-C contigs were anchored onto the
chromosomes using Juicer (v1.6) (Durand et al. 2016) and 3D-DNA
(v201008) (Dudchenko et al. 2017) combined with Juicebox
(https://github.com/theaidenlab/juicebox ). BUSCO (v3.0.2) was
used to assess the completeness of assembled genome (Simao et al. 2015).
Genome resequencing was performed by the Illumina NovaSeq6000 platform.
All of the sequencing services were provided by the Berry Genomics
Biotechnologies Co., Ltd. (Beijing, China). A resequencing genome of
golden takin (TX-11) was selected to estimate the genome size and
heterozygosity rate by gce (v1.0.2)
(https://arxiv.org/abs/1308.2012v2 ).
Repeats and gene annotation
The RepeatMasker software (v4.1.2) (http://www.repeatmasker.org) and the
RepBase library (2018.10.26) were used to identify the repeats in takin
genome. Homology-based method was used to predict the protein-coding
genes with RNA-seq data. Firstly, we annotated the protein-coding genes
based on the protein sequences from Bos taurus , C. hircus ,O. aries and Homo sapiens using the BRAKER (v2.1.5)
pipeline (Hoff et al. 2016, Hoff et al. 2019). Secondly, we aligned
three RNA-seq datasets (PRJNA720167) to the takin’s genome using STAR
software (v2.7.1a) (Dobin et al. 2013), which were predicted by the
BRAKER pipeline. Next, the BRAKER with TSEBRA module (Gabriel et al.
2021) was used to integrate the prediction results. Finally, the
prediction results were used to align against the non-redundant (Nr)
database using the DIAMOND (v2.0.11.149) software (Buchfink et al. 2015,
Buchfink et al. 2021). GC content distribution of genome was performed
by bedtools (v2.26.0) with a window of 500 kb (Quinlan 2014).
Mitochondrial genome assembly
All the HiFi reads (Average length 14 kb) were blasted to the
mitochondrial genome of Sichuan takin (GenBank accession No.
NC_039686.1) and assembly reference genome of golden takin using the
Minimap2 (v2.22-r1101) (Li 2018). The HiFi reads aligned to NC_039686.1
were selected to assemble the mitochondrial genome of golden takin. The
sequence similarity was analyzed using the BLAST search program. The
annotation of mitochondrial genome was completed by the AGORA online
platform (Jung et al. 2018).
Phylogenetic analysis and
divergence time estimation
We downloaded nine mitochondrial genome from GenBank database to
construct a phylogenetic tree, including B. taurus (GenBank
accession No. NC_006853.1), C. hircus (GenBank accession No.
NC_005044.2), O. aries (GenBank accession No. NC_001941.1),Pantholops hodgsonii (GenBank accession No. NC_007441.1),O. moschatus (GenBank accession No. NC_020631.1), Oreamnos
americanus (GenBank accession No. NC_020630.1), Ammotragus
lervia (GenBank accession No. NC_009510.1), Pseudois nayaur(GenBank accession No. NC_020632.1), and B. taxicolor (GenBank
accession No. NC_039686.1, NC_013069.1, NC_043930.1, KY399869.1,
KU361169.1 and OM237313 ). Firstly, the mitochondrial genomes were
aligned by MUSCLE (v3.8.31) (Edgar 2004) to exclude the ambiguous
regions. Then, the neighbor-joining (NJ) method was selected using MEGA
(v11) (Tamura et al. 2021) with 1000 bootstrap replicates.
The genome of B. taurus (ARS-UCD1.2) (Rosen et al. 2020) was
selected as reference, and that of O. americanus (ASM975805v1)
(Martchenko et al. 2020),O. aries(Oar_rambouillet_v1.0),C. hircus (ARS1) (Worley
2017),P. hodgsonii (PHO1.0) (Ge et al. 2013),A. lervia(ALER1.0) (Chen et al. 2019),Pseudois nayaur (ASM318257v1) (Chen
et al. 2019), O. moschatus (ASM2146233v1) and B. taxicolor(Takin1.0) were separately aligned to the reference genome using LAST
software (v942) (Kielbasa et al. 2011). The pair-wise alignments were
merged into multiple genome alignments using the MULTIZ software (v11.2)
(Blanchette et al. 2004). The consensus coding sequences of nine species
were extracted using the Perl scripts from RGD (v2.0). 5, 862
single-copy homologous genes were identified using the DIAMOND
(v2.0.11.149) and RAxML (v8.2.12) (Stamatakis 2014) with Maximum
Likelihood (ML) method to construct the phylogenetic tree. The
estimations of divergence time were calculated using PAML MCMCtree
(v4.9j) (Yang 2007) with calibrated time from the previous study (Chen
et al. 2019).
Chromosomal evolution
The ancestral chromosome karyotype was reconstructed using the genomes
of B. taurus (Rosen et al. 2020), C. hircus (Li et al.
2021), O. aries (Davenport et al. 2021), B. taxicolor andPhyseter catodon (Fan et al. 2019) (as outgroup). The genome ofB. taurus was selected as the reference, and other genomes were
aligned to the reference genome using LAST (v942) (Kielbasa et al.
2011). Next, “chain” and “net” files were generated from axtChain
and ChainNet, which were used as input for DESCHRAMBLER at a 350 kb
resolution (Kim et al. 2017). Lastly, we analyzed the collinearity of
the genomes of B. taurus , O. aries and B. taxicolorusing mummer (v4.0beta2) (Marcais et al. 2018). We visualized the
collinearity region and detected
the chromosome fusion events using the RectChr
(https://github.com/BGI-shenzhen/RectChr ).
Heterozygosity estimation
Genome resequencing datasets from nine golden takins were aligned with
the assembly reference genome using BWA-MEM (v0.7.17)
(http://arxiv.org/abs/1303.3997 ). The SAMtools (v1.7) (Li et al.
2009) was used to convert sam to bam files. The single nucleotide
polymorphisms (SNPs) were called and filted using the GATK (v4.1.7.0)
(McKenna et al. 2010) with parameters “QD < 2.0, QUAL
< 30.0, FS > 60.0, MQ < 40.0, MQRankSum
< -12.5, ReadPosRankSum < -8.0”. The nucleotide
diversity (θπ) was calculated using the VCFtools (v0.1.16) with a 50 kb
window (Danecek et al. 2011). Genome resequencing datasets 8 giant
pandas (Zhao et al. 2013) were used to calculate θπ. The genome-wide
heterozygosity rate was calculated as previously described (Liu et al.
2021). Genome resequencing datasets from 26 snub-nosed monkeys (Yu et
al. 2016) were used to calculate heterozygosity.
The heterozygosity rates of other
species were obtained from previous reports (Li et al. 2010, Cho et al.
2013, Dobrynin et al. 2015, Liu et al. 2021).
ROH estimation and inbreeding coefficient
The runs of homozygosity (ROH) were estimated across autosomes for nine
golden takins using PLINK (v1.90b6.21) (Chang et al. 2015). The
following PLINK parameters (Purcell et al. 2007) were applied to define
a ROH: (i) a 50 SNPs sliding window across the genome; (ii) the
proportion of homozygous overlapping windows was 0.05; (iii) minimum
number of consecutive SNPs included in a ROH was 50; (iv) required
minimum density was set at one SNP per 50 kb; (v) maximum gap between
consecutive homozygous SNPs was 1000 kb; and (vi) maximum of five SNPs
with missing genotypes and up to three heterozygous genotype were
allowed in a ROH. Inbreeding coefficient based on ROH (FROH) for each
individual was calculated according to previous study (McQuillan et al.
2008). The number of generations was estimated to the common ancestor of
these homologous sequences as previously described (Thompson 2013).