¹⁰Sahlgrenska Cancer Center, Zhuhai, Guangdong, China
*Shared first authors. §Corresponding author.
Abstract
CHO cell lines are a workhorse for the production of pharmaceutical
proteins, but show some limitations in the variability and stability of
N-glycosylation profiles. One promising approach to addressing this at
the required systems level is miRNA engineering, as miRNAs can regulate
large numbers of genes and have predictable targets. Herein, we first
identified de novo 656 potential miRNAs in the CHO genome based on a
combination of literature, database searching, and miRNA sequencing. We
further sequenced mRNA from the same cultures, and used a combination of
mRNA-miRNA correlation analysis, target prediction and literature
searches to find miRNAs potentially targeting N-glycosylation. Our ten
best miRNA candidates were subjected to miRNA overexpression, knockdown,
or knock-out in CHO cell lines. Out of the ten candidates, four
(miR-128, miR-34c, miR-30b, and miR-449a) showed positive effects on
N-glycosylation and could be applied directly for CHO cell engineering.
The fact that 40% of the screened candidates had a desired effect,
together with the prediction of 656 miRNAs, illustrates the massive
potential of miRNA engineering in CHO.
Key words:
CHO cell, Stable microRNA engineering, CRISPR/Cas9, N-glycosylation,
Differential equation, Correlation analysis, miRNA.
Introduction
Recombinant therapeutic proteins provide novel treatment possibilities
for many difficult-to-treat diseases. Sales were estimated at 140
billion USD in 2013 (Walsh 2014) and are continually growing. Chinese
hamster ovary (CHO) cells, the predominant expression platform for
recombinant therapeutic proteins, have for several decades shown high
adaptability and robustness in industrial-scale applications and a
great capability for producing proteins with complex folding and
human-like N-glycosylation (Wong et al. 2010; Werner, Kopp, and
Schlueter 2007). In particular, CHO cell
factories with better and more controlled production of desired
N-glycosylation structures are of interest as the glycosylation pattern
has a major role in defining the efficacy of the produced protein. This
drives a need to study, understand and engineer the glycosylation
pattern produced by a given CHO cell line.
Recently developed CRISPR/Cas9 technologies facilitate precise genome
editing in CHO cells with significantly reduced cost and delivery time
(Ronda et al. 2014; Cong et al. 2013). Thus, better guidance and tools
are now available for selecting and engineering different targets in
CHO cells, enabling rational design and development of CHO cell lines
at a much faster pace and higher throughput (J. Y. Kim, Kim, and Lee
2012; Kildegaard et al. 2013; Hackl et al. 2012; Sanchez et al. 2014;
North et al. 2010; Sealover et al. 2013; Maszczak-Seneczko et al. 2011;
Z. Yang et al. 2015). In tandem, glycoengineering studies have shown
promising results in improving and controlling the quality of
therapeutic proteins (Amann et al. 2019). Even though CHO is considered
a superior host for the production of therapeutic proteins, it does not
typically have all the reactions of the human N-glycan pathway
(Zhang et al. 2016).
Among the targets for CHO cell engineering, microRNAs (miRNAs) have
emerged as potent and promising targets for improving cells for the
production of recombinant therapeutic proteins at the phenotype level
(Jadhav et al. 2013). This is because miRNAs are able to regulate
complex gene networks that control various cellular processes
(Lam et al. 2015), have predictable targets in the genome, and impose
no translational burden on the host cells. miRNAs are single-stranded
small non-coding RNAs (19 to 25 nucleotides) found in a wide range of
higher eukaryotes (Mack 2007). They regulate the expression of multiple
genes at the post-transcriptional level by fully or partially
complementary binding to the 3’-UTR of target mRNAs (Hackl et al. 2012;
Jadhav et al. 2013). So far, miRNAs have been shown to affect cell cycle
progression (Jadhav et al. 2013), apoptosis (Cimmino et al. 2005),
metabolism (Gao et al. 2009; Csibi et al. 2013), cell proliferation
(He et al. 2007), and various cancer-related pathways (such as the MAPK
pathway (Masliah-Planchon, Garinet, and Pasmant 2016)) in mice and
humans, while miRNA studies in CHO cells are still at an early stage
(Hackl et al. 2011, 2012; Jadhav et al. 2014; Gammell et al. 2007;
Clarke et al. 2012; Strotbek et al. 2013; Hammond et al. 2012). However,
as miRNAs are highly conserved across species (Pasquinelli et al. 2000),
knowledge of miRNAs from other mammals in association with various
phenotypic effects, especially findings from the highly prolific field
of human cancer biology, may be translated into CHO cells.
Furthermore, the role of miRNAs with regard to N-glycan metabolism and
protein quality needs to be explored further. N-glycosylation
engineering in CHO cells has been extensively studied and heavily
patented (Z. Yang et al. 2015), but almost no studies have so far
investigated miRNAs that modulate N-glycosylation in CHO cells.
Therefore, to fill the gaps in miRNA engineering of CHO cells for
industrial applications, more effort on stable microRNA engineering for
improving CHO cell phenotypes is needed.
In this study, we search for and discover miRNA candidates for stable
CHO cell engineering to achieve desired N-glycosylation structures. We
identify targets through a hybrid strategy of targeted literature and
database searching coupled with mRNA and miRNA sequencing, target
prediction, and differential gene and miRNA expression analysis. Based
on this, we overexpress (OE), knock down (KD) and knock out (KO) miRNA
targets using CRISPR-based technologies and a CHO-adapted miRNA
expression system, finding several miRNAs with effects on
N-glycosylation.
Materials and Methods
CHO cell lines
Six CHO suspension cell lines were used: three producer cell lines,
namely DG44-EPO (erythropoietin producer), CHO-CS13-1.00 (rituximab high
producer), and CHO-CS13-0.02 (rituximab low producer), and three
non-producer cell lines (CHO-K1, CHO-DG44, and CHO-S). The CS13-0.02 and
CS13-1.00 cell lines were generated in a previous study by insertion of
chimeric antibody genes into a CHO DG44 parental cell line, followed by
selection under different MTX concentrations (0.02 μM and 1.00 μM,
respectively) (S. J. Kim et al. 1998). DG44-EPO is an MTX-amplified
stable EPO producer, which was provided as a kind gift from Prof. Gyun
Min Lee. CHO-K1 and CHO-DG44 are serum-free suspension cell lines
adapted in-house from ATCC adherent CHO-K1 and CHO-DG44 cell lines.
CHO-S was acquired from Thermo Fisher Scientific, Waltham, MA, USA.
The growth medium for CS13-0.02, CS13-1.00 and DG44-EPO was PowerCHO-2
chemically defined, serum-free medium (Lonza, Switzerland), with 4 mM of
glutamine and 0.2% of anti-clumping agent (ThermoFisher Scientific)
added. The CHO-DG44 cell line was grown in the same growth medium with a
10% HT supplement (10 mM of sodium hypoxanthine and 1.6 mM of
thymidine, ThermoFisher Scientific). The growth medium for CHO-K1 and
CHO-S was CD CHO-2 chemically defined, serum-free medium (Lonza, Visp,
Switzerland), with 8 mM of glutamine and 0.2% of anti-clumping agent
(ThermoFisher Scientific) added.
Bioreactor cultivations
All cells were maintained in duplicate in their corresponding growth
media in shake flasks at 37˚C, 5% CO2 and 150 rpm prior to the batch
process in bioreactors.
During the batch process, cell lines were grown in their corresponding
growth medium in duplicate in 2x 1.0 L Eppendorf DASGIP bioreactor
systems at 37°C and 150 rpm with an initial seeding density of
4 × 10⁵ viable cells/mL and a working volume of 410 mL.
The pH and dissolved oxygen (DO) were monitored and controlled online
during the cultivation process. The pH was set to 7.2 with a deviation
of 0.02, and 2 M sodium carbonate and CO2 gas flow were used to adjust
the pH. DO was kept at 50% of air saturation using a mixture of air, O2
and CO2 at a gas flow of 0.6 L/h. The bioreactor cultivations lasted
7 days. Daily sampling was carried out to monitor cell growth
(Nucleocounter NC-200, ChemoMetec A/S, Allerød, Denmark) and
extracellular metabolites (BioProfile 400 chemistry analyzer, Nova
Biomedical, Waltham, USA). Cells for RNA analysis (transcriptomics and
miRNAs) were sampled on day 4 of the cultivation.
RNA purification and next-generation sequencing
RNA was extracted from 5 × 10⁷ cells using the miRNeasy Mini Kit
(Qiagen, Germany) for total RNA preparation following the
manufacturer’s instructions. RNA integrity was evaluated and confirmed
using an Agilent 2100 Bioanalyzer with total RNA nano chips (Agilent
Technologies, Santa Clara, CA, USA). RNA concentration was measured
using Qubit 2.0 (ThermoFisher Scientific). Multiplexed cDNA library and
miRNA library generation were carried out by BGI (China) from the same
total RNA samples using the TruSeq RNA Sample Preparation Kit v2
(Illumina, Inc., San Diego, CA) and the TruSeq Small RNA Sample
Preparation Kit (Illumina), respectively. Next-generation sequencing of
both the transcriptome and miRNAs was performed by BGI on an Illumina
HiSeq 4000 system using paired-end sequencing.
mRNA-seq quality control, alignment, and generation of read counts
The Chinese hamster genome published previously (Lewis et al. 2013) was
downloaded from NCBI (accession number PRJNA239316). An initial quality
check was carried out using FastQC v0.11.5
(http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) in
non-interactive mode. Based on the FastQC evaluation, poor-quality
Illumina reads were trimmed using Trimmomatic v0.36 (Bolger, Lohse, and
Usadel 2014) in paired-end mode. All trimmed paired-end reads were
aligned to the Chinese hamster genome using the STAR v2.5.2b aligner
(Dobin et al. 2013): genome indices were first generated from the
genome FASTA file and GTF files with runMode set to genomeGenerate, and
were then used to align the mRNA reads. The resulting alignment files
were converted to binary (BAM) format using SAMtools (Li et al. 2009).
To obtain read counts, which were later converted to logCPM, the
aligned reads and a GFF file of the Chinese hamster genome were fed
into HTSeq v0.8.0 (Anders, Pyl, and Huber 2015) in intersection-strict
mode. The HTSeq output was loaded into RStudio v0.99.1261
(https://www.rstudio.com/), with the integrated development environment
running on R v3.3.3 (R Development Core Team 2003), for further
downstream analysis.
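To make the order of the command-line steps above concrete, the following Python sketch only assembles the argument lists for STAR (index generation and paired-end alignment), SAMtools and htseq-count for one sample; it does not execute them. The file names, the thread count, and the htseq-count options `-s no` (assuming the non-stranded TruSeq RNA v2 libraries) and `-i gene` (assuming an NCBI-style GFF with a `gene=` attribute) are illustrative assumptions, not the settings used in this study.

```python
# Hypothetical sketch: build (but do not run) the STAR -> SAMtools ->
# htseq-count command lines for one sample. File names, thread counts and the
# htseq-count "-s no" / "-i gene" options are assumptions for illustration.
from typing import List


def star_index_cmd(genome_dir: str, fasta: str, gtf: str, threads: int = 8) -> List[str]:
    # runMode genomeGenerate builds the STAR genome index from FASTA + GTF
    return ["STAR", "--runMode", "genomeGenerate",
            "--genomeDir", genome_dir,
            "--genomeFastaFiles", fasta,
            "--sjdbGTFfile", gtf,
            "--runThreadN", str(threads)]


def star_align_cmd(genome_dir: str, r1: str, r2: str, prefix: str,
                   threads: int = 8) -> List[str]:
    # Paired-end alignment; by default STAR writes <prefix>Aligned.out.sam
    return ["STAR", "--genomeDir", genome_dir,
            "--readFilesIn", r1, r2,
            "--outFileNamePrefix", prefix,
            "--runThreadN", str(threads)]


def sam_to_bam_cmd(sam: str, bam: str) -> List[str]:
    # "samtools view -b" converts the text SAM file to binary BAM
    return ["samtools", "view", "-b", "-o", bam, sam]


def htseq_count_cmd(bam: str, gff: str) -> List[str]:
    # intersection-strict mode as in the text; "-s no" assumes an unstranded
    # library and "-i gene" assumes a gene= attribute on exon features
    return ["htseq-count", "-f", "bam", "-m", "intersection-strict",
            "-s", "no", "-t", "exon", "-i", "gene", bam, gff]


def sample_commands(sample: str, genome_dir: str, gff: str) -> List[List[str]]:
    prefix = f"{sample}_"
    sam = f"{prefix}Aligned.out.sam"
    bam = f"{sample}.bam"
    return [
        star_align_cmd(genome_dir, f"{sample}_R1.trimmed.fq",
                       f"{sample}_R2.trimmed.fq", prefix),
        sam_to_bam_cmd(sam, bam),
        htseq_count_cmd(bam, gff),
    ]


if __name__ == "__main__":
    for cmd in sample_commands("CHO-S_rep1", "star_index", "cgr_genome.gff"):
        print(" ".join(cmd))
```

Once the tools are installed, each list could be passed to `subprocess.run(cmd, check=True)`; keeping commands as lists avoids shell quoting problems with sample names.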
Gene expression analysis (contrast matrix + DE analysis)
Examination of the logCPM values showed that a large proportion of the
genes within individual samples were unexpressed. Hence, we enforced a
cutoff of CPM ≥ 1 in at least two samples (Law et al. 2016), as it
distinguished expressed from unexpressed genes for most of the dataset
(Supplementary Table 1). Next, the counts were normalized with the TMM
method (Law et al. 2016; Robinson and Oshlack 2010) using the
calcNormFactors function of edgeR (McCarthy, Chen, and Smyth 2012).
Subsequently, to model the mean-variance relationship of the data and
calculate observation-level weights from it, we used the voom function
(Law et al. 2014) from the limma package (Smyth, n.d.), which transforms
the RNA-Seq counts for linear modeling. Linear models were then fitted
with the lmFit function of limma using a design matrix and a contrast
matrix, and differential expression was assessed with an empirical
Bayes model (eBayes), which yields moderated t-statistics and log-odds
of differential expression. The results were summarized with functions
such as toptable and volcanoplot (Su et al. 2017) to determine the most
highly differentially expressed genes in each of the contrasts.
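The expression filter and the log-CPM scale that voom works on can be illustrated with a small Python sketch on a toy count matrix. It assumes genes as dictionary keys with one count per sample, applies the CPM ≥ 1 in at least two samples rule, and then computes voom's log2-CPM, log2((count + 0.5)/(library size + 1) × 10⁶). TMM normalization factors are deliberately left out (all taken as 1), and the gene names and counts are hypothetical.

```python
# Minimal sketch, assuming a dict-of-lists count matrix (gene -> counts per
# sample). TMM normalization is NOT reproduced: all norm factors are 1.
import math
from typing import Dict, List

Counts = Dict[str, List[int]]


def library_sizes(counts: Counts) -> List[float]:
    # Column sums: total reads assigned to genes in each sample
    n_samples = len(next(iter(counts.values())))
    return [float(sum(row[j] for row in counts.values())) for j in range(n_samples)]


def filter_expressed(counts: Counts, min_cpm: float = 1.0, min_samples: int = 2) -> Counts:
    # Keep genes with CPM >= min_cpm in at least min_samples samples
    lib = library_sizes(counts)
    kept = {}
    for gene, row in counts.items():
        cpm = [c / size * 1e6 for c, size in zip(row, lib)]
        if sum(v >= min_cpm for v in cpm) >= min_samples:
            kept[gene] = row
    return kept


def voom_logcpm(counts: Counts) -> Dict[str, List[float]]:
    # voom's transform log2((count + 0.5) / (lib.size + 1) * 1e6),
    # with library sizes recomputed on the filtered matrix
    lib = library_sizes(counts)
    return {g: [math.log2((c + 0.5) / (size + 1.0) * 1e6) for c, size in zip(row, lib)]
            for g, row in counts.items()}


toy = {"Mgat1": [50, 40, 60],            # hypothetical counts, 3 samples
       "St3gal4": [0, 2, 0],             # expressed in only one sample
       "Other": [999_950, 999_958, 999_940]}
```

With these toy counts, `filter_expressed(toy)` drops `St3gal4`, because its CPM reaches 1 in only one sample, and keeps the other two genes.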
miRNA expression analysis
The quality of the fastq files was evaluated using FastQC version
0.11.2. Sequencing adapters were trimmed using cutadapt version 1.7.1.
miRNAs were searched against miRBase (Release 21) for Cricetulus
griseus, Rattus norvegicus, Homo sapiens and Mus musculus using CLC
Genomics Workbench (version 7.5) with standard settings regarding mature
length variants in order to obtain read counts for the identified
microRNAs. All gene-specific predicted miRNA targets of human and mouse
were fetched from the miRWalk 2.0 database (Dweep, Gretz, and Sticht
2014). Official gene symbols were used as input identifiers, and the
3’ UTR was selected as the input parameter.
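The miRWalk predictions are considerably more elaborate than a single rule; as a much simpler illustration of what a 3’ UTR target site is, the following Python sketch searches a UTR for canonical 7mer-m8 seed matches, i.e. the reverse complement of nucleotides 2-8 of the mature miRNA. This is a swapped-in textbook technique, not the miRWalk method, and both sequences are hypothetical.

```python
# Simplified seed-match sketch (NOT the miRWalk algorithm): report 7mer-m8
# sites, i.e. UTR positions matching the reverse complement of miRNA
# nucleotides 2-8. Sequences in the example are hypothetical.
from typing import List

_COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    return "".join(_COMP[b] for b in reversed(dna))


def seed_sites(mature_mirna: str, utr: str) -> List[int]:
    """0-based start positions of 7mer-m8 sites in a 3' UTR (DNA alphabet)."""
    seed = mature_mirna.upper().replace("U", "T")[1:8]   # nucleotides 2-8
    site = reverse_complement(seed)                      # site as read 5'->3' in the mRNA
    utr = utr.upper()
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


if __name__ == "__main__":
    mirna = "UCAGGCAUUACCGGAUUCAGCA"    # hypothetical mature miRNA
    utr = "GGATGCCTGAAATTTATGCCTGCC"    # hypothetical 3' UTR fragment
    print(seed_sites(mirna, utr))        # [2, 15]
```

Real predictors additionally weigh site type, conservation and UTR context, which is why a plain seed match over-predicts targets.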
Correlation analysis
Unsupervised correlation analysis between gene and miRNA expression
levels was performed by comparing the expression values of genes with
the counts of all predicted miRNAs. Pearson and Spearman correlations
were calculated using the cor function of R.
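The correlation step can be sketched in Python without R. The sketch assumes one expression vector per gene and one count vector per miRNA over the same samples; Spearman's coefficient is computed as Pearson's coefficient on average ranks (tied values share a rank), which is how R's `cor(x, y, method = "spearman")` treats ties.

```python
# Minimal sketch of Pearson and Spearman correlation between a gene's
# expression vector and a miRNA's count vector over the same samples.
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    # 1-based ranks; tied values receive the mean of the ranks they span
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    # Spearman = Pearson on average ranks
    return pearson(average_ranks(x), average_ranks(y))
```

For a miRNA and one of its predicted targets, a negative coefficient is consistent with, but does not prove, repression of the target.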
Generation of stable miRNA knock-out (KO) and knock-down (KD) cell lines
Sequences 500 bp upstream and 500 bp downstream of the target miRNAs
were extracted from the CHO genome to identify sgRNA target sites and
design sgRNAs using the ZiFiT tool (www.zifit.partners.org/ZiFiT/).
The top sgRNA hits against the upstream and downstream sequences with
the fewest predicted off-target events were selected (Supplementary
Table 2). Each selected sgRNA was cloned into the px458 plasmid
co-expressing eGFP (Addgene) by BbsI restriction and conventional
ligation. For each miRNA target, one plasmid with the upstream sgRNA
(up-sgRNA) and one plasmid with the downstream sgRNA (dw-sgRNA) were
generated. All plasmids were verified by sequencing and purified using
NucleoBond Xtra Midi EF (Macherey-Nagel, Düren, Germany), according to
the manufacturer’s instructions.
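As an illustration of the candidate-site search that precedes off-target ranking, the following Python sketch lists every 20-nt SpCas9 protospacer followed by an NGG PAM on either strand of a flanking sequence, together with the expected cut site 3 bp upstream of the PAM. It is not ZiFiT and performs no off-target scoring; the `Guide` record and the toy sequence are hypothetical.

```python
# Hypothetical sketch (not ZiFiT): enumerate 20-nt SpCas9 protospacers with an
# NGG PAM on both strands, reporting the expected cut site in plus-strand
# coordinates. No off-target scoring is attempted.
from typing import List, NamedTuple

_COMP = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(dna: str) -> str:
    return "".join(_COMP[b] for b in reversed(dna.upper()))


class Guide(NamedTuple):
    strand: str       # "+" or "-"
    protospacer: str  # 20 nt, 5'->3' on its own strand
    pam: str          # NGG
    cut: int          # cleavage between plus-strand positions cut-1 and cut


def find_guides(seq: str) -> List[Guide]:
    seq = seq.upper()
    n = len(seq)
    guides = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(n - 22):                 # protospacer + PAM = 23 nt
            if s[i + 21:i + 23] == "GG":        # the "GG" of NGG
                # Cas9 cuts 3 bp 5' of the PAM (between protospacer nt 17 and 18);
                # minus-strand boundaries are mapped back to plus coordinates
                cut = i + 17 if strand == "+" else n - (i + 17)
                guides.append(Guide(strand, s[i:i + 20], s[i + 20:i + 23], cut))
    return guides


toy_flank = "AAAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAAA"   # hypothetical
```

For `toy_flank`, the only hit is the plus-strand protospacer `ACGTACGTACGTACGTACGT` with PAM `TGG` and a cut between positions 21 and 22; a real design would then rank such candidates by predicted off-target sites.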
miRNA KD/KO was performed in CHO-S using CRISPR/Cas9. CHO-S cells were
maintained as suspension cultures in CD CHO-2 medium (Lonza) with 8 mM
glutamine (Thermo Fisher Scientific) in 250 mL shake flasks at 120 rpm
(Infors, USA), 37 °C and 5% CO2. One day before transfection, cells
were seeded at 0.5 × 10⁶ cells/mL in 6-well plates (NUNC, Denmark) with
3 mL cell culture per well. Transfection of CHO-S cells was carried out
using 3.8 μg of total plasmid DNA (50% up-sgRNA + 50% dw-sgRNA)
together with FreeStyle™ MAX reagent as described previously (Grav
et al. 2015). Transfection with 3.8 µg pmaxGFP® vector without sgRNAs
(Lonza, Basel, Switzerland) served as a control for determining
transfection efficiencies. Transfection with 3.8 µg px458 plasmid was
carried out to generate mock cell lines. Two days after transfection,
fluorescence-positive cells were single-cell sorted by FACS as reported
(Grav et al. 2015). Fourteen days after single-cell sorting, the
colonies were expanded into 96-well flat-bottom plates (BD
Biosciences). Wells with confluent cells were split, and replicate
plates were subjected to PCR analysis of genomic DNA when close to
confluence. Genomic DNA was extracted from the cell pellets using
QuickExtract DNA extraction solution (Epicentre Illumina, WI)
(J. S. Lee et al. 2015).
PCR was performed as described previously (J. S. Lee et al. 2015) in
order to select colonies containing KO/KD of the targeted miRNA. PCR
primers for verifying miRNA KO/KD were designed to amplify from about
150 bp upstream of the up-sgRNA target site to about 150 bp downstream
of the dw-sgRNA target site (Supplementary Table 3).
Selected KO/KD cell lines were further expanded, tested, and found free
of mycoplasma prior to storage at -180°C.
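The PCR screen relies on a simple size difference, which the following Python sketch makes explicit. Assuming primers about 150 bp outside each cut site and a clean excision of the segment between the two Cas9 cut sites (junction indels ignored), the unedited allele gives a band of (distance between cut sites + 300) bp and the deletion allele about 300 bp. The cut coordinates, the ±20 bp gel tolerance and the genotype labels are hypothetical.

```python
# Worked example with hypothetical coordinates: expected band sizes for the
# dual-sgRNA deletion screen, assuming primers ~150 bp outside each cut site
# and clean excision between the two cut sites (junction indels ignored).
from typing import Dict, List

FLANK = 150  # bp between each primer and its neighbouring cut site ("about 150 bp")


def expected_bands(up_cut: int, dw_cut: int, flank: int = FLANK) -> Dict[str, int]:
    return {"wild_type": (dw_cut - up_cut) + 2 * flank,
            "deletion": 2 * flank}


def call_genotype(observed: List[int], up_cut: int, dw_cut: int, tol: int = 20) -> str:
    """Classify a clone from observed band sizes (bp); tol is a hypothetical gel tolerance."""
    bands = expected_bands(up_cut, dw_cut)
    has_wt = any(abs(b - bands["wild_type"]) <= tol for b in observed)
    has_del = any(abs(b - bands["deletion"]) <= tol for b in observed)
    if has_del and not has_wt:
        return "deletion on all alleles"
    if has_del and has_wt:
        return "mixed: deletion plus unedited allele"
    if has_wt:
        return "no deletion detected"
    return "unexpected pattern"
```

For example, cut sites at positions 350 and 780 of the amplified region give an expected 730 bp band for the unedited allele and a 300 bp band after excision.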
Generation of stable miRNA overexpression cell pools
Seven miRNA overexpression (OE) plasmids were designed as described
previously with minor modifications (Klanert et al. 2014). Each target
sequence contains the eGFP sequence linked to the stem-loop sequence of
a miRNA, with 50 bp of upstream and 50 bp of downstream flanking
sequence from the CHO genome. Each target sequence (Supplementary Table
4) was synthesized by GenScript (China) and cloned into the HindIII and
BamHI restriction sites of the pcDNA3.1/Hygro(+) plasmid (Addgene) to
generate the seven miRNA OE plasmids. All plasmids were purified using
NucleoBond Xtra Midi EF (Macherey-Nagel) according to the manufacturer’s
instructions.
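The layout of one OE insert can be sketched in silico in Python. The sketch assumes the order HindIII site, eGFP, 50 bp upstream flank, miRNA stem-loop, 50 bp downstream flank, BamHI site (HindIII lies 5’ of BamHI in the pcDNA3.1(+) multiple cloning site), and checks that neither recognition site occurs anywhere else in the insert, since an internal site would interfere with directional cloning. Protective bases outside the sites, a Kozak sequence and other design details are omitted, and the example sequences are hypothetical fragments, not the constructs in Supplementary Table 4.

```python
# Hypothetical sketch of one OE insert: HindIII - eGFP - 50 bp flank -
# stem-loop - 50 bp flank - BamHI. The element order is an assumption.
HINDIII = "AAGCTT"
BAMHI = "GGATCC"


def build_oe_insert(egfp: str, up_flank: str, stem_loop: str, dw_flank: str,
                    flank_len: int = 50) -> str:
    if len(up_flank) != flank_len or len(dw_flank) != flank_len:
        raise ValueError(f"flanks must be {flank_len} bp")
    insert = (HINDIII + egfp + up_flank + stem_loop + dw_flank + BAMHI).upper()
    # Any extra HindIII/BamHI site (including one created at a junction)
    # would be cut during directional cloning.
    if insert.count(HINDIII) != 1 or insert.count(BAMHI) != 1:
        raise ValueError("internal HindIII or BamHI site present")
    return insert


egfp_fragment = "ATGGTGAGCAAG"           # hypothetical short stand-in for eGFP
up_flank = "C" * 50                      # hypothetical 50 bp flanks
dw_flank = "T" * 50
stem_loop = "TGCTAGCTAGTTCGATCGA"        # hypothetical stem-loop stand-in
```

Checking the assembled insert, rather than each part separately, also catches recognition sites that only appear across a junction between two elements.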
CHO-S cells were maintained in 6-well plates (as described above for the
miRNA KD/KO procedure) prior to transfection. Transfections with each OE
plasmid were done in 6-well plates using FreeStyle™ MAX reagent.
Negative controls and mock cell pools were generated by transfecting
cells with nuclease-free water and with a pcDNA3.1/Hygro(+)-based
plasmid whose target sequence contained the eGFP sequence but no
regulatory miRNA sequence, respectively. Transfections were carried out
in duplicate for each plasmid. All cell cultures were kept in 6-well
plates at 37°C, 5% CO2 after transfection. Transfection efficiency was
determined 24 hours post-transfection as the percentage of GFP-positive
cells in each culture relative to the negative control, using a Celigo
Imaging Cell Cytometer (Nexcelom Bioscience, Lawrence, MA).
Two days after transfection, the transfected cells were transferred to
125 mL shake flasks and selected in 10 mL of selection medium (CD CHO
medium with 8 mM Gln, 0.2% AcA and 400 µg/mL hygromycin B). During
selection, cell cultures were maintained at 37°C with 5% CO2 at
120 rpm, and the medium was changed every 3–4 days. Cell growth was
monitored using the NucleoCounter NC-200 every 2–3 days. Stable OE cell
pools were obtained after 14 days of selection, when the viability of
every miRNA OE cell culture and the mock cell culture was above 94%,
while the viability of the negative control cell culture was 0%. All
stable cell pools were further expanded, tested, and found free of
mycoplasma prior to storage in the cell bank at -180°C.
Characterization of miRNA KO/KD cell lines or OE cell pools in batch culture
For each KO/KD or OE, two representative engineered cell lines/pools
were selected and further evaluated in a 6-day, 50 mL batch culture in
CD CHO-2 medium with 8 mM glutamine and 0.2% anti-clumping agent at
37˚C, 150 rpm and 5% CO2 in duplicate 250 mL shake flasks. Daily
sampling was carried out to monitor cell growth (Nucleocounter NC-200)
and extracellular metabolites (BioProfile 400 chemistry analyzer). On
day 4, 15 mL of supernatant was harvested and concentrated 10x using
Amicon Ultra-15 (Merck, Darmstadt, Germany) for N-glycoprofiling of the
secretome.
Quantitation of mature miRNA
5 × 10⁶ cells were sampled for quantitation of mature miRNA on day 2 of
the cultivation. RNA extraction was performed using a Trizol-based
method as reported previously (Klanert et al. 2014), and mature miRNA
was quantified by quantitative real-time PCR (RT-qPCR) using the
miScript® SYBR® Green PCR Kit together with miScript Primer Assays
(Supplementary Table 5). cDNA synthesis was carried out using 5x
miScript HiFlex Buffer and miScript Reverse Transcriptase Mix according
to the manufacturer’s instructions. Reaction mixtures (25 µL) contained
12.5 µL 2x QuantiTect SYBR Green PCR Master Mix, 2.5 µL 10x miScript
Universal Primers, 2.5 µL 10x miScript Primer Assay, 5 µL RNase-free
water and 2.5 µL cDNA template. Amplification was executed with the
following conditions: 95˚C for 15 min; 40×: 94˚C for 15 s, 55˚C for
30 s and 70˚C for 30 s. Each PCR reaction had 4 replicates. Primer
specificity was verified by a melting curve analysis of the PCR products
with a temperature gradient of 0.2˚C/s from 65˚C to 95˚C. The expression
levels of each mature miRNA relative to a house-keeping miRNA
(cgr-miR-185-5p) were determined using the 2^(−ΔΔCT) method
(Klanert et al. 2014).
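For reference, the 2^(−ΔΔCT) calculation can be written out as a short Python function. It assumes close to 100% amplification efficiency for the target and reference assays (the premise of the method) and averages the technical replicates; the reference would be cgr-miR-185-5p for miRNAs (or GAPDH for genes), and the control would typically be the mock cell line or pool. The Ct values in the example are hypothetical.

```python
# Minimal 2^(-ddCt) sketch, assuming ~100% efficiency for both assays and
# averaging technical replicates. Ct values below are hypothetical.
from statistics import mean
from typing import Sequence


def fold_change(ct_target: Sequence[float], ct_ref: Sequence[float],
                ct_target_ctrl: Sequence[float], ct_ref_ctrl: Sequence[float]) -> float:
    d_ct_sample = mean(ct_target) - mean(ct_ref)              # normalize to reference
    d_ct_control = mean(ct_target_ctrl) - mean(ct_ref_ctrl)   # same for the control
    dd_ct = d_ct_sample - d_ct_control                        # engineered vs. control
    return 2 ** (-dd_ct)


engineered = ([24.0] * 4, [20.0] * 4)   # (target miRNA, reference) Ct, 4 replicates
mock = ([26.0] * 4, [20.0] * 4)
```

With ΔCT = 24 − 20 = 4 in the engineered line and ΔCT = 26 − 20 = 6 in the mock, ΔΔCT = −2 and the fold change is 2² = 4, i.e. four-fold higher expression of the target miRNA.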
Quantitation of target gene expression
Quantitative real-time PCR (RT-qPCR) was performed on the target genes
using the miScript® SYBR® Green PCR Kit (Integrated DNA Technologies,
Coralville, IA) together with the corresponding primers (Supplementary
Table 6), according to the manufacturer’s instructions. Each primer pair
passed a linearity check (r² > 0.98) using 4-fold serial dilutions of
cDNA samples over 5 dilution steps, as well as an amplification
efficiency check (90–110%). Amplification was executed with the
following conditions: 95˚C for 15 min; 40×: 94˚C for 15 s, 52˚C for
30 s and 70˚C for 30 s. Each PCR reaction had 4 replicates, and every
PCR plate included template controls. Primer specificity was verified
by a melting curve analysis. GAPDH was used as a housekeeping gene
control, and the relative fold change of gene expression was calculated
by the 2^(−ΔΔCT) method (Klanert et al. 2014).
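The linearity and efficiency checks can be illustrated with a Python sketch of a standard curve: Ct is regressed on log10 of the relative input for a 4-fold, 5-point dilution series, and the efficiency follows as E = 10^(−1/slope) − 1. The ideal Ct values used here (+2 cycles per 4-fold dilution, i.e. 100% efficiency) are hypothetical.

```python
# Sketch of the primer QC: least-squares fit of Ct vs log10(relative input),
# efficiency E = 10^(-1/slope) - 1, and r^2. Ct values are hypothetical.
import math
from typing import Sequence, Tuple


def standard_curve(rel_input: Sequence[float], ct: Sequence[float]) -> Tuple[float, float]:
    x = [math.log10(v) for v in rel_input]
    n = len(x)
    mx, my = sum(x) / n, sum(ct) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, ct)) / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, ct))
    ss_tot = sum((b - my) ** 2 for b in ct)
    efficiency = 10 ** (-1 / slope) - 1
    return efficiency, 1 - ss_res / ss_tot


def passes_qc(efficiency: float, r2: float) -> bool:
    # Acceptance window from the text: r^2 > 0.98 and 90-110% efficiency
    return r2 > 0.98 and 0.90 <= efficiency <= 1.10


dilutions = [4.0 ** -k for k in range(5)]        # 1, 1/4, 1/16, 1/64, 1/256
ct_values = [20.0 + 2.0 * k for k in range(5)]   # ideal: +2 cycles per 4-fold step
```

A slope of about −3.32 cycles per decade corresponds to 100% efficiency; the 90–110% window corresponds to slopes of roughly −3.59 to −3.10.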
N-Glycoprofiling of secreted proteins
Sample preparation for N-glycoprofiling was carried out using the
GlycoWorks RapiFluor-MS N-Glycan Kit (Waters, Milford, MA) according to
the manufacturer’s instructions. Labeled N-glycans were analyzed on an
LC-MS system consisting of a Thermo Ultimate 3000 HPLC with a
fluorescence detector coupled to a Thermo Velos Pro Iontrap MS, as
described previously (Grav et al. 2015).
Relative amounts of the glycans were measured by integrating the areas
under the normalized fluorescence spectrum peaks with Thermo Xcalibur
software (Thermo Fisher Scientific, Waltham, MA).
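One common way to express relative glycan amounts is each integrated peak area divided by the summed area of all integrated peaks. A minimal Python sketch, assuming the areas have already been exported from the chromatography software, is shown below; the glycan names and areas are hypothetical example values, not data from this study.

```python
# Minimal sketch: relative glycan abundance (%) = peak area / total area * 100.
# Peak names and areas are hypothetical example values.
from typing import Dict


def relative_abundance(peak_areas: Dict[str, float]) -> Dict[str, float]:
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


example_areas = {"G0F": 15.0, "G1F": 25.0, "G2F": 10.0}
```

Here `relative_abundance(example_areas)` gives 30%, 50% and 20%; grouped sums (e.g. all fucosylated species) can then be taken over these percentages.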
Results
Identification of miRNAs in the Chinese hamster genome
Our first step was to identify as many miRNAs in the CHO genome as
possible. However, the current annotation of miRNAs in the CHO genome
is far from complete, so we performed de novo prediction, partly based
on sequence analysis (described here) and partly on miRNA sequencing
(see below). First, we examined the Chinese hamster, human, and mouse
genomes for predicted and identified miRNAs. In total, 656 miRNAs were
found to align to the Chinese hamster, human and/or mouse genome. Of
these, 277 mapped uniquely (by sequence) to the hamster genome and 353
to the human genome. Further, 4 miRNAs were identical in human, mouse
and hamster, 4 in human and hamster, 6 in human and mouse, and 12 in
mouse and hamster. Of the 656 miRNAs, 489 were predicted with the help
of miRWalk to regulate one or more genes of the N-glycan pathway
(Supplementary Table 1).
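The species breakdown above amounts to grouping miRNAs by the set of genomes they map to. A Python sketch of that bookkeeping follows, with a hypothetical toy mapping; the `REPORTED` counts are copied from the text, and summing them shows that the six categories already account for all 656 miRNAs, so no miRNA in this set maps to the mouse genome alone.

```python
# Hypothetical sketch: tally miRNAs by the combination of genomes they map to.
# The toy mapping is illustrative; REPORTED counts are taken from the text.
from collections import Counter
from typing import Dict, Set


def tally_by_species(hits: Dict[str, Set[str]]) -> Counter:
    # frozenset makes each species combination hashable and order-independent;
    # miRNAs with no hit are skipped
    return Counter(frozenset(s) for s in hits.values() if s)


REPORTED = {
    frozenset({"hamster"}): 277,
    frozenset({"human"}): 353,
    frozenset({"human", "mouse", "hamster"}): 4,
    frozenset({"human", "hamster"}): 4,
    frozenset({"human", "mouse"}): 6,
    frozenset({"mouse", "hamster"}): 12,
}

# The six reported categories sum to the 656 miRNAs identified in total.
assert sum(REPORTED.values()) == 656

toy_hits = {"m1": {"hamster"}, "m2": {"hamster"}, "m3": {"human", "mouse"}, "m4": set()}
```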
Identification of genes in CHO protein N-glycosylation
Next, we wanted to identify CHO genes in the N-glycosylation pathway as
well as miRNAs putatively targeting these genes. To this end, a list of
75 genes assigned to the N-glycosylation pathway was collected from the
KEGG pathway database (Kanehisa et al. 2019). To further refine this
list, genes were grouped into categories such as mannosylation,
galactosylation, and fucosylation based on literature mining.
Furthermore, genes were classified according to whether up- or
down-regulation would be desirable to obtain mature N-glycosylation
(Supplementary Table 7).
Generation of mRNA and miRNA expression profiles
In order to experimentally determine the expression patterns of CHO
miRNAs and to correlate these with the expression patterns of genes of
interest, we set up an mRNA/miRNA profiling experiment. We selected a
panel of three non-producing and three protein-producing cell lines and
cultivated them in biological duplicates in batch bioreactors. CHO-S,
CHO DG44 and CHO-K1 are all non-producing cell lines; CS13-0.02 is an
IgG low-producer, CS13-1.00 is an IgG high-producer, and DG44-EPO
produces erythropoietin (EPO). The culture performance of the cell lines
used for transcriptomics and miRNA-seq analysis is shown in Figure 1,
and all details are available in Supplementary Data File 1. Replicates
are consistent, but there are differences between the lines. CHO-S and
DG44-EPO grew faster and reached higher viable cell densities (VCD)
than the other cell lines. Glucose was depleted around day 5 for all
cell lines except DG44. CHO-K1 produced the most lactate and consumed
the most glucose during the growth phase (days 0-4) compared to the
other cell lines, indicating inefficient carbon flow from glycolysis to
support cell growth. CHO-K1 and CHO-S also showed relatively high
glutamine consumption and high NH4+ production throughout the culture
compared to the other cell lines. This may be because they were grown
in a different medium (CD CHO-2 with 8 mM glutamine, rather than
PowerCHO-2 with 4 mM glutamine).
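The qualitative comparisons above can be quantified with standard culture metrics: the specific growth rate µ = ln(X₂/X₁)/(t₂ − t₁), the doubling time ln 2/µ, and the lactate yield on glucose (mmol lactate formed per mmol glucose consumed). The following Python sketch uses hypothetical values (starting from the 4 × 10⁵ cells/mL seeding density) rather than the measured data in Supplementary Data File 1.

```python
# Worked example with hypothetical numbers: specific growth rate, doubling
# time, and lactate yield on glucose.
import math


def specific_growth_rate(vcd_1: float, vcd_2: float, days: float) -> float:
    # mu = ln(X2/X1) / (t2 - t1), per day, valid for exponential growth
    return math.log(vcd_2 / vcd_1) / days


def doubling_time(mu: float) -> float:
    return math.log(2) / mu   # days


def lactate_yield(glc_start: float, glc_end: float,
                  lac_start: float, lac_end: float) -> float:
    # mmol lactate formed per mmol glucose consumed; glycolysis alone can give
    # at most 2, so values near 2 mean most glucose carbon ends up as lactate
    return (lac_end - lac_start) / (glc_start - glc_end)


mu = specific_growth_rate(0.4e6, 3.2e6, 3.0)   # hypothetical 8-fold increase in 3 days
```

In this example µ = ln 8 / 3 = ln 2 ≈ 0.69 per day, i.e. a doubling time of one day; a culture consuming 10 mM glucose while producing 15 mM lactate would have a yield of 1.5.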
RNA samples were taken on day 4, purified, and sent for mRNA and miRNA
sequencing.