10Sahlgrenska Cancer Center, Zhuhai, Guangdong, China
*Shared first authors. §Corresponding author.

Abstract

CHO cell lines are a workhorse for the production of pharmaceutical proteins, but show limitations in the variability and stability of their N-glycosylation profiles. One promising approach to addressing this at the required systems level is the use of miRNAs, which can regulate a large number of genes and have predictable targets. Herein, we first identified de novo 656 potential miRNAs in the CHO genome based on a combination of literature, database searching, and miRNA sequencing. We further sequenced mRNA from the same cultures, and used a combination of mRNA-miRNA correlation analysis, target prediction and literature searches to find miRNAs potentially targeting N-glycosylation. Our ten best miRNA candidates were subjected to miRNA overexpression, knock-down, or knock-out in CHO cell lines. Out of the ten candidates, four (miR-128, miR-34c, miR-30b, and miR-449a) showed positive effects on N-glycosylation and could be applied directly for CHO cell engineering. The fact that 40% of the screened targets had a desired effect, together with the prediction of 656 miRNAs, illustrates the massive potential of miRNA engineering in CHO.

Key words:

CHO cell, Stable microRNA engineering, CRISPR/Cas9, N-glycosylation, Differential equation, Correlation analysis, miRNA.

Introduction

Recombinant therapeutic proteins provide novel treatment options for many difficult-to-treat diseases. Sales were estimated at 140 billion USD in 2013 (Walsh 2014) and are continually growing. As the predominant expression platform for recombinant therapeutic proteins, Chinese hamster ovary (CHO) cells have for several decades shown high adaptability and robustness in industrial-scale applications and a great capability for producing proteins with complex folding and human-like N-glycosylation (Wong et al. 2010; Werner, Kopp, and Schlueter 2007). In particular, CHO cell factories with better and more controlled production of desired N-glycosylation structures are of interest, as the glycosylation pattern has a major role in defining the efficacy of the produced protein. This drives a need to study, understand and engineer the glycosylation pattern produced by a given CHO cell line.
Recently developed CRISPR/Cas9 technologies facilitate precise genome editing in CHO cells with significantly reduced cost and delivery time (Ronda et al. 2014; Cong et al. 2013). Thus, better guidance and tools are now available for selecting and engineering different targets in CHO cells, enabling rational design and development of CHO cell lines at a much faster pace and with higher throughput (J. Y. Kim, Kim, and Lee 2012; Kildegaard et al. 2013; Hackl et al. 2012; Sanchez et al. 2014; North et al. 2010; Sealover et al. 2013; Maszczak-Seneczko et al. 2011; Z. Yang et al. 2015). In tandem, glycoengineering studies have shown promising results in improving and controlling the quality of therapeutic proteins (Amann et al. 2019). Even though CHO is considered a superior host for the production of therapeutic proteins, it does not typically have all the reactions of the human N-glycan pathway (Zhang et al. 2016).
Among the targets for CHO cell engineering, microRNAs (miRNAs) have emerged as potent and promising targets for improving cells for the production of recombinant therapeutic proteins at the phenotype level (Jadhav et al. 2013). This is because miRNAs are able to regulate complex gene networks that control various cellular processes (Lam et al. 2015), have predictable targets in the genome, and introduce no translational burden to the host cells. miRNAs are single-stranded small non-coding RNAs (19 to 25 nucleotides) that are found in a wide range of higher eukaryotes (Mack 2007). They are able to regulate the expression of multiple genes at the post-transcriptional level through complete or partial complementarity to the 3’-UTR of the target mRNAs (Hackl et al. 2012; Jadhav et al. 2013). So far, miRNAs have been found to impact cell cycle progression (Jadhav et al. 2013), apoptosis (Cimmino et al. 2005), metabolism (Gao et al. 2009; Csibi et al. 2013), cell proliferation (He et al. 2007), and various cancer-related pathways (such as the MAPK pathway (Masliah-Planchon, Garinet, and Pasmant 2016)) in mouse and humans, while miRNA studies in CHO cells are still at an early stage (Hackl et al. 2011, 2012; Jadhav et al. 2014; Gammell et al. 2007; Clarke et al. 2012; Strotbek et al. 2013; Hammond et al. 2012). However, as miRNAs are highly conserved across species (Pasquinelli et al. 2000), knowledge of miRNAs from other mammals in association with various phenotypic effects, especially findings from the highly prolific field of human cancer biology, may be translated into CHO cells. Furthermore, the role of miRNAs in N-glycan metabolism and protein quality needs to be explored further. Additionally, although the field of N-glycosylation engineering in CHO cells has been extensively studied and heavily patented (Z. Yang et al. 2015), almost no studies have so far investigated miRNAs that modulate N-glycosylation in CHO cells.
Therefore, to fill the gaps in miRNA engineering in CHO cells for industrial applications, more efforts on stable microRNA engineering for improving CHO cell phenotypes are needed.
In this study, we search for and discover miRNA candidates for stable CHO cell engineering to achieve desired N-glycosylation structures. We do this by identifying targets through a hybrid strategy of targeted literature and database searching, coupled with mRNA and miRNA sequencing, target prediction and differential gene and miRNA expression analysis. Based on this, we overexpress (OE), knock down (KD) and knock out (KO) miRNA targets using CRISPR-based technologies and a CHO-adapted miRNA expression system, finding several miRNAs with effects on N-glycosylation.

Materials and Methods

CHO cell lines

Six CHO suspension cell lines were used: three producer cell lines, namely CHO-EPO (erythropoietin producer), CHO-CS13-1.00 (rituximab high producer), and CHO-CS13-0.02 (rituximab low producer), and three non-producer cell lines (CHO-K1, CHO-DG44, and CHO-S). The CS13-0.02 and CS13-1.00 cell lines were generated in a previous study by insertion of chimeric antibody genes into a CHO DG44 parental cell line, and then selected under different MTX concentrations (0.02 μM and 1.00 μM, respectively) (S. J. Kim et al. 1998). DG44-EPO is an MTX-amplified stable EPO producer, which was provided as a kind gift by Prof. Gyun Min Lee. CHO-K1 and CHO-DG44 are serum-free suspension cell lines, which have been adapted in-house from ATCC adherent CHO-K1 and CHO-DG44 cell lines. CHO-S was acquired from Thermo Fisher Scientific, Waltham, MA, USA.
The growth medium for CS13-0.02, CS13-1.00 and DG44-EPO was PowerCHO-2 chemically defined, serum-free medium (Lonza, Switzerland), with 4 mM of glutamine and 0.2% of anti-clumping agent (ThermoFisher Scientific) added. The CHO-DG44 cell line was grown in the same growth medium with a 10% HT supplement (10 mM of sodium hypoxanthine and 1.6 mM of thymidine, ThermoFisher Scientific). The growth medium for CHO-K1 and CHO-S was CD CHO-2 chemically defined, serum-free medium (Lonza, Visp, Switzerland), with 8 mM of glutamine and 0.2% of anti-clumping agent (ThermoFisher Scientific) added.

Bioreactor cultivations

All cells were maintained in duplicate in their corresponding growth media in shake flasks at 37˚C, 5% CO2 and 150 rpm prior to the batch process in the bioreactor.
During the batch process, cell lines were grown in their corresponding growth medium in duplicate in 2x 1.0 L Eppendorf DASGIP bioreactor systems at 37°C and 150 rpm, with an initial seeding density of 4 × 10^5 viable cells/mL and a working volume of 410 mL. The pH and dissolved oxygen (DO) were monitored and controlled online during the cultivation process. The pH was set to 7.2 with a deviation of 0.02, and 2 M sodium carbonate and CO2 gas flow were used to adjust the pH. DO was kept at 50% of air saturation using a mixture of air, O2 and CO2 with a gas flow of 0.6 L/h. The bioreactor cultivations lasted 7 days. Daily sampling was carried out for monitoring cell growth (Nucleocounter NC-200, ChemoMetec A/S, Allerød, Denmark) and extracellular metabolites (BioProfile 400 chemistry analyzer, Nova Biomedical, Waltham, USA). Cell samples for RNA analysis (transcriptomics and miRNAs) were taken on day 4 of the cultivation.

RNA purification and next-generation sequencing

RNA was extracted from 5 × 10^7 cells using the miRNeasy Mini Kit (Qiagen, Germany) for total RNA preparation following the manufacturer’s instructions. RNA integrity was evaluated and confirmed using an Agilent 2100 Bioanalyzer with total RNA nano chips (Agilent Technologies, Santa Clara, CA, USA). RNA concentration was measured using Qubit 2.0 (ThermoFisher Scientific). Multiplexed cDNA library and miRNA library generation were carried out by BGI (China) from the same total RNA samples using the TruSeq RNA Sample Preparation Kit v2 (Illumina, Inc., San Diego, CA) and the TruSeq Small RNA Sample Preparation Kit (Illumina), respectively. Next-generation sequencing of both transcriptome and miRNAs was performed by BGI using an Illumina HiSeq 4000 system for paired-end sequencing.

mRNA-seq quality control, alignment, and generation of read counts

The Chinese hamster genome published previously (Lewis et al. 2013) was downloaded from NCBI (accession number PRJNA239316). An initial quality check was carried out using FastQC v0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) in non-interactive mode. Based on the FastQC evaluation, low-quality Illumina reads were trimmed using Trimmomatic v0.36 (Bolger, Lohse, and Usadel 2014) in paired-end mode. All trimmed paired-end reads were aligned to the Chinese hamster genome using the STAR v2.5.2b aligner (Dobin et al. 2013): genome indices were first generated from a genome FASTA file and GTF files with runMode set to genomeGenerate, and these indices were then used to align the mRNA reads. Files obtained after alignment were converted into a binary version using SAMtools (Li et al. 2009). To calculate read counts (later expressed as logCPM), all of the aligned reads as well as a GFF file of the Chinese hamster genome were fed into the HTSeq v0.8.0 (Anders, Pyl, and Huber 2015) package in intersection-strict mode. Output from the HTSeq package was loaded into RStudio v0.99.1261 (https://www.rstudio.com/), an integrated development environment running on R v3.3.3 (R Development Core Team 2003), for further downstream analysis.

Gene expression analysis (contrast matrix + DE analysis)

Through an examination of the logCPM values, we found that a large proportion of the genes within individual samples were unexpressed. Hence, we enforced a cutoff of a CPM of 1 or above in at least two samples (Law et al. 2016), as it distinguished expressed genes from unexpressed genes for most of the dataset (Supplementary Table 1). Next, the read counts were normalized using the TMM algorithm (Law et al. 2016; Robinson and Oshlack 2010) via the calcNormFactors function of edgeR (McCarthy, Chen, and Smyth 2012). Subsequently, to establish the mean-variance relationship in the data and to calculate observation weights from it, we used the voom function (Law et al. 2014) from the limma package (Smyth, n.d.). Linear models were then fitted to the voom-transformed RNA-Seq data using the lmFit function of limma, with a design matrix and a contrast matrix. Finally, differential expression analysis was performed using an empirical Bayes model (eBayes), from which moderated t-statistics and log-odds of differential expression were calculated. Summaries of the gene sets were produced with functions such as topTable and volcanoplot (Su et al. 2017) to determine the highly differentially expressed (DE) genes in each of the contrasts.
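To make the expression cutoff concrete, the following is a minimal Python sketch of a CPM filter that keeps genes with a CPM of 1 or above in at least two samples. It is an illustration only and not the R/edgeR code used in this study; the function names (cpm, filter_expressed) and the toy count table are hypothetical, and library sizes are assumed to be the plain column sums of the raw counts (no TMM scaling).

```python
def cpm(counts):
    """Convert raw counts to counts per million (CPM).

    counts: dict mapping gene -> list of raw read counts, one per sample.
    Library size is assumed to be the column sum over all genes.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_expressed(counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM is >= min_cpm in at least min_samples samples."""
    cpm_values = cpm(counts)
    return {
        gene: counts[gene]
        for gene, values in cpm_values.items()
        if sum(v >= min_cpm for v in values) >= min_samples
    }


# Hypothetical toy data: three samples, each with a library size of 1,000,000.
example_counts = {
    "geneA": [500_000, 600_000, 700_000],
    "geneB": [500_000, 399_998, 300_000],
    "geneC": [0, 2, 0],  # CPM >= 1 in only one sample, so it is filtered out
}
kept = filter_expressed(example_counts)
print(sorted(kept))
```

In this toy table geneC reaches the threshold in a single sample only and is therefore dropped, while geneA and geneB are retained.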

miRNA expression analysis

The quality of fastq files was evaluated using FastQC version 0.11.2. Sequencing adapters were trimmed using cutadapt version 1.7.1. miRNAs were searched against miRBase (Release 21) for Cricetulus griseus, Rattus norvegicus, Homo sapiens and Mus musculus using CLC Genomics Workbench (version 7.5) with standard settings regarding mature length variants in order to obtain the read counts of identified microRNAs. All of the gene-specific predicted miRNA targets of human and mouse were fetched from the miRWalk 2.0 database (Dweep, Gretz, and Sticht 2014). Gene and official gene symbols were kept as the input identifier and 3’ UTR as the input parameter.

Correlation analysis

Unsupervised correlation analysis between the expression levels of genes and miRNAs was performed by comparing the expression values of genes with the counts of all predicted miRNAs. Pearson and Spearman correlations were computed using the cor function of R.
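As an illustration of the two correlation measures, the sketch below computes Pearson and Spearman coefficients for one gene/miRNA pair in plain Python. It is not the R cor call used in the study; the function names and the example vectors are hypothetical, and Spearman correlation is obtained as the Pearson correlation of average ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression values across four samples.
gene_expression = [1.0, 2.0, 3.0, 4.0]
mirna_counts = [100.0, 10.0, 9.0, 1.0]  # decreases as the gene increases
print(pearson(gene_expression, mirna_counts), spearman(gene_expression, mirna_counts))
```

A miRNA that represses its target would be expected to show a negative coefficient, as in this toy pair; the rank-based Spearman coefficient captures the monotonic trend even though the relationship is not linear.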

Generation of stable miRNA knock-out (KO) and knock down (KD) cell lines

Sequences 500 bp upstream and 500 bp downstream of the target miRNAs were selected from the CHO genome sequences to identify sgRNA targeting sites and design sgRNAs using the ZiFiT tool (www.zifit.partners.org/ZiFiT/). The top sgRNA hits against the upstream and downstream sequences with the lowest off-target events were selected (Supplementary Table 2). The selected sgRNAs were cloned into the px458 plasmid co-expressing eGFP (Addgene) by restriction with BbsI and conventional ligation. For each miRNA target, one plasmid with the upstream sgRNA (up-sgRNA) and one plasmid with the downstream sgRNA (dw-sgRNA) were generated. All plasmids were verified by sequencing and purified using NucleoBond Xtra Midi EF (Macherey-Nagel, Düren, Germany), according to the manufacturer’s instructions.
miRNA KD/KO was performed in CHO-S using CRISPR/Cas9. CHO-S cells were maintained as suspension culture in CD CHO2 medium (Lonza) with 8 mM of glutamine (Thermo Fisher Scientific) in 250 mL shake flasks at 120 rpm (Infors, USA), 37 °C and 5% CO2. Cells were seeded at 0.5 × 10^6 cells/mL in 6-well plates (NUNC, Denmark) 1 day before transfection, with 3 mL of cell culture per well. Transfection of CHO-S cells was carried out using 3.8 μg of total plasmid DNA (50% up-sgRNA + 50% dw-sgRNA) together with FreeStyle™ MAX reagent as described previously (Grav et al. 2015). Transfection with 3.8 µg of pmaxGFP® vector without sgRNAs (Lonza, Basel, Switzerland) was used as a control for determining transfection efficiencies. Transfection with 3.8 µg of px458 plasmid was carried out to generate mock cell lines. Two days after transfection, cells were subjected to FACS for single-cell sorting of fluorescence-positive cells as reported (Grav et al. 2015). Fourteen days after single-cell sorting, the colonies were expanded into 96-well flat-bottom plates (BD Biosciences). Wells with confluent cells were split, and the replicate plates were subjected to PCR analysis of genomic DNA when close to confluence. Genomic DNA was extracted from the cell pellets using QuickExtract DNA extraction solution (Epicentre Illumina, WI) (J. S. Lee et al. 2015). PCR was performed as described (J. S. Lee et al. 2015) in order to select colonies that contain a KO/KD of the targeted miRNA. PCR primers used for verifying miRNA KO/KD were designed to amplify from about 150 bp upstream of the up-sgRNA targeting sites to about 150 bp downstream of the dw-sgRNA targeting sites (Supplementary Table 3). Selected KO/KD cell lines were further expanded, tested, and found free of mycoplasma prior to storage at -180°C.

Generation of stable miRNA overexpression cell pools

Seven miRNA overexpression (OE) plasmids were designed as described previously with minor modifications (Klanert et al. 2014). Each target sequence contains the eGFP sequence linked to the stem-loop sequence of a miRNA with 50 bp of upstream and 50 bp of downstream flanking sequence from the CHO genome. Each target sequence (Supplementary Table 4) was synthesized at GenScript (China) and cloned into the HindIII and BamHI restriction sites of the pcDNA3.1/Hygro(+) plasmid (Addgene) to generate the seven miRNA OE plasmids. All plasmids were purified using NucleoBond Xtra Midi EF (Macherey-Nagel), according to the manufacturer’s instructions.
CHO-S cells were maintained in 6-well plates (as described above for the miRNA KD/KO procedure) prior to transfection. Transfection with each OE plasmid DNA was done in the 6-well plates using FreeStyle™ MAX reagent. Negative controls and mock cell pools were generated by transfecting cells with nucleotide-free water and with a pcDNA3.1/Hygro(+)-based plasmid carrying a target sequence that contains the eGFP sequence but no regulatory miRNA sequence, respectively. Transfections were carried out in duplicate with each plasmid. All cell cultures were kept in 6-well plates at 37°C, 5% CO2 after transfection. Transfection efficiency was determined as the relative percentage of GFP-positive cells in each culture compared to the negative control 24 hours post transfection using a Celigo Imaging Cell Cytometer (Nexcelom Bioscience, Lawrence, MA).
Two days after transfection, the transfected cells were transferred to a 125 mL shake flask and selected in 10 mL of selection medium (CD CHO medium with 8 mM Gln, 0.2% AcA and 400 µg/mL hygromycin B). During selection, the cell culture was maintained at 37°C with 5% CO2 at 120 rpm, and the medium was changed every 3-4 days. Cell growth was monitored using the NucleoCounter NC-200 every 2-3 days. After 14 days of selection, stable OE cell pools were considered generated when the viability of each miRNA OE cell culture and mock cell culture was above 94%, while the viability of the negative control cell culture was 0%. All stable cell pools were further expanded, tested, and found free of mycoplasma prior to storage in the cell bank at -180°C.

Characterization of miRNA KO/KD cell lines or OE cell pools in batch culture

For each KO/KD or OE, two representative engineered cell lines/pools were selected and further evaluated in a 6-day 50 mL batch culture in CD CHO-2 medium with 8 mM of glutamine and 0.2% of anti-clumping agent, at 37˚C, 150 rpm and 5% CO2 in duplicate 250 mL shake flasks. Daily sampling was carried out for monitoring cell growth (Nucleocounter NC-200) and extracellular metabolites (BioProfile 400 chemistry analyzer). 15 mL of supernatant was harvested on day 4 and concentrated 10x using Amicon Ultra-15 (Merck, Darmstadt, Germany) for N-glycoprofiling of the secretome.

Quantitation of mature miRNA

5 × 10^6 cells were sampled for quantitation of mature miRNA on day 2 of the cultivation. RNA extraction was performed using a Trizol-based method as reported by Klanert et al. (2014), and quantitation of mature miRNA was performed by quantitative real-time PCR (RT-qPCR) using the miScript® SYBR® Green PCR Kit together with miScript Primer Assays (Supplementary Table 5). cDNA synthesis was carried out using 5x miScript HiFlex Buffer and miScript Reverse Transcriptase Mix according to the manufacturer’s instructions. Reaction mixtures contained 12.5 µL 2x QuantiTect SYBR Green PCR Master Mix, 2.5 µL 10x miScript Universal Primers, 2.5 µL 10x miScript Primer Assay, 5 µL RNase-free water and 2.5 µL cDNA template. Amplification was executed with the following conditions: 95˚C for 15 min; 40×: 94˚C for 15 s, 55˚C for 30 s and 70˚C for 30 s. Each PCR reaction had 4 replicates. Primer specificity was verified by a melting curve analysis of the PCR products with a temperature gradient of 0.2˚C/s from 65˚C to 95˚C. The expression levels of each mature miRNA relative to a house-keeping miRNA (cgr-miR-185-5p) were determined using the 2^(-ΔΔCT) method (Klanert et al. 2014).
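The relative quantitation step can be written out as a short calculation. The sketch below is a minimal Python illustration of the 2^(-ΔΔCT) method, assuming replicate CT values are averaged before subtraction; the function name and the CT values are hypothetical and are not measurements from this study.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target in a sample relative to a control (2^-ddCT).

    Each argument is a list of replicate CT values. The reference is the
    house-keeping transcript measured in the same sample or control.
    """
    delta_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical example with four replicates per reaction:
# the target crosses threshold 3 cycles earlier in the engineered pool
# than in the mock pool, while the reference is unchanged.
fold_change = relative_expression(
    ct_target_sample=[22.0, 22.0, 22.0, 22.0],
    ct_ref_sample=[20.0, 20.0, 20.0, 20.0],
    ct_target_control=[25.0, 25.0, 25.0, 25.0],
    ct_ref_control=[20.0, 20.0, 20.0, 20.0],
)
print(fold_change)
```

Here ΔCT is 2 in the sample and 5 in the control, so ΔΔCT is -3 and the target is 2^3 = 8-fold more abundant in the sample. The method implicitly assumes an amplification efficiency close to 100% for both target and reference.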

Quantitation of target gene expression

Quantitative real-time PCR (RT-qPCR) was performed on the target genes using the miScript® SYBR® Green PCR Kit (Integrated DNA Technologies, Coralville, IA) together with the corresponding primers (Supplementary Table 6), according to the manufacturer’s instructions. Each primer pair passed a linearity check (r2 > 0.98) using 4-fold serial dilutions of cDNA samples over 5 dilution levels, as well as an amplification efficiency check (between 90% and 110%). Amplification was executed with the following conditions: 95˚C for 15 min; 40×: 94˚C for 15 s, 52˚C for 30 s and 70˚C for 30 s. Each PCR reaction had 4 replicates, and every PCR plate included template controls. Primer specificity was verified by a melting curve analysis. GAPDH was used as a housekeeping gene control, and the relative fold change of gene expression was calculated by the 2^(-ΔΔCT) method (Klanert et al. 2014).
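The primer checks can be illustrated with a standard-curve calculation. The text does not state how efficiency was computed, so the sketch below assumes the conventional approach: CT is regressed on log10 of the relative template amount, and efficiency is taken as 10^(-1/slope) - 1. The function name and the CT values are hypothetical.

```python
from math import log10


def standard_curve(relative_inputs, ct_values):
    """Least-squares fit of CT against log10(relative template amount).

    Returns (slope, r_squared, efficiency), where efficiency is the
    fraction of template duplicated per cycle (1.0 means 100%).
    """
    xs = [log10(q) for q in relative_inputs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1 / slope) - 1
    return slope, r_squared, efficiency


# Hypothetical 4-fold dilution series over five levels. With perfect doubling
# per cycle, each 4-fold dilution delays the CT by exactly 2 cycles.
dilutions = [1, 1 / 4, 1 / 16, 1 / 64, 1 / 256]
cts = [20.0, 22.0, 24.0, 26.0, 28.0]
slope, r_squared, efficiency = standard_curve(dilutions, cts)

# Acceptance criteria as stated in the text.
primer_pair_ok = r_squared > 0.98 and 0.90 <= efficiency <= 1.10
print(round(slope, 3), round(r_squared, 3), round(efficiency, 3), primer_pair_ok)
```

A slope near -3.32 corresponds to 100% efficiency; shallower or steeper slopes move the efficiency above or below that value and would fail the 90% to 110% window.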

N-Glycoprofiling of secreted proteins

Sample preparation for N-glycoprofiling was carried out using the GlycoWorks RapiFluor-MS N-Glycan Kit (Waters, Milford, MA) according to the manufacturer’s instructions. Labeled N-glycans were analyzed by an LC-MS system using a Thermo Ultimate 3000 HPLC with fluorescence detector coupled to a Thermo Velos Pro Iontrap MS, as described previously (Grav et al. 2015). Relative amounts of the glycans were measured by integrating the areas under the normalized fluorescence spectrum peaks with Thermo Xcalibur software (Thermo Fisher Scientific, Waltham, MA).
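For clarity, the conversion from integrated peak areas to relative glycan amounts can be sketched as follows. This is a minimal Python illustration that assumes each glycan's relative amount is its peak area as a percentage of the summed area of all integrated peaks; the function name, glycan labels and area values are hypothetical, and the integration itself was done in Xcalibur, not in this code.

```python
def relative_abundance(peak_areas):
    """Express each integrated peak area as a percentage of the total area.

    peak_areas: dict mapping a glycan label to its integrated peak area.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {glycan: 100.0 * area / total for glycan, area in peak_areas.items()}


# Hypothetical integrated fluorescence peak areas (arbitrary units).
example_areas = {"G0F": 120.0, "G1F": 60.0, "G2F": 20.0}
percentages = relative_abundance(example_areas)
print(percentages)
```

Because the values are normalized within each sample, they can be compared between engineered and mock cell lines even when the total amount of secreted protein differs.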

Results

Identification of miRNAs in the Chinese hamster genome

Our first step was to identify as many miRNAs in the CHO genome as possible. However, the current annotation of miRNAs in the CHO genome is very incomplete, so we decided to perform de novo prediction, based partially on sequence analysis (described here) and partially on miRNA sequencing (see below). First, we examined the Chinese hamster, human, and mouse genomes for predicted and identified miRNAs. In total, 656 miRNAs were found to align to the Chinese hamster, human and/or mouse genome. Of these, 277 mapped uniquely (by sequence) to the hamster genome and 353 to the human genome. Further, 4 miRNAs were identical in human, mouse and hamster, 4 in human and hamster, 6 in human and mouse, and 12 in mouse and hamster. Out of the 656 miRNAs, 489 were predicted with the help of miRWalk to regulate one or more reactions of the N-glycan pathway (Supplementary Table 1).

Identification of genes in CHO protein N-glycosylation

First, we wanted to identify CHO genes in the N-glycosylation pathway as well as miRNAs putatively targeting these genes. To this end, a list of 75 genes assigned to the N-glycosylation pathway was collected from the KEGG pathway database (Kanehisa et al. 2019). To refine this list, genes were grouped into categories such as mannosylation, galactosylation and fucosylation, based on literature mining. Furthermore, genes were classified on the basis of whether up- or down-regulation would be desirable in order to obtain mature N-glycosylation (Supplementary Table 7).

Generation of mRNA and miRNA expression profiles

In order to experimentally determine the expression patterns of CHO miRNAs and to be able to correlate these to gene expression patterns for genes of interest, we set up an mRNA/miRNA profiling experiment. We selected a panel of three non-producing cell lines and three protein-producing cell lines, and cultivated them in biological duplicates in batch bioreactors. CHO-S, CHO DG44 and CHO-K1 are all non-expressing cell lines; CS13-0.02 is an IgG low-producer, CS13-1.00 is an IgG high-producer, and DG44-EPO produces erythropoietin (EPO). The culture performance of the cell lines used for transcriptomics and miRNA-seq analysis is shown in Figure 1; all details are available in Supplementary Data File 1. As can be seen, replicates are consistent, but there are differences between the lines. CHO-S and DG44-EPO grow faster and reach a higher VCD than the other cell lines. Glucose was depleted around day 5 for all cell lines except the DG44 cell line. CHO-K1 produces the most lactate and consumes the most glucose during the growth phase (day 0-4) compared to the other cell lines, indicating inefficient carbon flow from glycolysis to support cell growth. CHO-K1 and CHO-S also show relatively high glutamine consumption and high NH4+ production throughout the culture compared to the other cell lines. This may be because their growth medium differs from that of the other cell lines.
RNA samples were taken on day 4, purified, and sent for mRNA and miRNA sequencing.