Figure 1 - Comparison of the six cell lines used for RNA-seq, cultivated
in duplicate bioreactor batch cultures. All RNA samples were taken on
day 4 of cultivation. Error bars show the standard deviation of the
duplicate cultures.
RNA-seq data quality control, alignment, and normalization
All reads were subjected to quality control and trimming. Trimmed mRNA
and miRNA reads were aligned to the Chinese hamster genome, and read
counts were calculated and expressed as log counts per million (logCPM).
After alignment, the data were normalized to make the samples
comparable. To evaluate the normalization, the library sizes of the
sample sets were compared before and after normalization. As no problems
with the aligned and normalized data were apparent, these data were used
for further analysis. Alignment counts are available for the
glycosylation genes in Supplementary Table 8 and for the mRNA in
Supplementary Table 9. As a further aid, the miRNA data have been sorted
according to their predicted gene targets.
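The normalization method is not named in this section. As a minimal sketch, assuming a genes × samples count matrix, the code below computes logCPM values with the limma-voom offsets (0.5 on counts, 1 on library size) and compares library sizes before and after normalization using DESeq-style median-of-ratios size factors. The choice of normalization and all function names are illustrative assumptions, not the pipeline used in this study.

```python
import numpy as np


def log_cpm(counts: np.ndarray) -> np.ndarray:
    """voom-style logCPM: log2((r + 0.5) / (R + 1) * 1e6).

    counts: genes x samples array of raw read counts (r); R is the
    total read count (library size) of each sample.
    """
    lib_sizes = counts.sum(axis=0)
    return np.log2((counts + 0.5) / (lib_sizes + 1.0) * 1e6)


def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """DESeq-style size factors (an assumed normalization method)."""
    # Only genes detected in every sample contribute to the reference.
    expressed = counts[np.all(counts > 0, axis=1)]
    log_counts = np.log(expressed)
    # Per-gene log geometric mean across samples = pseudo-reference sample.
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Each sample's size factor is its median ratio to the reference.
    return np.exp(np.median(log_counts - log_ref, axis=0))


def library_sizes_before_after(counts: np.ndarray):
    """Library sizes of raw counts and of size-factor-normalized counts."""
    size_factors = median_of_ratios_size_factors(counts)
    normalized = counts / size_factors  # divides each sample (column)
    return counts.sum(axis=0), normalized.sum(axis=0)


if __name__ == "__main__":
    # Toy data: sample 2 was sequenced twice as deeply as sample 1.
    toy = np.array([[10, 20], [100, 200], [30, 60]], dtype=float)
    before, after = library_sizes_before_after(toy)
    print("before:", before, "after:", after)
```

In the toy example the raw library sizes differ by a factor of two, while the normalized library sizes are equal, which is the kind of before/after comparison described above.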
Differential expression (DE) analysis of mRNA and miRNAs
To characterize general differences in the transcriptomic landscape and
in miRNA expression, we performed DE analysis across various
combinations of producer and parental cell lines using a standard
pipeline.
Since the expression profiles and N-glycan profiles of the cell lines
vary with the protein of interest (e.g. erythropoietin, rituximab) and
with the cell line, we defined four contrasts for comparison and
analyzed up- and downregulated genes involved in glycosylation, as well
as the set of 806 miRNAs potentially targeting glycosylation genes. The
numbers of significantly differentially expressed genes and miRNAs
(adjusted p-value ≤ 0.05) in each contrast are summarized in
Supplementary Table 10.
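The counting step can be sketched as follows. This is a minimal illustration, assuming per-feature log fold changes and raw p-values for one contrast, and assuming Benjamini-Hochberg correction (the section does not state which adjustment was used). The function names are hypothetical.

```python
import numpy as np


def bh_adjust(pvalues) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the i-th smallest p-value
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: running minimum from the largest p-value down.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def count_de(log_fc, adj_p, alpha: float = 0.05) -> dict:
    """Count up-, down- and not significantly regulated features in one contrast."""
    log_fc = np.asarray(log_fc, dtype=float)
    significant = np.asarray(adj_p, dtype=float) <= alpha
    up = int(np.sum(significant & (log_fc > 0)))
    down = int(np.sum(significant & (log_fc < 0)))
    # Everything not counted as up or down is reported as not significant.
    return {"up": up, "down": down, "not_significant": log_fc.size - up - down}
```

Applying `count_de` separately to the glyco-genes and the 806 glyco-miRNAs of each contrast would give up/down/not-significant tallies of the form reported below.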
In the first comparison, we examined the effect of EPO production by
comparing the DG44 parental cell line with the DG44 EPO producer. Here,
24 glycosylation genes and 258 glycosylation-targeting miRNAs
(hereafter glyco-genes and glyco-miRNAs) were up-regulated, and 36
glyco-genes and 106 glyco-miRNAs were down-regulated, leaving 15
glyco-genes and 442 glyco-miRNAs not significantly differentially
expressed (see Supplementary Table 11 for details on genes and
Supplementary Table 12 for details on miRNAs).
In a similar comparison for IgG production between the CS13-1.00 high
producer and the DG44 parental cell line, 18 glyco-genes and 71
glyco-miRNAs were up-regulated, and 28 glyco-genes and 350 glyco-miRNAs
were down-regulated (genes in Supplementary Table 13, miRNAs in
Supplementary Table 14).
In the third comparison, we compared the CS13-1.00 high producer with
the CS13-0.02 low producer to identify traits specific to high versus
low production. As expected, far fewer genes and miRNAs changed
significantly in this comparison than in the producer/non-producer
comparisons above: 11 glyco-genes and 40 glyco-miRNAs were up-regulated,
and 11 glyco-genes and 29 glyco-miRNAs were down-regulated (genes in
Supplementary Table 15, miRNAs in Supplementary Table 16).
In the fourth comparison, aimed at identifying traits common to producer
cells irrespective of their high or low producer status, we compared all
producer cell lines (the DG44 EPO producer and the CS13 high and low
producers) with all parental cell lines (DG44-WT, CHOS-P and CHO-K1).
This comparison yielded 19 glyco-genes and 104 glyco-miRNAs that were
up-regulated, and 37 glyco-genes and 324 glyco-miRNAs that were
down-regulated (genes in Supplementary Table 17, miRNAs in Supplementary
Table 18). This is a surprisingly high number of miRNAs: more than half
of the 806 glyco-miRNAs (428 of 806) are differentially expressed
between producers and non-producers.
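To make the "more than half" statement concrete, the glyco-miRNA counts reported for the four contrasts can be tabulated against the 806 glyco-miRNAs analyzed. The numbers below are copied from the text; the script is only an illustrative arithmetic check, and the contrast labels are shorthand introduced here.

```python
# glyco-miRNA counts (up, down) as reported in the text for each contrast.
N_GLYCO_MIRNAS = 806

contrasts = {
    "DG44 EPO producer vs DG44 parental": (258, 106),
    "CS13-1.00 vs DG44 parental": (71, 350),
    "CS13-1.00 vs CS13-0.02": (40, 29),
    "all producers vs all parental": (104, 324),
}


def de_fraction(up: int, down: int, total: int = N_GLYCO_MIRNAS) -> float:
    """Fraction of the analyzed glyco-miRNAs that are differentially expressed."""
    return (up + down) / total


for name, (up, down) in contrasts.items():
    print(f"{name}: {up + down}/{N_GLYCO_MIRNAS} = {de_fraction(up, down):.1%}")
```

For the first contrast, the up-, down- and non-regulated glyco-miRNAs sum to the full set (258 + 106 + 442 = 806), consistent with the 806 glyco-miRNAs analyzed.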
Unsupervised linking of miRNA to mRNA: Correlation analysis
While the literature study indicated which miRNAs and genes may be
linked, and miRWalk provided a sequence-based analysis, we wanted to
extend the analysis with an unsupervised approach to linking miRNA to
mRNA. In general, an unsupervised statistical correlation analysis of
two related datasets allows them to be linked in an unbiased way. Hence,
we correlated all miRNAs in the set with the mRNAs of genes belonging to
the N-glycan pathway. We used two well-established correlation
coefficients: Pearson and Spearman. Pearson correlation identifies
linear relationships, whereas Spearman correlation measures the strength
and direction of monotonic association between the ranked expression
values of gene and miRNA. Cell line-specific bias was minimized by
averaging the expression values of each mRNA across the six cell lines
and correlating them with the expression values of the corresponding
miRNA. We then identified miRNAs with negative expression correlation by
selecting only those gene-miRNA combinations that were negatively
correlated (Supplementary Table 19). Positive correlations were
identified in the same way. As an example, miR-6715b-5p, miR-30b-3p and
miR-335-3p were negatively correlated with, and thus potentially
repress, eight and seven gene targets out of eight, seven and fifteen in
total.
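A minimal sketch of the unsupervised step, assuming each mRNA and miRNA is represented by a vector of per-cell-line mean expression values (duplicates averaged, same cell-line order) and that correlations are computed across the six cell lines. Requiring both coefficients to be negative, the candidate-pair input (e.g. predicted targets) and all function names are illustrative assumptions. `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would be the usual library route; only NumPy is used here to keep the example self-contained.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(x.size)
    i = 0
    while i < x.size:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < x.size and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y) -> float:
    # Spearman's rho is the Pearson correlation of the (average) ranks.
    return pearson(average_ranks(x), average_ranks(y))


def negative_pairs(mrna: dict, mirna: dict, candidate_pairs):
    """Keep (miRNA, gene) pairs with negative Pearson AND Spearman correlation.

    mrna, mirna: name -> per-cell-line mean expression values.
    candidate_pairs: iterable of (mirna_name, gene_name).
    """
    hits = []
    for mir, gene in candidate_pairs:
        r_p = pearson(mirna[mir], mrna[gene])
        r_s = spearman(mirna[mir], mrna[gene])
        if r_p < 0 and r_s < 0:
            hits.append((mir, gene, r_p, r_s))
    return hits
```

Positive correlations can be collected the same way by flipping the sign condition. Using both coefficients guards against a single outlying cell line driving a Pearson correlation that the rank-based Spearman coefficient does not support.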
Supervised linking of miRNA to mRNA
In addition to the unsupervised approach above, we included a supervised
approach for linking genes to miRNAs. Here, the correlation analysis was
based on the direction of statistically significant changes in the
comparison of the EPO producers with the parental cells, rather than on
the absolute RNA-seq counts used in the unsupervised approach. A
miRNA-target pair was classed as negatively correlated when the gene was
up-regulated and the miRNA down-regulated in the respective DE analyses,
or vice versa. This yielded a list of miRNAs (Supplementary Table 19).
Supplementary Table 19 contains all positive and negative correlations
from both the supervised and unsupervised analyses.
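The sign-based rule above can be sketched as follows, assuming each gene and miRNA carries a DE call of +1 (significantly up), -1 (significantly down) or 0 (not significant) from the EPO producer vs parental contrast. The encoding and function names are hypothetical.

```python
def classify_pair(gene_direction: int, mirna_direction: int) -> str:
    """Directions: +1 significantly up, -1 significantly down, 0 not significant."""
    product = gene_direction * mirna_direction
    if product < 0:
        return "negative"  # gene up and miRNA down, or vice versa
    if product > 0:
        return "positive"  # both up or both down
    return "none"  # at least one side not significantly changed


def supervised_links(gene_de: dict, mirna_de: dict, candidate_pairs) -> dict:
    """Group (miRNA, gene) pairs by the sign relationship of their DE calls."""
    links = {"negative": [], "positive": []}
    for mir, gene in candidate_pairs:
        # Features missing from a DE table are treated as not significant.
        label = classify_pair(gene_de.get(gene, 0), mirna_de.get(mir, 0))
        if label in links:
            links[label].append((mir, gene))
    return links
```

Unlike the unsupervised correlation, this rule only uses pairs where both partners changed significantly, so it trades coverage for a more conservative link.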
Target selection
On the premise that miRNAs inhibit their target mRNAs and repress their
translation, we only retained targets that were negatively regulated in
the supervised and/or unsupervised analysis. This filter favors miRNAs
that presumably act more directly on the transcript, rather than miRNAs
targeting transcriptional repressors or activators. The miRNAs with
negative correlations in both the supervised and unsupervised approaches
are listed in Supplementary Table 19. Combining these two types of
correlation with the miRWalk analysis, the miRNA expression analysis,
the miRNA DE analysis and the literature analysis, we selected a set of
10 miRNAs for modification: three for knockout/knockdown (KO/KD) and
seven for overexpression (OE) studies. An overview of the analysis is
given in Table 1.
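The correlation filter can be sketched as set operations on (miRNA, gene) pairs: pairs negative in either analysis are retained, and pairs negative in both form the stricter list. The final choice of ten miRNAs also drew on miRWalk predictions, expression levels, DE results and the literature, which this sketch does not model; ranking miRNAs by their number of negatively linked glyco-genes is an illustrative assumption, not the selection criterion used here.

```python
from collections import Counter


def select_negative_links(unsupervised: set, supervised: set):
    """Combine negatively correlated (miRNA, gene) pairs from both analyses."""
    either = unsupervised | supervised  # negative in supervised and/or unsupervised
    both = unsupervised & supervised  # negative in both approaches
    return either, both


def rank_mirnas(pairs):
    """Count negatively linked target genes per miRNA, most links first."""
    counts = Counter(mir for mir, _gene in pairs)
    return counts.most_common()
```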
Table 1 - miRNAs engineered in this study and their hypothetical effect on CHO cell phenotypes