Figure 1 - Comparison of the six cell lines used for RNA-seq, grown in duplicate bioreactor batch cultures. All RNA samples were taken on day 4 of cultivation. Error bars show the standard deviation of the duplicate cultures.

RNA-seq data quality control, alignment, and normalization

All reads were subjected to quality control and trimming. Trimmed mRNA and miRNA reads were aligned to the Chinese hamster genome, and read counts (logCPM) were calculated. After alignment, the data were normalized to make the samples comparable. To evaluate the normalization, the library sizes of the sample sets were compared before and after normalization. As no problems with the aligned and normalized data were apparent, these data were used for further analysis. Alignment counts are available for the glycosylation genes in Supplementary Table 8 and for the mRNA in Supplementary Table 9. As a further aid, the miRNAs have been sorted according to their predicted gene targets.
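
As an illustration of the logCPM scale and of the library-size comparison described above, the following is a minimal sketch only. The prior count of 0.5 and the +1 added to the library size are assumptions borrowed from a common convention, not values stated in this study, and the sample and gene names are hypothetical.

```python
import math


def log_cpm(counts, prior=0.5):
    """Convert raw counts to log2 counts-per-million, sample by sample.

    counts: dict mapping sample name -> dict mapping gene name -> raw count.
    prior:  small offset that avoids log2(0); an assumed value, not from the study.
    """
    result = {}
    for sample, gene_counts in counts.items():
        lib_size = sum(gene_counts.values())  # total mapped reads in this sample
        result[sample] = {
            gene: math.log2((c + prior) / (lib_size + 1.0) * 1e6)
            for gene, c in gene_counts.items()
        }
    return result


# Hypothetical toy data: sample_B was sequenced ten times deeper than sample_A,
# but the relative abundance of the two genes is identical.
toy_counts = {
    "sample_A": {"gene1": 100, "gene2": 900},
    "sample_B": {"gene1": 1000, "gene2": 9000},
}
toy_logcpm = log_cpm(toy_counts)

# Library sizes before scaling, for the kind of comparison described in the text.
toy_library_sizes = {s: sum(g.values()) for s, g in toy_counts.items()}
```

In this toy case the raw library sizes differ tenfold, while the logCPM values of each gene are nearly identical between the two samples, which is the behaviour one would expect when scaling by library size.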

Differential expression (DE) analysis of mRNA and miRNAs

To understand the general differences in the mRNA and miRNA transcriptomic landscape, we performed DE analysis across several combinations of producer and parental cell lines using a standard pipeline.
Since the expression profiles and N-glycan profiles of the cell lines vary with the protein of interest (e.g. erythropoietin, rituximab) and the cell line, we defined four contrasts for comparison and analyzed up- and down-regulated genes involved in glycosylation as well as the set of 806 miRNAs potentially targeting glycosylation genes. Significantly differentially expressed genes and miRNAs (adjusted p-value <= 0.05) were counted for each contrast (Supplementary Table 10).
In a first comparison, we examined the effect of EPO production by comparing the DG44 parental cell line with the DG44 EPO producer. Here, 24 glycosylation genes and 258 glycosylation-targeting miRNAs (in the following denoted glyco-genes and glyco-miRNAs) were found to be up-regulated, and 36 glyco-genes and 106 glyco-miRNAs to be down-regulated, leaving 15 glyco-genes and 442 glyco-miRNAs not significantly differentially expressed (see Supplementary Table 11 for details on genes and Supplementary Table 12 for details on miRNAs).
In a similar comparison for IgG production between the CS13-1.00 high producer and the DG44 parental cell line, 18 glyco-genes and 71 glyco-miRNAs were found to be up-regulated, and 28 glyco-genes and 350 glyco-miRNAs to be down-regulated (genes in Supplementary Table 13, miRNAs in Supplementary Table 14).
As a third comparison, we compared the CS13-1.00 high producer with the CS13-0.02 low producer cell line to identify traits specific to high versus low production. As expected, the number of significantly changed genes and miRNAs was much lower in this comparison than in the producer/non-producer comparisons above. Here, 11 glyco-genes and 40 glyco-miRNAs were up-regulated, and 11 glyco-genes and 29 glyco-miRNAs were down-regulated (genes in Supplementary Table 15, miRNAs in Supplementary Table 16).
As a fourth comparison, aimed at identifying common traits of producer cells (irrespective of their high or low producer status), we compared all producer cell lines (i.e. the DG44-EPO producer along with the CS13 high and low producers) with all parental cell lines (i.e. DG44-WT, CHOS-P and CHO-K1). This comparison yielded 19 glyco-genes and 104 glyco-miRNAs that were up-regulated, and 37 glyco-genes and 324 glyco-miRNAs that were down-regulated (genes in Supplementary Table 17, miRNAs in Supplementary Table 18). This is a surprisingly high number of miRNAs: more than half of the glyco-miRNAs (428 of 806) are differentially expressed between producers and non-producers.
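
The counting used for each contrast can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline used in the study: the input format (a log fold change and an adjusted p-value per feature) and the feature names are hypothetical, while the 0.05 cut-off and the reported counts in the final check are taken from the text.

```python
from collections import Counter

ALPHA = 0.05  # cut-off stated in the text: adjusted p-value <= 0.05


def classify(log_fc, adj_p, alpha=ALPHA):
    """Label one feature as 'up', 'down' or 'ns' (not significant)."""
    if adj_p <= alpha and log_fc > 0:
        return "up"
    if adj_p <= alpha and log_fc < 0:
        return "down"
    return "ns"


def tally(de_table, alpha=ALPHA):
    """Count features per category for one contrast.

    de_table: dict mapping feature name -> (log fold change, adjusted p-value).
    """
    counts = Counter(classify(fc, p, alpha) for fc, p in de_table.values())
    return {label: counts.get(label, 0) for label in ("up", "down", "ns")}


# Hypothetical toy contrast with four features.
toy_contrast = {
    "feature_1": (1.8, 0.001),   # significant, higher in the producer
    "feature_2": (-2.1, 0.020),  # significant, lower in the producer
    "feature_3": (0.4, 0.300),   # not significant
    "feature_4": (-0.9, 0.051),  # just above the cut-off, not significant
}
toy_tally = tally(toy_contrast)

# Consistency check on the first contrast reported in the text: the three
# glyco-miRNA categories should add up to the full set of 806.
reported_mirna_total = 258 + 106 + 442
```

Because every feature falls into exactly one of the three categories, the counts for a contrast must sum to the size of the analyzed set, as they do for the glyco-miRNAs in the DG44 EPO comparison.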

Unsupervised linking of miRNA to mRNA: Correlation analysis

While the literature study suggested which miRNAs and genes may be linked, and miRwalk provided a sequence-based analysis, we wanted to extend the analysis with an unsupervised approach to linking miRNA to mRNA. In general, an unsupervised statistical correlation analysis of two related datasets allows linking across datasets in an unbiased way. Hence, we performed a correlation analysis of all miRNAs in the set against the genes belonging to the N-glycan pathway. We chose two well-established correlation coefficients: Pearson and Spearman. Pearson correlation allowed us to identify linear relationships, while Spearman correlation measures the strength and direction of the rank-based association between the expression of genes and miRNAs. Here, cell line-specific bias was minimized by averaging the expression values of each mRNA across the six cell lines and correlating it with the expression value of its corresponding miRNA. We thus identified a list of negatively correlated miRNAs by selecting only those gene-miRNA combinations with a negative correlation (Supplementary Table 19). Positive correlations were identified in the same way. As an example, miR-6715b-5p, miR-30b-3p and miR-335-3p were found to repress eight and seven gene targets out of eight, seven and fifteen in total.
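
To make the difference between the two coefficients concrete, the following is a minimal pure-Python sketch. It assumes one expression value per cell line for each molecule, and the six-point toy profiles are hypothetical; it is not the implementation used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression values across six cell lines: the gene falls
# steadily, but not linearly, as the miRNA rises.
toy_mirna = [1, 2, 3, 4, 5, 6]
toy_gene = [64, 32, 16, 8, 4, 2]

toy_pearson = pearson(toy_mirna, toy_gene)
toy_spearman = spearman(toy_mirna, toy_gene)
is_negative_pair = toy_pearson < 0 and toy_spearman < 0
```

For this toy pair the Spearman coefficient reaches -1 because the relationship is perfectly monotonic, whereas the Pearson coefficient is weaker because the relationship is not linear. A pair like this would be kept as a negatively correlated gene-miRNA combination.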

Supervised linking of miRNA to mRNA

In addition to the unsupervised approach above, we included a supervised approach for linking genes to miRNAs. To do this, we performed a correlation analysis based on the direction of statistically significant changes in the comparison of the EPO producers to the parental cells, rather than on the absolute RNA-seq counts (as used in the unsupervised approach). As negative correlations, we identified miRNA-target pairs in which the gene was up-regulated and the miRNA down-regulated in the DE analysis, or vice versa. This gave us one list of miRNAs (Supplementary Table 19). The final table (Supplementary Table 19) contains all of the positive and negative correlations from both the supervised and unsupervised analyses.
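
The direction-based rule described above can be sketched as follows. This is a hedged illustration: the candidate pair list, the identifiers, and the treatment of same-direction pairs as positive correlations are assumptions made for the example.

```python
def link_by_direction(gene_dir, mirna_dir):
    """Label a gene-miRNA pair from the DE directions in one contrast.

    Returns 'negative' when the two change in opposite directions,
    'positive' when they change in the same direction (an assumption of
    this sketch), and None when either is not significantly changed.
    """
    significant = ("up", "down")
    if gene_dir not in significant or mirna_dir not in significant:
        return None
    return "positive" if gene_dir == mirna_dir else "negative"


def supervised_links(pairs, gene_de, mirna_de):
    """Apply the rule to candidate (miRNA, gene) pairs.

    gene_de / mirna_de: dicts mapping a name to 'up', 'down' or 'ns'.
    Names missing from a dict are treated as not significant.
    """
    links = {}
    for mirna, gene in pairs:
        label = link_by_direction(gene_de.get(gene, "ns"), mirna_de.get(mirna, "ns"))
        if label is not None:
            links[(mirna, gene)] = label
    return links


# Hypothetical DE directions and candidate pairs (all names are placeholders).
toy_gene_de = {"gene_a": "up", "gene_b": "down", "gene_c": "ns"}
toy_mirna_de = {"mirna_1": "down", "mirna_2": "down"}
toy_pairs = [
    ("mirna_1", "gene_a"),  # miRNA down, gene up
    ("mirna_2", "gene_b"),  # both down
    ("mirna_1", "gene_c"),  # gene not significant
]
toy_links = supervised_links(toy_pairs, toy_gene_de, toy_mirna_de)
negative_links = sorted(pair for pair, label in toy_links.items() if label == "negative")
```

Only the first toy pair is labelled negative, since the miRNA is down-regulated while the gene is up-regulated. The pair involving a non-significant gene is dropped, because this rule uses only statistically significant changes.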

Target selection

Taking the perspective that miRNAs inhibit and translationally repress mRNA, we retained only those targets that were negatively correlated in the supervised and/or unsupervised analysis. This gives us a filter for miRNAs that presumably act more directly on the transcript, rather than miRNAs targeting transcriptional repressors/activators. The list of miRNAs with negative correlations in both the supervised and unsupervised approaches can be seen in Supplementary Table 19. Combining these two types of correlation with the miRwalk analysis, the miRNA expression analysis, the miRNA DE analysis, and the literature analysis, we selected a set of 10 miRNAs for modification: three for KO/KD and seven for OE studies. An overview of the analysis is found in Table 1.
Table 1 - miRNAs engineered in this study and their hypothetical effect on CHO cell phenotypes