4.2 Instances of CBM coupled with ML for fermentation analysis
and optimization
Routinely, CBM uses genetic and environmental conditions as inputs to
predict metabolic flux distributions. However, Sridhara et al.
investigated whether they could infer bacterial growth conditions from
internal fluxomics in an inverse manner. To this end, the prediction
was conducted using a simple linear regression. The results showed that
using the intracellular flux values, carbon and nitrogen sources
utilized in the initial culture medium could be predicted even with a
small number of impurities [114]. In a recent study, Oyetunde et al.
extracted over 1,200 curated bioprocess datasets from ∼100 articles to
predict microbial factories’ performance (yield, titer, and rate). The
authors generated additional flux-based features from a CBM model to
augment ML input data. Next, they applied ensemble methods to alleviate
data challenges such as sparse, non-standardized, and incomplete
datasets. The developed ML-CBM model could predict the performance of
engineered Escherichia coli with high accuracy [115]. In
2016, Wu et al. developed MFlux, an online platform, for predicting
bacterial central metabolism. The authors trained ML models (SVM, KNN,
and decision tree) on previously published experimental data, including
substrate types, bioprocess strategies, and genetic modifications from
about 100 13C-MFA articles. MFlux outputs can be used as inputs for FBA
to reduce the solution space, thus improving the model’s accuracy
[116].
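The idea of feeding ML-predicted fluxes into FBA to shrink the solution
space can be illustrated on a toy branch point. The sketch below is not
the MFlux interface or a full FBA: the reaction names (v_glyc, v_ppp),
the uptake rate, the predicted value, and the tolerance are all
hypothetical, and the feasible range is obtained from a single
steady-state balance rather than from a linear program.

```python
def tighten_bounds(default_bounds, predicted, tolerance):
    """Intersect default flux bounds with intervals around ML-predicted fluxes."""
    tightened = dict(default_bounds)
    for rxn, value in predicted.items():
        lo, hi = default_bounds[rxn]
        new_lo, new_hi = max(lo, value - tolerance), min(hi, value + tolerance)
        if new_lo > new_hi:
            raise ValueError(f"prediction for {rxn} conflicts with its default bounds")
        tightened[rxn] = (new_lo, new_hi)
    return tightened


def glycolysis_range(bounds, uptake):
    """Feasible range of v_glyc under the balance uptake = v_glyc + v_ppp."""
    ppp_lo, ppp_hi = bounds["v_ppp"]
    glyc_lo, glyc_hi = bounds["v_glyc"]
    return (max(glyc_lo, uptake - ppp_hi), min(glyc_hi, uptake - ppp_lo))


# Hypothetical toy network: substrate uptake splits into two branches.
default_bounds = {"v_glyc": (0.0, 10.0), "v_ppp": (0.0, 10.0)}
uptake = 10.0

# Hypothetical ML prediction for one branch, with an assumed +/- 0.5 tolerance.
predicted = {"v_ppp": 3.0}
constrained = tighten_bounds(default_bounds, predicted, tolerance=0.5)

wide = glycolysis_range(default_bounds, uptake)    # range without the prediction
narrow = glycolysis_range(constrained, uptake)     # range with the prediction
print(wide, narrow)
```

In this toy case the admissible range of the unmeasured branch narrows
from the full interval to a band one unit wide, which is the sense in
which predicted fluxes reduce the solution space.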
Most recently, a novel CBM-ML hybrid approach for time-course
control of nutrient availability in fed-batch CHO cell culture has
been developed. To this end, Schinn et al. used ML as a tool to
overcome CBM limitations, such as the assumptions of optimal metabolism
and steady state. In this study, cell density, product titer,
glucose, lactate, glutamine, and glutamate concentrations were used as
constraints for the FBA solution. The metabolic model calculated the
initial consumption rates of proteogenic amino acids. Next, a series of
linear regressions were used to refine the predictions. Finally, using a
sigmoid function, the refined consumption rates were fit to a
time-course dependent profile. The model was able to correctly forecast
the concentrations of 13 out of 18 amino acids [117].
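The final step described above, fitting consumption rates to a sigmoid
time profile, can be sketched as follows. This is a minimal illustration
under stated assumptions, not the procedure of the cited study: the
three-parameter logistic form, the parameter names (r_max, k, t_mid),
the synthetic data, and the crude grid search (standing in for a proper
nonlinear least-squares routine) are all hypothetical choices.

```python
import math


def sigmoid_rate(t, r_max, k, t_mid):
    """Logistic profile: the rate falls over time when k < 0, rises when k > 0."""
    return r_max / (1.0 + math.exp(-k * (t - t_mid)))


def fit_sigmoid(times, rates, r_max_grid, k_grid, t_mid_grid):
    """Return the grid point with the smallest sum of squared errors."""
    best_sse, best_params = float("inf"), None
    for r_max in r_max_grid:
        for k in k_grid:
            for t_mid in t_mid_grid:
                sse = sum((sigmoid_rate(t, r_max, k, t_mid) - r) ** 2
                          for t, r in zip(times, rates))
                if sse < best_sse:
                    best_sse, best_params = sse, (r_max, k, t_mid)
    return best_params, best_sse


# Synthetic, noise-free consumption rates over a 14-day culture (assumed values).
times = [float(day) for day in range(14)]
true_params = (0.8, -0.6, 6.0)
rates = [sigmoid_rate(t, *true_params) for t in times]

params, sse = fit_sigmoid(
    times, rates,
    r_max_grid=[0.4, 0.6, 0.8, 1.0],
    k_grid=[-1.0, -0.8, -0.6, -0.4, -0.2],
    t_mid_grid=[4.0, 5.0, 6.0, 7.0, 8.0],
)
print(params, sse)
```

With real, noisy rates the grid would only give a starting point, and a
gradient-based optimizer would normally refine the parameters.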
Essential genes are genes that are critical for cell viability and
growth. Gene essentiality is not an intrinsic trait of a gene;
instead, it can be influenced by environmental and genetic contexts
[118]. Nandi et al. developed an SVM-based model named SVM-RFE to
classify Escherichia coli genes as essential or non-essential.
The model input included a mixture of genotypic and phenotypic features,
i.e., gene and protein sequences, network topology, and gene
expression. They then employed flux coupling analysis (FCA) to generate
flux-based features that account for gene adaptability under different
environmental conditions. SVM-RFE was trained on 4094 reaction-gene
combinations with 64 features. The model could successfully capture the
minimal set of essential genes in various environmental conditions with
high accuracy [119]. This study shows the importance of selecting
and describing appropriate features in an ML study.
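The feature-selection idea behind SVM-RFE (recursive feature elimination
driven by the weights of a linear SVM) can be sketched in a few lines.
The code below is a toy illustration, not the model of the cited study:
the three-feature dataset, the hyperparameters, and the bias-free
subgradient-descent trainer are assumptions chosen to keep the example
self-contained.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss (no bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # sample lies inside the margin
                for j in range(d):
                    grad[j] += yi * xi[j]
        w = [wj - lr * (lam * wj - gj / n) for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, n_keep):
    """Retrain and drop the feature with the smallest |weight| until n_keep remain."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(active)), key=lambda j: abs(w[j]))
        eliminated.append(active.pop(worst))
    return active, eliminated


# Hypothetical genes described by three features: feature 0 is strongly
# informative, feature 1 weakly informative, feature 2 unrelated to the label.
X = [[2.0, 0.5, 1.0], [2.0, 0.5, -1.0], [-2.0, -0.5, 1.0], [-2.0, -0.5, -1.0]]
y = [1, 1, -1, -1]  # +1 = essential, -1 = non-essential (assumed labels)

kept, dropped = svm_rfe(X, y, n_keep=1)
print(kept, dropped)
```

The uninformative feature is removed first and the weakly informative
one second, so the ranking reflects how much each feature contributes to
the separating hyperplane.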
In the context of multi-omics integration, Zampieri et al. employed a
combination of CBM and ML to predict the production of lactate, a
secondary metabolite, in CHO cell culture. In this study, transcriptomics data
from different culture conditions were integrated with fluxomics data
from in-silico genome-scale modeling to construct a data-driven
framework. The results showed improved predictive performance over
transcriptomic analysis alone [120]. Similarly,
Vijayakumar et al. proposed a machine learning pipeline integrated with
genome-scale modeling to improve phenotypic prediction in a
lipid-producing cyanobacterium. First, they extracted RNA sequencing
data from 23 different growth conditions to develop condition-specific
GEMs via transcriptomics data integration. Then, FBA was performed to
obtain context-specific fluxomic data. A preprocessing stage was then
used to combine the fluxomic data with the experimental transcriptomic
data. PCA, k-means clustering, and LASSO regression were used to identify
the dataset’s key features. As a result, a data-driven multi-view model
with high accuracy in phenotype prediction was developed [121]. This
strategy has also been adapted to predict the growth rate of the yeast
S. cerevisiae. In that study, fluxomic data generated by parsimonious
flux balance analysis (pFBA) were coupled with transcriptomic data to
train neural networks [122].
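A common way to couple the two data views before model training is to
scale each view separately and concatenate them per growth condition.
The sketch below assumes this scheme for illustration only; the cited
studies may have preprocessed their data differently, and the small
matrices and function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Scale each column to zero mean and unit variance (constant columns become 0)."""
    scaled_cols = []
    for col in zip(*matrix):
        mu, sigma = mean(col), pstdev(col)
        scaled_cols.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def build_multiview(transcript_view, flux_view):
    """Concatenate scaled transcriptomic and fluxomic features for each condition."""
    if len(transcript_view) != len(flux_view):
        raise ValueError("both views must describe the same conditions")
    t_scaled = zscore_columns(transcript_view)
    f_scaled = zscore_columns(flux_view)
    return [t + f for t, f in zip(t_scaled, f_scaled)]


# Rows are growth conditions; columns are genes or reactions (made-up numbers).
transcripts = [[5.0, 120.0], [7.0, 80.0], [9.0, 100.0]]
fluxes = [[0.2, 1.5, 0.0], [0.4, 1.0, 0.0], [0.9, 0.5, 0.0]]

multiview = build_multiview(transcripts, fluxes)
print(len(multiview), len(multiview[0]))
```

Scaling the views separately keeps a view with large raw values, such as
read counts, from dominating the downstream regression or neural
network.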