Principal component analysis (PCA) and partial least squares (PLS) are
particularly popular as they are easy to implement and capable of
tackling a variety of problems, e.g. reducing the dimensionality of
the data, extracting key features and detecting outliers. Hence, it is not
surprising that these algorithms make up the majority of applied ML
models that are described in the literature. In combination with a
regression model, reliable online predictions with high-dimensional data
become possible. Commonly used regression models include support vector
machines (SVMs) and models based on decision trees (e.g. gradient
boosting regressor (GBR), AdaBoost or random forest). The
advantages of these regression tree ensembles include fast training
times and the ability to handle large amounts of data, while providing
good accuracy due to the combination of multiple estimators. This decision
is also made to facilitate the integration of adaptive solutions in
the future, which would require repeated training of new models, e.g. via
recursive or ensemble-based methods. However, individual regression
trees are usually not competitive with other methods such as SVMs or neural
networks, but thanks to their low computational cost, regression trees
can be combined with bagging or boosting techniques to build a group of
estimators that improves predictive accuracy and controls overfitting. In
bagging, each estimator is trained on a subset of the data and the outputs of
all estimators are averaged for the final prediction, e.g. random
forest or extra trees regression. In boosting, “weak” estimators are
trained in succession on a subset of the data and combined into a single
“strong” estimator. This can be achieved, e.g., by weighting the
weak estimators according to their accuracy (AdaBoost) or by fitting each
weak estimator to the negative gradient of an arbitrary differentiable loss
function (gradient boosting).
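
The difference between a single regression tree and the bagging and boosting ensembles described above can be illustrated with a minimal sketch. It assumes scikit-learn as the implementation and uses synthetic data from `make_regression` as a stand-in for real process measurements; the data set, the number of estimators and the split sizes are illustrative assumptions, not values taken from the text.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption: stands in for process measurements).
X, y = make_regression(
    n_samples=500, n_features=20, n_informative=8, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One individual tree, two averaging ensembles and two boosting ensembles.
models = {
    "single tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "extra trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns the coefficient of determination R^2 on held-out data.
    scores[name] = model.score(X_test, y_test)
    print(f"{name:18s} R^2 = {scores[name]:.3f}")
```

Comparing the printed scores gives a rough impression of how much the ensembles gain over the individual tree on a given data set; the ranking among the ensembles depends on the data and on hyperparameters that are left at their defaults here.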
Clustering is another popular method that is used to organize unlabeled
data according to their similarity. Combined with PCA, it is a common
method for process monitoring and process fault detection.
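
One way the combination of PCA and clustering could be used for monitoring is sketched below. This is an illustration under stated assumptions rather than a method taken from the text: scikit-learn is assumed, the two operating regimes and the ten sensors are hypothetical synthetic data, k-means is chosen as one possible clustering algorithm, and the fault criterion (distance to the nearest cluster centre exceeding the 99th percentile of the training distances) is a simple heuristic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical training data: two operating regimes measured by 10 sensors.
regime_a = rng.normal(loc=0.0, scale=1.0, size=(200, 10))
regime_b = rng.normal(loc=4.0, scale=1.0, size=(200, 10))
X = np.vstack([regime_a, regime_b])

# Scale, project onto two principal components, then cluster the scores.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)

# With KMeans as the last step, transform() returns the distance of each
# sample to every cluster centre in the reduced PCA space.
train_distances = pipeline.transform(X).min(axis=1)
threshold = np.percentile(train_distances, 99)  # heuristic alarm limit

# Hypothetical new sample far outside both regimes (all sensors read 12).
new_sample = np.full((1, 10), 12.0)
fault_distance = pipeline.transform(new_sample).min(axis=1)[0]
is_fault = fault_distance > threshold
print(f"distance = {fault_distance:.2f}, threshold = {threshold:.2f}, fault = {is_fault}")
```

A sample is flagged when it lies far from every known cluster in the reduced space. Deviations that are orthogonal to the retained principal components are not visible to this check, so in practice it would typically be complemented by a residual-based statistic.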