Extra trees, gradient boosting, and the combination of AdaBoost and
extra trees regression achieve very similar accuracy. If an adaptive ML
solution is desired, the training time could play a critical role;
within the scope of this work, however, all three highlighted models
will be further evaluated with live data in the tryout stage
(section 3.5).
For the monitoring system, the pressure drop time series will be
classified based on the 20-second forecast window and the measurements
from the last 10 seconds to determine the current operating state of the
distillation column, i.e. flooding or not flooding. To this end, the
historical pressure drop data is preprocessed as described in section
3.1 with a window size of 30 seconds and used to identify meaningful
clusters (k-means clustering) and label the data. As this method is
distance-based, the data is scaled beforehand. The time series features
trend and level are determined from the slope of a linear fit and the
median of each 30-second window, respectively. Additionally, the data is
filtered for positive slopes and medians above 80 Pa, as flooding is
only observed for increasing pressure drop trends and high pressure drop
levels.
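The feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 1 Hz sampling rate (so that 30 samples span 30 seconds), the use of non-overlapping windows, and the reading of the filter as requiring both conditions at once are all assumptions, since the preprocessing details are given in section 3.1.

```python
import numpy as np

def window_features(pressure_drop, window=30, dt=1.0):
    """Return one (slope, median) row per non-overlapping window.

    Assumes evenly spaced samples with spacing dt in seconds
    (hypothetical default: 1 Hz, i.e. 30 samples per 30 s window).
    """
    p = np.asarray(pressure_drop, dtype=float)
    n_windows = len(p) // window
    t = np.arange(window) * dt
    rows = []
    for i in range(n_windows):
        w = p[i * window:(i + 1) * window]
        slope = np.polyfit(t, w, 1)[0]  # trend: slope of a linear fit
        level = np.median(w)            # level: median of the window
        rows.append((slope, level))
    return np.array(rows, dtype=float).reshape(-1, 2)

def filter_candidates(features, min_median=80.0):
    """Keep windows with a positive slope and a median above min_median (Pa).

    Combining the two conditions with a logical AND is an assumption.
    """
    slope, median = features[:, 0], features[:, 1]
    return features[(slope > 0) & (median > min_median)]
```

For a recorded series `p`, `filter_candidates(window_features(p))` would then yield the slope/median pairs that are passed on to scaling and clustering.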
In order to identify an appropriate number of clusters, the elbow method
is applied. For this method, the k-means clustering algorithm is
executed with a varying number of clusters, and the resulting inertia,
i.e. the sum of squared distances of samples to their closest cluster
center, is plotted against the number of clusters. Adding further
cluster centers once there are already enough clusters to describe the
data leads to only a small change in inertia, so that a characteristic
kink (elbow) is observed in the plot (Figure 5, left-hand side). For the
presented data this elbow is
found at 5 clusters. In Figure 5 (right-hand side) the data and cluster
centers are visualized in a plot of median against slope. Most of the
data lies around a slope of 0, as this is the desired stable operating
state. The blue and green (far right-hand side) clusters describe
operating states in which the pressure drop is increasing and the column
might flood soon. For the implementation with live data, a warning will
be displayed if those conditions are observed. The remaining clusters
indicate normal operating behavior. Note that the data was transformed
back to the original values for the visualization, but scaled data was
used for the clustering process.
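The clustering workflow of this section (scaling, elbow curve, final fit, back-transformation of the cluster centers, and a warning check for live data) could look roughly like the sketch below. It uses scikit-learn as an assumed toolchain; the text does not name the scaler, so `StandardScaler` is a stand-in, and the function names, the seed, and the `warning_labels` argument (the labels of the clusters judged critical by inspecting the plot) are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def elbow_inertias(features, k_values=range(1, 11), seed=0):
    """Scale the (slope, median) features and return the inertia for each k."""
    scaler = StandardScaler()  # assumed scaler; the text only says "scaling"
    scaled = scaler.fit_transform(features)
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scaled)
        inertias.append(model.inertia_)  # sum of squared distances to centers
    return scaler, np.array(inertias)

def fit_clusters(features, scaler, k, seed=0):
    """Fit k-means on scaled data; return the model and centers in original units."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    model.fit(scaler.transform(features))
    centers_original = scaler.inverse_transform(model.cluster_centers_)
    return model, centers_original

def needs_warning(model, scaler, slope, median, warning_labels):
    """True if a live (slope, median) pair falls into a cluster marked critical."""
    x = scaler.transform(np.array([[slope, median]], dtype=float))
    return int(model.predict(x)[0]) in warning_labels
```

Plotting the returned inertias against `k_values` gives the elbow curve; the number of clusters read off that curve (5 for the data presented here) is then passed as `k` to `fit_clusters`.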