Extra trees, gradient boosting, and the combination of AdaBoost and extra trees regression perform very similarly in terms of accuracy. If an adaptive ML solution is desired, i.e. one that is retrained as new data become available, the training time could play a critical role. Within the scope of this work, however, all three highlighted models will be further evaluated with live data in the tryout stage (section 3.5).
For the monitoring system, the pressure drop time series will be classified based on the 20-second forecast window and the measurements from the last 10 seconds to determine the current operating state of the distillation column, i.e. flooding or not flooding. To this end, the historical pressure drop data is preprocessed as described in section 3.1 with a window size of 30 seconds (matching the 10 s of measurements plus the 20 s forecast) and used to identify meaningful clusters with k-means clustering, which are then used to label the data. As k-means is distance based, the data is scaled beforehand. The two time series features, trend and level, are determined from the slope of a linear fit and the median of each 30-second window, respectively. Additionally, the data is filtered for windows with positive slopes and medians above 80 Pa, as flooding is only observed for increasing pressure drop trends and high pressure drop levels.
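The feature extraction and filtering described above can be sketched with NumPy as follows. This is a minimal sketch assuming the pressure drop is sampled at 1 Hz and that windows slide by one sample; both assumptions, like the function names `window_features` and `filter_flooding_candidates`, are hypothetical and may differ from the preprocessing in section 3.1. The 80 Pa threshold and the positive-slope condition are taken from the text.

```python
import numpy as np


def window_features(pressure_drop, window=30, step=1):
    """Return one (slope, median) row per sliding window.

    Assumes a 1 Hz series, so window=30 samples spans 30 s.
    """
    x = np.asarray(pressure_drop, dtype=float)
    t = np.arange(window)
    rows = []
    for start in range(0, len(x) - window + 1, step):
        w = x[start:start + window]
        slope = np.polyfit(t, w, 1)[0]  # trend: slope of a linear fit (Pa per sample)
        level = np.median(w)            # level: median of the window (Pa)
        rows.append((slope, level))
    return np.array(rows, dtype=float).reshape(-1, 2)


def filter_flooding_candidates(features, min_level=80.0):
    """Keep windows with a rising trend and a median above min_level (Pa)."""
    slope, level = features[:, 0], features[:, 1]
    return features[(slope > 0) & (level > min_level)]


# Synthetic ramp starting at 50 Pa and rising by 1 Pa per second (60 s of data).
ramp = 50.0 + np.arange(60, dtype=float)
features = window_features(ramp)
candidates = filter_flooding_candidates(features)
print(features.shape, candidates.shape)  # → (31, 2) (15, 2)
```

On this ramp every window has a slope of 1 Pa/s, but only the windows whose median exceeds 80 Pa survive the filter.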
In order to identify an appropriate number of clusters, the elbow method is applied. For this method, the k-means clustering algorithm is executed with a varying number of clusters, and the resulting inertia, i.e. the sum of squared distances of the samples to their closest cluster center, is plotted against the number of clusters. Once there are enough clusters to describe the data, adding further cluster centers reduces the inertia only marginally, so that a characteristic kink (elbow) appears in the plot (Figure 5, left-hand side). For the presented data, this elbow is found at 5 clusters. In Figure 5 (right-hand side), the data and cluster centers are visualized in a plot of median against slope. Most of the data lies around a slope of 0, which corresponds to the desired stable operating state. The blue and green (far right-hand side) clusters describe operating states in which the pressure drop is increasing and the column might soon flood. In the implementation with live data, a warning will be displayed if these conditions are observed. The remaining clusters indicate normal operating behavior. Note that the data was transformed back to the original values for the visualization, whereas the scaled data was used for the clustering process.
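The elbow analysis and the subsequent labeling can be sketched with scikit-learn's `KMeans`, whose `inertia_` attribute is the inertia defined above. This is a minimal sketch under several assumptions: standard (z-score) scaling is used because the text does not name the scaler, the features are synthetic, and the rule that flags the `n_warning` clusters with the steepest center slopes is a hypothetical stand-in for the visual selection of the blue and green clusters in Figure 5. The helper names `elbow_curve`, `fit_states` and `is_warning` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def elbow_curve(X_scaled, k_max=10, seed=0):
    """Inertia (sum of squared distances to the closest center) for k = 1..k_max."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_scaled).inertia_
        for k in range(1, k_max + 1)
    ]


def fit_states(X_scaled, scaler, n_clusters=5, n_warning=2, seed=0):
    """Cluster scaled (slope, median) features and pick warning clusters.

    Hypothetical rule: the n_warning clusters whose centers have the largest
    slope are treated as pre-flooding states.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_scaled)
    # Back-transform the centers to original units (slope, Pa) for plotting only.
    centers = scaler.inverse_transform(km.cluster_centers_)
    warning = set(np.argsort(centers[:, 0])[-n_warning:].tolist())
    return km, centers, warning


def is_warning(km, scaler, warning, slope, median):
    """Classify one live window (in original units) and report a warning."""
    label = km.predict(scaler.transform([[slope, median]]))[0]
    return int(label) in warning


# Synthetic stand-in features: a large, nearly flat group and a small, rising one.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal([0.05, 100.0], [0.02, 5.0], size=(200, 2)),
    rng.normal([0.5, 150.0], [0.05, 5.0], size=(50, 2)),
])

scaler = StandardScaler().fit(features)  # k-means is distance based, so scale first
X_scaled = scaler.transform(features)
inertias = elbow_curve(X_scaled)          # plot against k to locate the elbow
km, centers, warning = fit_states(X_scaled, scaler)
```

Because the warning rule is applied to back-transformed centers, it can be read in physical units, while the distance computations inside k-means stay on the scaled features, consistent with the note on scaling above.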