Machine learning implementation

The aforementioned algorithms belong to the classical ML techniques and can be implemented quickly, as they are readily available in software libraries for, e.g., Matlab, Python or R. The following procedure for employing ML techniques in a digital twin framework was demonstrated by Min et al. and can be roughly applied in most cases:
Preprocessing comprises, for example, the temporal alignment, de-noising or scaling of data, feature extraction, and the selection or transformation of data by means of a correlation matrix, PCA or even operator experience. Some common modelling techniques were already presented in Table 1. Other noteworthy approaches include genetic/evolutionary algorithms, fuzzy logic, probability-based techniques (e.g. Gaussian processes), semi-supervised and reinforcement learning, and artificial neural networks. Among the latter, structures with convolutional layers and long short-term memory (LSTM) units have proven effective for image analysis and time series data, respectively. These neural networks capture the spatial or temporal structure of the data and have led to remarkable advances in image classification and speech recognition. As the large number of choices for an ML model can be overwhelming at first, it is common practice to test different models and compare their performance based on chosen metrics. For a regression problem, the root mean squared error or the coefficient of determination (R²) is often used. If an adaptive online mechanism is desired, the training and forecast speed can be a deciding factor as well.
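The practice of comparing candidate models on chosen metrics can be illustrated with a minimal sketch. It assumes a one-dimensional regression problem; the data values, the two candidate models (a mean baseline and an ordinary least-squares line) and all function names are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination (1 is a perfect fit)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_mean(x, y):
    """Baseline model: always predicts the mean of the training targets."""
    mean_y = sum(y) / len(y)
    return lambda xi: mean_y


def fit_linear(x, y):
    """Ordinary least-squares line through the training data."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return lambda xi: slope * xi + intercept


# Hypothetical historical data, split into a training and a test part.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]
x_test = [7.0, 8.0, 9.0]
y_test = [14.2, 15.9, 18.1]

candidates = {"mean baseline": fit_mean, "linear regression": fit_linear}
scores = {}
for name, fit in candidates.items():
    model = fit(x_train, y_train)
    y_pred = [model(xi) for xi in x_test]
    scores[name] = {"rmse": rmse(y_test, y_pred), "r2": r2(y_test, y_pred)}

# Select the candidate with the lowest test error.
best = min(scores, key=lambda n: scores[n]["rmse"])
```

In practice the candidates would be library implementations of the techniques named above, and training or prediction time could be recorded alongside the error metrics when an online mechanism is intended.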
During the tryout and optimization stage, the trained model is tested in a real-time operating environment. Since the model is trained on historical data, it is important to verify its operational reliability in the latest environment and adapt the model if necessary. Finally, the virtual model is deployed with a connection to the real-time data and the process control or monitoring system. The optimal set of control parameters can be found by means of search algorithms such as depth-first search, breadth-first search or grid search, or by employing model predictive control or other control strategies. For security reasons, it is advisable to present the recommendations of the ML model visibly to an operator rather than to grant the model direct access to the process control system, especially at an experimental stage. This is related to the veracity problem that is often associated with ML solutions. It can be difficult to obtain interpretable suggestions from purely data-driven models and to justify the adaptation of, e.g., a control strategy based on these models. A possible solution lies in the incorporation of first-principle models to form hybrid models that support rational decision-making. Such advances could revolutionize the perception of ML solutions in process engineering, but are still considered a “long, adventurous, and intellectually exciting journey”.
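The grid search over control parameters mentioned above, combined with the advice to issue recommendations to an operator rather than write setpoints directly, can be sketched as follows. The function `predicted_quality` is a hypothetical stand-in for a trained model, and the parameter names, ranges and quality metric are assumptions made purely for illustration.

```python
from itertools import product


def predicted_quality(temperature, flow_rate):
    """Hypothetical stand-in for a trained model predicting a quality metric."""
    return 100.0 - (temperature - 80.0) ** 2 / 10.0 - 5.0 * (flow_rate - 2.0) ** 2


def grid_search(model, grid):
    """Evaluate the model on every grid point and return the best setting."""
    names = list(grid)
    best_setting, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        setting = dict(zip(names, values))
        score = model(**setting)
        if score > best_score:
            best_setting, best_score = setting, score
    return best_setting, best_score


# Assumed admissible ranges of the two control parameters.
grid = {
    "temperature": [70.0, 75.0, 80.0, 85.0, 90.0],
    "flow_rate": [1.0, 1.5, 2.0, 2.5, 3.0],
}

setting, score = grid_search(predicted_quality, grid)

# The result is only shown to the operator; nothing is written to the
# process control system.
recommendation = f"Suggested setpoints: {setting} (predicted metric {score:.1f})"
```

Exhaustive grid search scales poorly with the number of parameters, which is one reason why model predictive control or other strategies may be preferred for larger problems.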
Another opportunity lies in the integration of edge and cloud computing solutions to facilitate the access to and treatment of data. With the increasing computational power of microcontrollers, it is becoming increasingly feasible to preprocess and analyze sensor data locally, e.g. for each unit operation, and to forward the processed information to a higher-level control or data handling structure, which in turn can function more efficiently owing to the reduced amount but higher quality of data. The development of this type of “smart equipment”, which uses new sensor or ML solutions to determine hard-to-measure variables and offer more process flexibility, has attracted considerable attention in academic and industrial research.
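The idea of local preprocessing on an edge device can be sketched as follows: a window of raw sensor samples is cleaned and condensed into a few summary values before being forwarded. The outlier rule based on the median absolute deviation, the window size and all names are assumptions for this illustration and are not taken from the cited works.

```python
from statistics import mean, median, pstdev


def summarize_window(samples, outlier_factor=3.0):
    """Condense raw samples into a few features after dropping gross outliers."""
    med = median(samples)
    # Median absolute deviation, scaled by 1.4826 to be comparable to a
    # standard deviation for normally distributed data.
    mad = median(abs(s - med) for s in samples)
    if mad > 0:
        limit = outlier_factor * 1.4826 * mad
        kept = [s for s in samples if abs(s - med) <= limit]
    else:
        kept = list(samples)
    return {
        "mean": mean(kept),
        "std": pstdev(kept),
        "n_raw": len(samples),
        "n_kept": len(kept),
    }


def edge_node(stream, window_size):
    """Yield one summary message per full window of raw samples."""
    window = []
    for value in stream:
        window.append(value)
        if len(window) == window_size:
            yield summarize_window(window)
            window = []


# Hypothetical temperature readings with one spurious spike.
raw = [20.1, 20.0, 19.9, 20.2, 55.0, 20.0, 20.1, 19.8, 20.3, 20.0]
messages = list(edge_node(raw, window_size=5))
```

Here ten raw readings are reduced to two messages, and the spike is excluded before averaging, which reflects the reduced amount but higher quality of data described above.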
Some examples include the digitalization of extraction columns by means of novel measurement techniques complemented by modeling and simulation methods to create tools for predictive online monitoring, as reviewed by Hlawitschka et al. Other approaches work on the integration of novel sensors and actuators to obtain valuable process information and create more responsive equipment. In order to facilitate the integration of the given examples and other ML solutions into existing processes, new concepts with standardized interfaces and communication protocols are emerging. One promising concept that stands out in these aspects is the Module Type Package (MTP), a module-based approach with embedded process knowledge and standardized interfaces according to VDI/VDE/NAMUR 2658 parts 1-4. MTP enables a quick and flexible design of processes and the integration of modules, so-called process equipment assemblies (PEAs; compare VDI 2776), into a higher-level control system, which is referred to as the process orchestration layer (POL). Owing to the standardized interfaces, the data from every PEA or the entire process are easily accessible and ML solutions can be implemented quickly. A special MTP feature is the service-oriented architecture, which makes it possible to run recipes with the predefined services that each PEA offers to the POL. This feature could be used to run the process with many different control variables, study their respective effects and observe states that are usually undesired. For most processes, this kind of data is scarce because such states do not represent the optimal way to operate the process, but it is useful for training ML models that are supposed to prevent those states. The recipe feature could further be used for the automated execution of experiments via “Design of Experiments” (DoE) in conjunction with ML algorithms to optimize a product or process.
The implementation of such ML solutions via a service-like architecture as proposed by Soto et al. is also an interesting concept, which would greatly benefit from accepted standards.
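The use of recipes for automated experiments via DoE, as outlined above, can be sketched with a full factorial design in which every combination of factor levels becomes one recipe run. The factor names, the service names ("Dosing", "Tempering", "Stirring") and the recipe structure are hypothetical; they do not reproduce the data model of VDI/VDE/NAMUR 2658 or the interface of any particular POL.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of the given factor levels as a dict."""
    names = list(factors)
    return [
        dict(zip(names, levels))
        for levels in product(*(factors[name] for name in names))
    ]


def to_recipe(run_id, settings):
    """Map one experimental run to a hypothetical sequence of service calls."""
    return {
        "run_id": run_id,
        "steps": [
            {"service": "Dosing", "parameters": {"flow_rate": settings["flow_rate"]}},
            {"service": "Tempering", "parameters": {"temperature": settings["temperature"]}},
            {"service": "Stirring", "parameters": {"speed": settings["stirrer_speed"]}},
        ],
    }


# Assumed factors with two levels each, giving 2**3 = 8 runs.
factors = {
    "flow_rate": [1.0, 2.0],
    "temperature": [60.0, 80.0],
    "stirrer_speed": [200, 400],
}

design = full_factorial(factors)
recipes = [to_recipe(i, settings) for i, settings in enumerate(design, start=1)]
```

The data recorded during such runs could then serve as training data for an ML model, which in turn might propose the next set of factor levels instead of a fixed factorial plan.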