Figure 10: Image of the extraction column (left-hand side) and example
images of the optimal operating point (upper right-hand side) and of
flooding (lower right-hand side). The white box indicates the area of
interest, where the two states are distinguishable.
Images are taken under different illumination conditions: light from the
right-hand side, from the left-hand side, from both sides, or daylight
only. A Panasonic DMC-FZ72 camera is used with an image resolution of
about 0.13 mm/pixel.
An image preprocessing routine follows, in which the desired image
section is cropped as indicated in the right-hand images of Figure 10.
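
The cropping step can be illustrated with a minimal Python sketch. The
crop box coordinates, the toy image, and the function names are
hypothetical and only serve the illustration; only the resolution of
about 0.13 mm/pixel is taken from the text.

```python
MM_PER_PIXEL = 0.13  # approximate image resolution stated in the text


def crop(image, top, left, height, width):
    """Return the sub-image covering rows top..top+height-1 and
    columns left..left+width-1 of a row-major nested list."""
    if (top < 0 or left < 0
            or top + height > len(image)
            or left + width > len(image[0])):
        raise ValueError("crop box lies outside the image")
    return [row[left:left + width] for row in image[top:top + height]]


def physical_size_mm(height_px, width_px, mm_per_pixel=MM_PER_PIXEL):
    """Convert a size in pixels to millimetres."""
    return height_px * mm_per_pixel, width_px * mm_per_pixel


# Toy 6 x 8 grayscale image; the pixel value encodes 10 * row + column.
image = [[10 * r + c for c in range(8)] for r in range(6)]

# Hypothetical area of interest: 3 rows x 4 columns starting at (1, 2).
roi = crop(image, top=1, left=2, height=3, width=4)
roi_height_mm, roi_width_mm = physical_size_mm(len(roi), len(roi[0]))
```

In practice the same box would be applied to every image, so that the
net always sees the same section of the column.
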
The images are labeled according to the operating state as either
“normal operating state” or “flooding”. More than 1000 images per class
are fed to the neural net as training data. ResNet-18, a convolutional
neural network (CNN), is used and retrained for this purpose. The core
idea of ResNet is the introduction of a so-called “identity shortcut
connection” that skips one or more layers. Since the skipped layers then
only have to learn the residual with respect to their input, the whole
module is called a residual module. Because the shortcut connections
bypass certain layers, they ease and speed up the training of the net.
ResNet-18 comprises 18 weighted layers (17 convolutional layers and one
fully connected layer), most of which are grouped into residual modules.
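
The idea of a residual module, output = F(x) + x, can be made concrete
with a small Python sketch. As a simplifying assumption, plain fully
connected layers stand in for the convolutions of the real network, and
all function names are hypothetical.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def linear(weights, bias, vector):
    """Fully connected layer: weights (rows x cols) times vector plus bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def residual_module(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two weight layers with a ReLU
    in between and '+ x' is the identity shortcut connection."""
    residual = linear(w2, b2, relu(linear(w1, b1, x)))
    return relu([r + xi for r, xi in zip(residual, x)])


x = [1.0, 2.0, 0.5]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

# With all weights at zero, F(x) = 0 and the module passes the
# (non-negative) input through unchanged via the shortcut.
passthrough = residual_module(x, zero_w, zero_b, zero_w, zero_b)
```

The zero-weight case shows why the weight layers only need to learn the
deviation from the identity mapping: doing nothing already reproduces
the input.
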
After seven training epochs, the neural net achieves an accuracy of
99.3 %. For training, a batch size of 4 was chosen, with stochastic
gradient descent with momentum (SGDM) as the solver and an initial
learning rate of 0.001.
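
As an illustration of the solver, a single SGDM update can be sketched
in Python as follows. The learning rate of 0.001 is taken from the text;
the momentum coefficient of 0.9 is an assumption, since the text does
not state it, and the quadratic toy loss is purely illustrative.

```python
def sgdm_step(weights, velocity, grads, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update:
    v <- momentum * v - lr * grad,  w <- w + v.
    momentum=0.9 is an assumed value, not taken from the text."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy problem: minimise 0.5 * sum((w - target)^2), whose gradient is
# simply w - target. In real training the gradient would instead be
# averaged over a mini-batch (4 images in the text).
target = [3.0, -1.0]
weights = [0.0, 0.0]
velocity = [0.0, 0.0]
for _ in range(2000):
    grads = [w - t for w, t in zip(weights, target)]
    weights, velocity = sgdm_step(weights, velocity, grads)
```

The velocity term accumulates past gradients, so repeated steps in the
same direction become larger than the plain learning rate alone would
allow.
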
To check whether the net performs reasonably, a confusion matrix is
created. Here, the class predicted by the network is compared to the
true class assigned to the image. If the predicted and the true class
are the same, the prediction is correct. To validate the trained net, a
set of 252 test images, 126 for each state and none of them used for
training, is evaluated. The obtained accuracy is 99.7 % with a single
misclassified image.
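
How a confusion matrix and the resulting accuracy are obtained from
true and predicted labels can be sketched in Python. The five labels
below are made-up toy data and not the test set of this study; the
function names are hypothetical.

```python
CLASSES = ("normal operating state", "flooding")


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Count matrix with rows = true class and columns = predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the main diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


N, F = CLASSES
true_labels = [N, N, N, F, F]       # toy data
predicted_labels = [N, N, F, F, F]  # one normal image predicted as flooding

matrix = confusion_matrix(true_labels, predicted_labels)
print(matrix)            # [[2, 1], [0, 2]]
print(accuracy(matrix))  # 0.8
```

Off-diagonal entries show which class is confused with which, which the
accuracy value alone does not reveal.
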
During the investigations, the following question came up: what if the
net predicts the class correctly, but bases its decision on irrelevant
sections of the image? To rule out this source of error, a class
activation map (CAM) is introduced. Its purpose is to visualize which
area of the image the neural net deems most important for its class
prediction. The CAM is constructed in MATLAB using the “activations()”
function and is plotted on top of the analyzed image, see Figure 11.
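
The principle behind a CAM can be sketched in a few lines of Python. The
sketch assumes the classic CAM formulation, in which the feature maps of
the last convolutional layer are summed, each weighted by the weight
that the final fully connected layer assigns to it for the class of
interest. Whether the MATLAB routine used here follows exactly this
formulation is an assumption; the feature maps, weights, and function
names below are toy values and hypothetical names.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) with K class weights."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def normalize(cam):
    """Rescale the map linearly to [0, 1] for display as a heat map."""
    values = [v for row in cam for v in row]
    lowest, span = min(values), max(values) - min(values)
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lowest) / span for v in row] for row in cam]


# Two toy 2 x 2 feature maps and their (made-up) weights for one class.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [0.5, 1.0]

cam = class_activation_map(feature_maps, class_weights)
heat_map = normalize(cam)
print(heat_map)  # [[0.25, 0.0], [0.0, 1.0]]
```

For an overlay on the analyzed image, the coarse map would additionally
have to be upsampled to the image size, which is omitted here.
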