SealNet Architecture and Training:
The CNN-based face recognition classifier is the main component of our
software package. We train this classifier on photos that have been
aligned, chipped, and normalized. We trained the CNN for 100 epochs
using mini-batch gradient descent with a batch size of 16, starting
from a learning rate of 0.01 and using the ADADELTA optimizer. Each
input image passed through four convolutional blocks and a final
bottleneck layer, producing an embedding vector of length 512 that
encodes the learned features of the input image (Figure 4).
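The training configuration above can be sketched in PyTorch as follows. This is a minimal illustration, not our actual implementation: the stub network, class names, number of identities, and classification head are assumptions; only the hyperparameters (ADADELTA, learning rate 0.01, batch size 16, 100 epochs, 512-dimensional embedding) come from the text.

```python
import torch
import torch.nn as nn

class SealNetStub(nn.Module):
    """Illustrative stand-in for SealNet: convolutional features
    followed by a bottleneck that emits a 512-d embedding."""
    def __init__(self, embedding_dim=512):
        super().__init__()
        # A single conv block here stands in for the four described in the text.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.bottleneck = nn.Linear(64, embedding_dim)

    def forward(self, x):
        x = self.features(x)
        x = x.mean(dim=(2, 3))      # global average pool over spatial dims
        return self.bottleneck(x)   # 512-d embedding

model = SealNetStub()
# Classification head over a hypothetical 10 seal identities (assumption).
head = nn.Linear(512, 10)

# Hyperparameters from the text: ADADELTA with initial lr 0.01,
# mini-batch gradient descent with batch size 16.
optimizer = torch.optim.Adadelta(
    list(model.parameters()) + list(head.parameters()), lr=0.01
)
criterion = nn.CrossEntropyLoss()

# One synthetic mini-batch of aligned, chipped, normalized 112x112 photos.
images = torch.randn(16, 3, 112, 112)
labels = torch.randint(0, 10, (16,))

for _ in range(2):  # the paper trains for 100 epochs
    optimizer.zero_grad()
    loss = criterion(head(model(images)), labels)
    loss.backward()
    optimizer.step()
```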
Each convolutional block contained a convolutional layer with a kernel
size of 3x3 and a stride of 1, followed by a max-pooling layer with a
kernel size of 3x3 and a stride of 2. Some blocks also contained an
additional Squeeze-and-Excitation (SE) block that performs feature
recalibration. Adding SE blocks to our CNN helped the model better
learn the interdependencies between channels, highlighting informative
features while suppressing uninformative ones. As proposed in PrimNet,
our convolutional blocks also employ group convolutions followed by
channel shuffling, which makes the network sparser and reduces its
vulnerability to overfitting.
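The block structure described above can be sketched in PyTorch as follows. This is a hedged illustration under stated assumptions: the group count, SE reduction ratio, and channel widths are not specified in the text and are chosen here only for demonstration.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    """Interleave channels across groups so information flows between
    group convolutions (as in ShuffleNet-style designs)."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: recalibrate channels with learned weights."""
    def __init__(self, channels, reduction=16):  # reduction is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = x.mean(dim=(2, 3))                     # squeeze: global avg pool
        w = self.fc(w).view(x.size(0), -1, 1, 1)   # per-channel weights
        return x * w                               # excite: reweight channels

class ConvBlock(nn.Module):
    """One SealNet-style block: grouped 3x3/stride-1 convolution,
    channel shuffle, optional SE recalibration, then 3x3/stride-2 pooling."""
    def __init__(self, in_ch, out_ch, groups=4, use_se=True):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1,
                              padding=1, groups=groups)
        self.groups = groups
        self.se = SEBlock(out_ch) if use_se else nn.Identity()
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2)

    def forward(self, x):
        x = self.conv(x)
        x = channel_shuffle(x, self.groups)
        x = self.se(x)
        return self.pool(x)
```

The grouped convolution cuts the number of weights by the group factor, and the shuffle prevents the groups from becoming isolated sub-networks, which is the sparsity-for-regularization trade-off the text attributes to PrimNet.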
SealNet was trained on a GeForce RTX 2080 Ti graphics card, taking
about 4 minutes and 33 seconds on average to finish training each fold.
We trained the model on an average of 485 images of the same resolution
per fold for the closed-set, and on 533 images with dimensions (112,
112) for the open-set.