Face Detection:
We created the graphical user interface (GUI) in C++ by modifying imglab tools for image annotation. We trained the interface to detect seal faces, allowing for automated detection of all seal faces in each photo. In addition, the GUI allows users to manually select seal faces by drawing boxes around valid faces in the application. A valid seal face is determined by the quality and clarity of the image, as well as the angle of the seal's face to the camera. Invalid faces are those that are too blurry, not facing the camera, or partially obstructed. Invalid faces are ignored by the software, as are regions of the image not marked as faces. Variations in illumination and other imaging conditions can introduce noise to the data and impede analysis. We next converted the photos to grayscale to help the model learn from physical features and color patterns rather than the colors themselves, which also serves to reduce overfitting during training. After all photos were aligned and cropped, we manually grouped photos of the same seals into folders by individual. To train our face detector, we selected all seal faces from the 516 photos taken at Brandt Ledges on January 29th, 2020.
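The grayscale conversion step described above can be sketched as follows. This is a minimal illustration, not the code used in our software: the `Rgb` type and the function names are hypothetical, and the ITU-R BT.601 luma weights are an assumed choice, since the text does not specify which conversion was applied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type for illustration only.
struct Rgb {
    std::uint8_t r, g, b;
};

// Convert one RGB pixel to an 8-bit luminance value using the
// ITU-R BT.601 weights (an assumption; other weightings exist).
std::uint8_t pixel_to_gray(const Rgb& p) {
    const double y = 0.299 * p.r + 0.587 * p.g + 0.114 * p.b;
    return static_cast<std::uint8_t>(y + 0.5);  // round to nearest integer
}

// Convert a whole image stored as a flat, row-major pixel buffer.
std::vector<std::uint8_t> image_to_gray(const std::vector<Rgb>& image) {
    std::vector<std::uint8_t> gray;
    gray.reserve(image.size());
    for (const Rgb& p : image) {
        gray.push_back(pixel_to_gray(p));
    }
    return gray;
}
```

Because the three channels collapse into one intensity value, two pixels that differ only in hue can map to similar gray levels, which is the property that discourages the model from keying on color.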
Our imglab-based face detection software is a convolutional neural network (CNN) that uses the Max-Margin Object Detection (MMOD) loss function. The first three layers of the network downsample the input images by a factor of 8 and output a feature map with 32 channels. This feature map then passes through 4 more convolutional layers, with batch normalization and the Rectified Linear Unit (ReLU) as the nonlinearity. The final output has only 1 channel; a large value indicates that the network has found an object at that location, and a small value indicates that it has not.
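To illustrate how a single-channel output map can be read, the sketch below scans a score map and maps each high-scoring cell back to input-image coordinates using the downsampling factor of 8. It is a simplified illustration under stated assumptions: the `Detection` type, the function name, the threshold of 0, and the top-left coordinate mapping are all hypothetical, and steps a full detector would typically include (such as non-maximum suppression and padding offsets) are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical detection record: position in input-image pixels plus score.
struct Detection {
    std::size_t x, y;
    float score;
};

// Scan a single-channel, row-major score map and report every cell whose
// score exceeds `threshold`, mapped back to input-image coordinates.
// `stride` is the total downsampling factor of the network (8 in the text).
std::vector<Detection> find_detections(const std::vector<float>& score_map,
                                       std::size_t rows, std::size_t cols,
                                       float threshold = 0.0f,
                                       std::size_t stride = 8) {
    assert(score_map.size() == rows * cols);
    std::vector<Detection> hits;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const float s = score_map[r * cols + c];
            if (s > threshold) {
                // Approximate mapping (assumed): the top-left input pixel
                // of the region covered by this feature-map cell.
                hits.push_back({c * stride, r * stride, s});
            }
        }
    }
    return hits;
}
```

For example, a cell at row 1, column 2 of the score map corresponds to the input region starting near pixel (16, 8) under this mapping.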
Using the full 2020 dataset, we measured the accuracy of the model using 5-fold stratified cross-validation. Each stratum (i.e., each location and date) was split into 5 sections. For each fold, 4 of the 5 sections were chosen as the training set while the remaining section was used as the validation set. For each fold, the training set contained ~413 photos from all 5 locations, and the validation set contained ~103 photos from the same 5 locations. The accuracy of the face detector is measured by two metrics: precision and recall (Figure 3).
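The stratified split and the two metrics above can be sketched as follows. This is an illustrative sketch rather than our evaluation code: the function and type names are hypothetical, strata are represented as plain string labels (for example, a combined location and date), and photos are assigned to folds in round-robin order within each stratum, whereas a practical split would usually shuffle within each stratum first. Precision and recall use their standard definitions, TP / (TP + FP) and TP / (TP + FN).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assign each photo to one of k folds so that every stratum is spread
// evenly across the folds. Returns fold[i] in [0, k) for photo i.
std::vector<std::size_t> stratified_folds(const std::vector<std::string>& strata,
                                          std::size_t k = 5) {
    assert(k > 0);
    std::map<std::string, std::size_t> seen;  // photos seen so far per stratum
    std::vector<std::size_t> fold(strata.size());
    for (std::size_t i = 0; i < strata.size(); ++i) {
        fold[i] = seen[strata[i]]++ % k;  // round-robin within the stratum
    }
    return fold;
}

// Hypothetical tally of detector outcomes on a validation set.
struct Counts {
    std::size_t true_pos, false_pos, false_neg;
};

// Fraction of reported detections that are real faces.
double precision(const Counts& c) {
    const std::size_t denom = c.true_pos + c.false_pos;
    return denom == 0 ? 0.0 : static_cast<double>(c.true_pos) / denom;
}

// Fraction of real faces that the detector reported.
double recall(const Counts& c) {
    const std::size_t denom = c.true_pos + c.false_neg;
    return denom == 0 ? 0.0 : static_cast<double>(c.true_pos) / denom;
}
```

For fold f, the photos with `fold[i] == f` form the validation set and all others form the training set, so each photo is used for validation exactly once across the 5 folds.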