Label prediction performance
Balanced accuracy scores for the classifiers trained on the 1-min UMAP dimensions were high (> 0.85) for the location label (Table 1): 94% of the samples labelled 'Burin' and 95% of those labelled 'Red Island' were correctly identified. Scores for seismic airgun presence were also high; however, model sensitivity was poor (58.3%), meaning that true positives only slightly outnumbered false negatives. Repeating model training using the 128 acoustic features improved performance, reducing both false negatives and false positives. The ship presence classifier trained on the two UMAP dimensions had a balanced accuracy score of 0.7, with only 33% of ship-present samples correctly identified as presences. The classifier trained on the acoustic features achieved a higher balanced accuracy score (0.86), and the proportion of correctly predicted presences, although still low, increased to 58%.
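The per-class percentages above relate to balanced accuracy in a simple way. The sketch below assumes the common definition, where balanced accuracy is the unweighted mean of per-class recall (the definition used by scikit-learn's `balanced_accuracy_score` with its default settings). The study's exact implementation is not stated here, and the toy labels are hypothetical, not study data.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted as that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so a rare class counts as much as a common one."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical toy labels (not study data): 4 'Burin' and 2 'Red Island' samples.
y_true = ["Burin"] * 4 + ["Red Island"] * 2
y_pred = ["Burin", "Burin", "Burin", "Red Island", "Red Island", "Red Island"]

print(per_class_recall(y_true, y_pred))   # {'Burin': 0.75, 'Red Island': 1.0}
print(balanced_accuracy(y_true, y_pred))  # 0.875
```

In this toy case plain accuracy would be 5/6 (about 0.83). Balanced accuracy instead weights both locations equally, which is why it is preferred when label frequencies differ.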
The random forest classifiers for humpback whale presence trained on the two UMAP dimensions showed the lowest F1 and balanced accuracy scores (0.59 and 0.62, respectively), resulting in a large number of mislabelled samples. Once again, refitting the model on the acoustic features improved performance: training the classifier on the 128 dimensions increased the balanced accuracy score, mainly owing to a dramatic increase in sensitivity (93.9%) compared with the classifier trained on the UMAP dimensions (<0.001%).
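For the binary presence labels, sensitivity, F1 and balanced accuracy can all be read off the four cells of a confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical, chosen only so that sensitivity matches the 58.3% reported for airgun presence (7 true positives against 5 false negatives), and `binary_metrics` is an illustrative helper, not part of the study's code.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary metrics from confusion-matrix counts (assumes non-empty classes)."""
    sensitivity = tp / (tp + fn)      # recall on the presence class
    specificity = tn / (tn + fp)      # recall on the absence class
    precision = tp / (tp + fp)        # share of predicted presences that are real
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and sensitivity
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts (not study data).
m = binary_metrics(tp=7, fn=5, fp=3, tn=85)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.583, 'specificity': 0.966, 'precision': 0.7, 'f1': 0.636, 'balanced_accuracy': 0.775}
```

Because balanced accuracy averages sensitivity and specificity, a large gain in sensitivity at roughly constant specificity translates directly into a higher balanced accuracy score.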
Confusion matrices for the WMD and PBD cross-validation runs are reported in Appendix S1.