UMAP label prediction performance
Model evaluation scores were above 0.7 for all of the WMD trials (Table
1), but with varying results depending on the specific label. The best
classification results were obtained for Balaenopteridae species
(F1 = 0.998; balanced accuracy = 0.987), while the classifier built for
Delphinidae species had the lowest performance (F1 = 0.829;
balanced accuracy = 0.703). Classification accuracy varied across
trials. For example, in the first trial, most Mysticete and
Odontocete samples were correctly labelled, while 59% of the
Pinniped samples were mislabelled. In the second trial, 99%,
74%, and 71% of the Balaenopteridae, Eschrichtiidae, and
Balaenidae samples were correctly classified, respectively. Among the
four Odontocete families, 99%, 90%, and 78% of the Physeteridae,
Delphinidae, and Phocoenidae samples were correctly classified,
respectively, whereas only 56% of the testing samples for the
family Monodontidae were classified correctly.
Table 1. k-fold nested cross-validation input and results. The table
reports model features (X), labels (Y), and evaluation metrics (F1
score, Balanced Accuracy score). Best models, model hyperparameters, and
scores per run can be found in appendix S1.
All three Balaenoptera species considered in the study
were correctly classified in the vast majority of cases, with 98% or
more correct predictions each. Eight of the 14 Delphinidae species
had 80% or more correct label predictions.
Of the four labels tested for orcas, correct predictions ranged from 87%
(WN Atlantic) to 92% (EN Atlantic), except for the
EN Pacific label, for which only 33% of the samples were predicted
correctly. Both model performance metrics reflected such class
imbalances, with lower scores for models containing a mix of labels with
low and high prediction accuracy. Balanced-accuracy scores provided a
more conservative metric and were more sensitive to class imbalance than
the F1 scores.
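The difference between the two metrics can be illustrated with a minimal sketch in plain Python. It is not the code used in the study: the data are hypothetical, the label names `common` and `rare` are invented for illustration, and the F1 score is assumed to be support-weighted across classes, since the averaging scheme is not stated here. Balanced accuracy is computed as the unweighted mean of the per-class recall.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recall."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support.

    Assumption: support-weighted averaging; the study may have used another scheme.
    """
    totals = Counter(y_true)
    predicted = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    score = 0.0
    for label, n_true in totals.items():
        # n_true + predicted[label] equals 2*TP + FP + FN for this class.
        f1 = 2 * hits[label] / (n_true + predicted[label])
        score += f1 * n_true / len(y_true)
    return score


# Hypothetical imbalanced test set: one well-predicted majority class and
# one poorly predicted minority class.
y_true = ["common"] * 90 + ["rare"] * 10
y_pred = ["common"] * 90 + ["rare"] * 3 + ["common"] * 7

recalls = per_class_recall(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)

print(f"per-class recall: {recalls}")
print(f"balanced accuracy: {bal_acc:.3f}")
print(f"weighted F1: {f1:.3f}")
```

In this hypothetical case the minority class is recovered only 30% of the time, which pulls balanced accuracy down to 0.65, while the support-weighted F1 stays at about 0.91 because the majority class dominates the average.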
Placentia Bay
Dataset