Classifier training
After the nine focal species had been identified, a custom single-species
classifier was trained for each using the machine-learning clustering
software Kaleidoscope Pro v 5.4.8 (Wildlife Acoustics, Maynard, MA, USA).
Each classifier was used to detect and identify
vocalizations from one species. Kaleidoscope Pro uses a two-step process
for species identification: scanning and clustering. During the scanning
step, Kaleidoscope scans the dataset of audio files for sounds that
match a set of signal parameters tuned to pick up the focal species.
Matching sounds are extracted, and during the clustering step
Kaleidoscope groups them by similarity; these clusters can then be
trained so that vocalizations from the focal species are reliably placed
into one cluster while noise and similar-sounding species fall into
other clusters. Appropriate signal parameters were chosen by measuring the
minimum frequency, maximum frequency, minimum motif duration, maximum
motif duration, and maximum inter-syllable gap of five unattenuated
recordings of each focal species. Unattenuated vocalizations were
selected from the site recordings found during the manual
identification step. Absolute minima and absolute maxima, rather than
averages, were used to ensure the edges of detected vocalizations were
not lost during the scanning step. Spectrogram measurements were taken
in Raven Pro v 1.6.4 (Cornell Lab of Ornithology, Ithaca, NY, USA).
Signal detection parameters for each species classifier can be found in
Table 1.
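To make this aggregation explicit, the following sketch (in Python, with
hypothetical placeholder values standing in for the Raven Pro
measurements) shows how absolute extrema across the five recordings
define a single set of detection parameters:

    # Sketch: deriving one species' signal detection parameters from five
    # per-recording spectrogram measurements. All numbers are hypothetical
    # placeholders, not measurements from this study.
    measurements = [
        # (min_freq_Hz, max_freq_Hz, min_motif_s, max_motif_s, max_gap_s)
        (1800, 4200, 0.42, 1.10, 0.08),
        (1750, 4350, 0.40, 1.25, 0.10),
        (1900, 4100, 0.45, 1.05, 0.07),
        (1820, 4300, 0.38, 1.30, 0.09),
        (1760, 4250, 0.41, 1.15, 0.11),
    ]

    # Absolute extrema, not averages, so that the scanning step does not
    # clip the edges of real vocalizations.
    signal_params = {
        "min_frequency_hz": min(m[0] for m in measurements),  # 1750
        "max_frequency_hz": max(m[1] for m in measurements),  # 4350
        "min_duration_s":   min(m[2] for m in measurements),  # 0.38
        "max_duration_s":   max(m[3] for m in measurements),  # 1.30
        "max_gap_s":        max(m[4] for m in measurements),  # 0.11
    }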
Once appropriate signal parameters had been identified, a training
dataset was compiled for each species. For each classifier, the training
dataset contained a) examples of vocalizations from the focal species at
the study site, b) randomly selected audio files from the study site
which did not include vocalizations from the focal species to be used as
negative examples, c) clean recordings of the focal species obtained
from the online sound library Xeno-Canto (xeno-canto.org), and d)
example audio recordings from the cohort of non-focal species which
occur at the study site, also from Xeno-Canto. Setting up the training
datasets this way ensured that there were clear examples of the focal
species’ vocalization as well as realistic examples containing the type
of interfering background noise experienced at the site. Recordings of
non-focal species and random audio from the study site served as
negative examples, i.e., potentially similar-sounding vocalizations and
background noise that the classifier was trained to identify as
non-target sounds. The amount of data in each category is given in
Table 1.
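As a schematic of how such a training set could be assembled (a sketch
only; the directory names and file layout are hypothetical, and
Kaleidoscope itself reads the audio directly):

    from pathlib import Path

    # Sketch: compiling the four training-data categories for one species.
    # Folder paths are hypothetical placeholders.
    categories = {
        "focal_site":  ("training/site_focal",    "target"),      # a)
        "site_random": ("training/site_random",   "non-target"),  # b)
        "focal_xc":    ("training/xc_focal",      "target"),      # c)
        "nonfocal_xc": ("training/xc_nonfocal",   "non-target"),  # d)
    }

    training_set = [
        {"file": str(wav), "source": name, "label": label}
        for name, (folder, label) in categories.items()
        for wav in sorted(Path(folder).glob("*.wav"))
    ]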
Using Kaleidoscope's non-bat analysis mode, the recordings in the
training dataset were then scanned and clustered with a maximum distance
from cluster center of 2.0 for including inputs, a 5.33 ms FFT window, a
maximum of 12 states, a maximum distance from cluster center of 0.5 for
building clusters, and a maximum of 500 clusters.
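For reference, these clustering settings can be summarized as follows
(the key names are descriptive labels chosen for this summary, not
Kaleidoscope's internal parameter names):

    # Clustering settings used for training (values from the text).
    cluster_settings = {
        "max_distance_to_include_inputs": 2.0,  # how far a detection may
                                                # lie from a cluster center
                                                # and still be assigned
        "fft_window_ms": 5.33,                  # spectrogram time resolution
        "max_states": 12,                       # cap on states per cluster model
        "max_distance_to_build_clusters": 0.5,  # threshold for forming clusters
        "max_clusters": 500,                    # cap on the number of clusters
    }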
Among all resulting clusters that contained
at least one detection of the focal species, detections were manually
re-classified as either the focal species or noise. Clusters that did
not contain the focal species were left unchanged. The edited cluster
IDs were then used to build a new classifier. The training and
re-clustering steps were repeated over multiple iterations until
classification accuracy reached 80% or improvement plateaued, yielding
the final trained classifiers.
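The stopping rule for this iterative procedure can be summarized as
follows (a sketch of the logic only; the accuracy sequence is
hypothetical, standing in for the accuracy measured after each manual
re-labeling and retraining pass):

    # Hypothetical per-iteration accuracies after each retraining pass.
    iteration_accuracies = [0.55, 0.68, 0.74, 0.78, 0.79]

    TARGET = 0.80   # stop once accuracy reaches 80%
    PLATEAU = 0.02  # hypothetical threshold: stop when gains fall below 2 points

    previous = None
    for accuracy in iteration_accuracies:
        if accuracy >= TARGET:
            print(f"stop: target accuracy reached ({accuracy:.0%})")
            break
        if previous is not None and accuracy - previous < PLATEAU:
            print(f"stop: improvement plateaued ({accuracy:.0%})")
            break
        previous = accuracy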