Classifier training
After the nine focal species had been identified, a custom single-species classifier was trained for each using the machine-learning clustering software Kaleidoscope Pro v 5.4.8 (Wildlife Acoustics). Each classifier was used to detect and identify vocalizations from one species. Kaleidoscope Pro uses a two-step process for species identification: scanning and clustering. During the scanning step, Kaleidoscope scans the dataset of audio files for sounds matching a set of signal parameters tuned to the focal species. Matching target sounds are extracted and clustered by similarity, and the clusters can then be trained so that vocalizations from the focal species fall into one cluster while noise and similar-sounding species fall into others. Appropriate signal parameters were chosen by measuring the minimum frequency, maximum frequency, minimum motif duration, maximum motif duration, and maximum inter-syllable gap of five unattenuated recordings of each focal species, drawn from the site recordings identified during the manual identification step. Absolute minima and maxima, rather than averages, were used so that the edges of detected vocalizations were not clipped during the scanning step. Spectrogram measurements were taken in Raven Pro v 1.6.4 (Cornell Lab of Ornithology, Ithaca, NY, USA). Signal detection parameters for each species classifier are given in Table 1.
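To illustrate this parameter-setting logic, the minimal sketch below (not part of the published workflow; all measurement values are hypothetical) derives a parameter set by taking the absolute extremes across five per-recording measurements rather than their averages:

```python
# Minimal sketch of deriving signal detection parameters from five
# per-recording spectrogram measurements. Values are hypothetical.

measurements = [
    # (min_freq_Hz, max_freq_Hz, min_dur_s, max_dur_s, max_gap_s) per recording
    (1800, 4200, 0.42, 1.10, 0.08),
    (1750, 4350, 0.40, 1.25, 0.10),
    (1900, 4100, 0.45, 1.05, 0.07),
    (1820, 4400, 0.38, 1.30, 0.09),
    (1760, 4250, 0.44, 1.15, 0.11),
]

# Use absolute extremes across recordings, not averages, so the scanning
# step does not clip the edges of real vocalizations.
signal_params = {
    "min_frequency_hz": min(m[0] for m in measurements),
    "max_frequency_hz": max(m[1] for m in measurements),
    "min_duration_s": min(m[2] for m in measurements),
    "max_duration_s": max(m[3] for m in measurements),
    "max_intersyllable_gap_s": max(m[4] for m in measurements),
}

print(signal_params)
```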
Once appropriate signal parameters had been identified, a training dataset was compiled for each species. For each classifier, the training dataset contained a) examples of vocalizations from the focal species at the study site, b) randomly selected audio files from the study site that did not include vocalizations from the focal species, to serve as negative examples, c) clean recordings of the focal species obtained from the online sound library Xeno-Canto (xeno-canto.org), and d) example recordings, also from Xeno-Canto, of the non-focal species that occur at the study site. Structuring the training datasets this way ensured that they contained clear examples of the focal species' vocalizations as well as realistic examples with the kind of interfering background noise experienced at the site. Recordings of non-focal species and of random audio from the study site were used as negative examples, i.e., potentially similar-sounding vocalizations and background noise that the classifier was trained to identify as non-target sounds. The amount of data in each category is given in Table 1.

Kaleidoscope was then run in non-bat analysis mode to scan and cluster the recordings from the training dataset, using a maximum distance from cluster center of 2.0 to include inputs, a 5.33 ms FFT window, a maximum of 12 states, a maximum distance from cluster center of 0.5 for building clusters, and a maximum of 500 clusters. Within every resulting cluster that contained at least one detection of the focal species, detections were manually re-classified as either the focal species or noise; clusters that did not contain the focal species were left as is. These edited cluster IDs were then used to create a new clustering algorithm. The training and re-clustering steps were repeated over multiple iterations until accuracy reached 80% or improvement plateaued, yielding the final trained classifiers.
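For concreteness, the sketch below shows one way the four-part training dataset described above could be assembled on disk. This is not the authors' pipeline; all paths, directory names, and the negative sample count are hypothetical.

```python
# Hypothetical layout for assembling one species' training dataset from
# the four sources (a)-(d) described above.
from pathlib import Path
import random
import shutil

SITE = Path("recordings/site")       # hypothetical site recordings
XC = Path("recordings/xeno_canto")   # hypothetical Xeno-Canto downloads
OUT = Path("training/species_x")

sources = {
    "focal_site": SITE / "species_x_confirmed",  # (a) focal species at the site
    "focal_xc": XC / "species_x",                # (c) clean Xeno-Canto recordings
    "nonfocal_xc": XC / "cohort_species",        # (d) co-occurring non-focal species
}

OUT.mkdir(parents=True, exist_ok=True)
for label, src in sources.items():
    for wav in src.glob("*.wav"):
        shutil.copy(wav, OUT / f"{label}_{wav.name}")

# (b) randomly selected site files without the focal species, as negatives;
# k=50 is an arbitrary placeholder, not a value from the paper.
negatives = random.sample(sorted((SITE / "no_species_x").glob("*.wav")), k=50)
for wav in negatives:
    shutil.copy(wav, OUT / f"negative_{wav.name}")
```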
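The stopping rule for the iterative train-and-review cycle can be sketched in the same spirit. Here train_and_review is a placeholder standing in for one Kaleidoscope scan/cluster pass followed by manual re-labelling, its dummy accuracy curve is invented, and the plateau threshold is an assumption, since the paper does not quantify what counts as a plateau.

```python
# Sketch of the stopping rule: iterate until accuracy reaches 0.80 or
# the gain over the previous iteration falls below a plateau threshold.

PLATEAU_DELTA = 0.01  # assumption: minimum gain counted as improvement

def train_and_review(iteration):
    """Placeholder for one Kaleidoscope scan/cluster pass plus manual
    re-classification of clusters containing the focal species; returns
    a dummy accuracy value that rises and then saturates."""
    return min(0.5 + 0.12 * iteration, 0.85)

accuracy, iteration = 0.0, 0
while True:
    iteration += 1
    new_accuracy = train_and_review(iteration)
    if new_accuracy >= 0.80 or (new_accuracy - accuracy) < PLATEAU_DELTA:
        break
    accuracy = new_accuracy

print(f"final classifier after {iteration} iterations, accuracy {new_accuracy:.2f}")
```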