Figure 3. Variable importance and dependence plots for the top
selected variables linking microhabitat to nestbox occupancy by hazel
dormice M. avellanarius in a UK woodland. Top large panel shows
the 27 variables with binomial test p-values<0.05 with values
for the three importance metrics that were less correlated in their
ranking (chosen to showcase differences in variable importance among
metrics; Fig S1). The three metrics are shown for display purposes,
but all seven metrics were considered to identify the most important
variables, shown in red and labelled with letters that correspond
to the bottom dependence plot panels. Bottom panels (labelled a to j)
show changes in predicted probability of nestbox occupancy by hazel
dormice for the ten most relevant predictors (red symbols on the top
panel) in descending order of variable importance (from left to right,
top to bottom).
The random forest model based on these ten variables had an out-of-bag
(OOB) error rate of 22.22% (model accuracy 77.78%), with 20.8% false
positives (specificity = 79.2%) and 23.8% false negatives (sensitivity = 76.2%).
The model predicted increased probability of nestbox occupancy with more
trees within ten metres, particularly more hazel C. avellana and
hawthorn C. monogyna trees and at intermediate to high levels of
tree canopy and/or understorey closure (values above 90% cover resulted
in lower probability of occupancy). Occupancy was also more likely in
areas with higher percentages of understorey cover by hazel and
honeysuckle L. periclymenum but lower ground cover of dog’s mercury
Mercurialis perennis, and for nestboxes located nearer to other
boxes (within 10-15 m) or isolated (lower probability for
intermediate distances), further from footpaths, and slightly
away from woodland margins, which may be sources of disturbance (Fig 3).
Occupancy data from 2021 available for model validation were limited, as
only 11 boxes in total across the site were occupied from June to
October (ten in the woodland site and only one in the hedgeline), and
of those, five boxes in the woodland site
were not included in our dataset (thus, we lacked habitat data and could
not predict occupancy). The random forest model based on the top ten
variables correctly predicted occupancy for five of the six nestboxes
occupied in summer 2021, resulting in a 16.7% false negative rate. The
single false negative (predicted to be empty but found to be occupied)
was a nestbox that had not been occupied in any previous years and was
found with an unwoven nest with green leaves in October, but no dormice
were present. Because numbers of hazel dormice were low in 2021 (nestbox
occupancy was very low), our predictions had a higher false positive
rate (41.0%), with 16 boxes predicted to be occupied by the model but
found empty during the surveys (the remaining 23 were predicted to be
empty and found empty). Predictions based on the complete model with all
variables were identical.
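
The 2021 validation rates follow directly from the counts reported above (five occupied boxes correctly predicted, one missed, 16 empty boxes predicted as occupied, 23 empty boxes correctly predicted). The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative only and are not taken from the original analysis.

```python
def confusion_rates(tp, fn, fp, tn):
    """Return standard error rates from the four confusion-matrix counts.

    tp: occupied boxes predicted occupied
    fn: occupied boxes predicted empty
    fp: empty boxes predicted occupied
    tn: empty boxes predicted empty
    """
    return {
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
    }


# Counts for the 2021 validation, as stated in the text
# (hypothetical variable name, used only for this illustration).
validation_2021 = confusion_rates(tp=5, fn=1, fp=16, tn=23)

for name, value in validation_2021.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts, the false negative rate is 1/6 (16.7%) and the false positive rate is 16/39 (41.0%), matching the values reported for the 2021 surveys.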