Interpretability analysis
A logistic regression classifier is linear and therefore cannot model
non-monotonic relations between predictors and the outcome. Random
forest and gradient boosting classifiers can model complex,
non-monotonic relations, but they are so-called black-box models: the
relations between inputs and output are difficult to understand
directly from the parameters or structure of the trained model. Hence,
SHAP values and partial dependence plots (PDPs) were used to conduct a
post-hoc interpretability analysis of the random forest classifier. For
tree-based classifiers (such as random forest), SHAP values can be
calculated exactly with the established TreeSHAP method [17].
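To make concrete what a SHAP value measures, the following minimal sketch computes exact Shapley values by brute-force enumeration of feature subsets. It is not TreeSHAP and not the code used in this study: TreeSHAP reaches exact values for tree ensembles far more efficiently, whereas this enumeration is feasible only for a handful of features. The model `toy_risk`, its coefficients, the feature names and the background rows are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x, enumerating all feature subsets.

    Features outside a subset take their values from the background rows,
    and the prediction is averaged over those rows.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(row):
    """Hypothetical risk score with an interaction term (illustration only)."""
    visits, nasal_polyps = row
    return 0.1 + 0.03 * visits + 0.05 * nasal_polyps + 0.02 * visits * nasal_polyps


# Invented background data: [number of visits, nasal polyps (0/1)]
background = [[0, 0], [2, 0], [4, 1], [6, 1]]
patient = [6, 1]

phi = shapley_values(toy_risk, patient, background)
baseline = sum(toy_risk(r) for r in background) / len(background)
prediction = toy_risk(patient)
```

The values in `phi` sum to the difference between the patient's prediction and the average prediction over the background rows, which is the additivity property that allows SHAP values to be read as per-variable contributions to the classifier output.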
We followed the data flow in Figure S1B to train the random forest
classifier and calculated SHAP values for the variables collected from
the baseline visit until 6 months after the baseline ESS. Figure 3 shows
the variables sorted in descending order by the sum of absolute SHAP
values over all patients. The distribution of the data points on the
plot shows the impact of each variable on the classifier output. We
found that a high number of visits after baseline ESS and a short time
between baseline visit and baseline ESS both increased the revision ESS
risk. In
addition, CRSwNP, asthma and NERD increased revision ESS risk. SHAP
values show that the age of patients and the visit frequency from
baseline visit to baseline ESS affected revision ESS risk in a
non-monotonic way. That is, the red points (values higher than
average) of these variables are dispersed on both sides of the scale
(Figure 3).
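The ranking and the "both sides of the scale" reading described above can be sketched as follows. This is an illustrative reconstruction, not the analysis code of the study: the function names, the feature names and every number in the toy data are invented, and the sign check is only a rough heuristic for what is judged visually in Figure 3.

```python
def rank_by_total_abs_shap(shap_rows, names):
    """Sort variables by the sum of absolute SHAP values over all patients."""
    totals = {
        name: sum(abs(row[j]) for row in shap_rows)
        for j, name in enumerate(names)
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def high_values_on_both_sides(feature_values, shap_column):
    """True if above-average feature values have both positive and negative SHAP values."""
    mean_value = sum(feature_values) / len(feature_values)
    high = [s for v, s in zip(feature_values, shap_column) if v > mean_value]
    return any(s > 0 for s in high) and any(s < 0 for s in high)


# Invented toy data: one row per patient, one column per variable
names = ["visits_after_ess", "age"]
features = [[1, 25], [2, 45], [8, 65], [9, 80]]
shap_rows = [[-0.06, -0.03], [-0.04, 0.00], [0.09, 0.03], [0.11, -0.01]]

ranking = rank_by_total_abs_shap(shap_rows, names)

visits_mixed = high_values_on_both_sides(
    [f[0] for f in features], [s[0] for s in shap_rows]
)
age_mixed = high_values_on_both_sides(
    [f[1] for f in features], [s[1] for s in shap_rows]
)
```

In this toy data the high values of `visits_after_ess` all push the output upward, whereas the high values of `age` fall on both sides of zero, which is the pattern interpreted as a non-monotonic effect.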
We generated PDPs for the ten variables with the highest SHAP values.
The plots of the following variables showed a wide range of risk scores
for revision ESS: the number of visits within 6 (or 3) months after the
baseline ESS, the time between baseline visit and baseline ESS, age, the
number of visits between baseline visit and baseline ESS, CRSwNP and asthma.
The average predicted risk score varied by more than .02 units between
the low and high values of these predictors, whereas for the other
predictors the PDP risk score varied by less than .02 units (Figure S4).
The PDP of the number of visits within 6 months after the baseline ESS
showed a wide range of risk scores, from .1 for patients with fewer
than two visits after baseline ESS up to about .35 for
patients with more than seven visits (Figure S4A). Similarly, if a patient
had two or more postoperative visits within 3 months, the risk score
for revision ESS increased (Figure S4D). The plot of the time between
baseline visit and baseline ESS showed a sharp drop in the risk score
after about 100 days (Figure S4F). When the time between baseline visit
and baseline ESS was less than 100 days, the risk score was about .15.
When the time increased to > 500 days, the risk score decreased to
< .13. The PDP curve for age was non-monotonic and the risk
scores varied from .1 for patients aged 10-30 years to about
.17 for patients aged 60-70 years (Figure S4E). The risk scores
were .13-.15 for patients aged 30-60 or over 70 years. The PDP curve
for the number of visits between baseline visit and baseline ESS was
also non-monotonic: patients with 10-20 visits between the baseline visit
and baseline ESS had a lower risk of revision ESS than patients
with fewer than 10 or more than 20 visits (Figure S4I).
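The quantities read from the PDPs above (the average predicted risk score at each value of a predictor, and how far that average varies) can be illustrated with a minimal sketch. It assumes the standard definition of partial dependence: the predictor is set to a fixed grid value for every patient and the predictions are averaged. The model `toy_risk`, the patient rows and the grids are hypothetical and do not reproduce the study's classifier or results.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is set to each grid value for all rows."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def pdp_range(curve):
    """Spread of the average risk score across the grid."""
    return max(curve) - min(curve)


def toy_risk(row):
    """Hypothetical risk score: rises with visits, peaks at ages 60-69."""
    visits, age = row
    age_effect = 0.05 if 60 <= age < 70 else 0.0
    return 0.10 + 0.02 * visits + age_effect


# Invented patients: [number of visits, age in years]
rows = [[1, 25], [3, 45], [5, 65], [7, 75]]

visits_curve = partial_dependence(toy_risk, rows, 0, [0, 4, 8])
age_curve = partial_dependence(toy_risk, rows, 1, [20, 65, 80])

# Predictors whose curve spans more than the .02 threshold used in the text
wide_range = {
    "visits": pdp_range(visits_curve) > 0.02,
    "age": pdp_range(age_curve) > 0.02,
}
```

In this toy example the curve for visits rises steadily, while the curve for age rises and then falls, so a single slope could not summarize it. This is the kind of shape described as non-monotonic for age and for the number of visits before baseline ESS.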