Model Development and Validation
Because it performs both variable selection and shrinkage, the least
absolute shrinkage and selection operator (LASSO) method was used to
select the optimal predictors of tophi formation from all
variables.15,16 All 22 variables were entered into the LASSO model.
Variables whose regression coefficients were shrunk to zero were
excluded, and the remaining ones were considered to be related to tophi
formation. Additionally, a
multivariable logistic regression model was used to analyse the
variables associated with tophi formation, and variables with a P-value
less than 0.05 were considered possible predictors. Based on these
analyses and the results of previous clinical research2,11,12, nine
candidate variables were selected as risk factors and used to build a
model for predicting the risk of tophi formation.17,18 Afterward, the
model’s discrimination
and calibration were assessed using Harrell’s concordance index
(C-index) and a calibration curve, respectively. The C-index measures
how well the model ranks patients: it is the probability that a randomly
selected patient with tophi has a higher predicted probability than a
randomly selected patient without tophi. A C-index of 0.5 indicates
discrimination no better than chance and 1.0 indicates perfect
discrimination; higher values denote better discriminative ability. The
calibration curve plots the predicted probability (x-axis) against the
observed probability of tophi (y-axis); the closer the curve lies to the
45° diagonal, the better the agreement between predicted and observed
risk.
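The LASSO selection step described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the study's R analysis: it minimises a squared-error loss by cyclic coordinate descent instead of fitting the logistic (binomial) LASSO usually used for a binary outcome such as tophi, it assumes standardised predictors and a centred outcome, and the penalty `lam` is fixed by hand rather than tuned. The function names are hypothetical. The soft-thresholding step is what sets weak coefficients to exactly zero, which is how variables drop out of the model.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything inside [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Hypothetical helper. Assumes each column of X is standardised and y is
    centred, so no intercept is fitted. Zeros in the result mark excluded variables.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = [float(v) for v in y]  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            scale = sum(c * c for c in col) / n
            new_bj = soft_threshold(rho, lam) / scale
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                b[j] = new_bj
    return b


if __name__ == "__main__":
    # Toy data: three orthogonal, standardised predictors; y = 3*x0 + 0.2*x1.
    X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
    y = [3.2, 2.8, -2.8, -3.2]
    coefs = lasso_coordinate_descent(X, y, lam=0.5)
    selected = [j for j, bj in enumerate(coefs) if bj != 0.0]
    print(coefs, selected)
```

With this orthogonal toy design, a penalty of 0.5 keeps the strong predictor (its unpenalised coefficient of 3.0 is shrunk to 2.5) and removes the two weak predictors entirely.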
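Both assessments can be sketched in a few lines of Python, assuming a binary outcome (tophi yes/no) and predicted probabilities that a fitted model has already produced. For a binary outcome, Harrell’s C reduces to the proportion of concordant (with tophi, without tophi) patient pairs, which equals the area under the ROC curve. The calibration helper is a simplified, equal-width binned version; calibration curves produced in R are often smoothed or bootstrap-corrected, which is not reproduced here. The function names and the number of bins are illustrative assumptions.

```python
from itertools import product


def c_index(probs, outcomes):
    """Harrell's C for a binary outcome: share of (event, non-event) pairs in which
    the patient with the event has the higher predicted probability.
    Tied predictions count as half-concordant."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one patient with and one without the outcome")
    score = 0.0
    for pe, pn in product(events, non_events):
        if pe > pn:
            score += 1.0
        elif pe == pn:
            score += 0.5
    return score / (len(events) * len(non_events))


def calibration_table(probs, outcomes, n_bins=5):
    """Group patients into equal-width bins of predicted probability and return
    (mean predicted, observed proportion, count) for each non-empty bin.
    Points near the diagonal y = x indicate good calibration."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls into the last bin
        bins[idx].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            table.append((mean_pred, observed, len(b)))
    return table


if __name__ == "__main__":
    probs = [0.8, 0.4, 0.6, 0.2]
    tophi = [1, 1, 0, 0]
    print(c_index(probs, tophi))  # 3 of the 4 pairs are concordant
    print(calibration_table(probs, tophi))
```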
The model was validated by a bootstrap procedure, which generates a
large number of similar but not identical datasets from the original
dataset by random resampling with replacement.19 A total of 1,000
bootstrap resamples were used to calculate the bias-corrected C-index. A
corrected C-index close to the original (apparent) value indicates
little overfitting and suggests that the model would perform similarly
in other datasets. Meanwhile, the
clinical usefulness of the model was evaluated by decision curve
analysis, which quantifies the net benefit across a range of threshold
probabilities using the included patients’ data.20 The net benefit was
calculated as the proportion of true-positive patients minus the
proportion of false-positive patients, with the latter weighted by the
odds of the threshold probability; this weight expresses the relative
harm of an unnecessary intervention compared with that of a missed
intervention.21 All statistical analyses were performed using R
software, version 3.6.2 (https://www.R-project.org).
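The text does not spell out how the corrected C-index was derived from the 1,000 resamples. A common choice, assumed here, is Harrell’s optimism correction: refit the model on each bootstrap sample, measure how much better it scores on that sample than on the original data, and subtract the average of this optimism from the apparent C-index. The Python sketch below illustrates that procedure; `fit_toy_model` is a hypothetical stand-in for the real nine-variable model, and the seed and function names are illustrative only.

```python
import random
from itertools import product


def c_index(scores, outcomes):
    """Harrell's C for a binary outcome (tied scores count as 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    hits = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(pos, neg))
    return hits / (len(pos) * len(neg))


def fit_toy_model(X, y):
    """Hypothetical stand-in for the real model: each weight is the difference
    in that feature's mean between patients with and without the outcome."""
    pos = [row for row, out in zip(X, y) if out == 1]
    neg = [row for row, out in zip(X, y) if out == 0]
    weights = [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(len(X[0]))
    ]
    return lambda row: sum(w * v for w, v in zip(weights, row))


def optimism_corrected_c(X, y, fit, n_boot=1000, seed=2020):
    """Return (apparent C, optimism-corrected C) from n_boot bootstrap resamples."""
    if len(set(y)) < 2:
        raise ValueError("outcome must contain both classes")
    rng = random.Random(seed)
    n = len(y)
    model = fit(X, y)
    apparent = c_index([model(r) for r in X], y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # C-index is undefined without both outcome classes
        boot_model = fit(Xb, yb)
        c_boot = c_index([boot_model(r) for r in Xb], yb)  # on its own resample
        c_test = c_index([boot_model(r) for r in X], y)    # same model, original data
        optimism.append(c_boot - c_test)
    return apparent, apparent - sum(optimism) / len(optimism)


if __name__ == "__main__":
    g = random.Random(1)
    X = [[g.gauss(0, 1), g.gauss(0, 1)] for _ in range(60)]
    y = [1 if row[0] + g.gauss(0, 1) > 0 else 0 for row in X]
    print(optimism_corrected_c(X, y, fit_toy_model, n_boot=200))
```

Passing the fitting routine in as `fit` keeps the resampling logic separate from the model, so the same loop would work for any model that can be refitted on a resample.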
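The net-benefit calculation can be written out directly. With threshold probability pt, net benefit = TP/n - (FP/n) x pt/(1 - pt), where patients whose predicted risk is at least pt are classified as high risk; the "treat all" strategy is included for comparison, and "treat none" always has a net benefit of zero. This Python sketch assumes predicted probabilities are already available; the function names are hypothetical and it is not the R code used in the study.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of intervening in patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    # Odds of the threshold: harm of an unnecessary intervention relative to a missed one.
    odds = threshold / (1 - threshold)
    return tp / n - fp / n * odds


def treat_all_net_benefit(outcomes, threshold):
    """Net benefit if every patient receives the intervention."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(probs, outcomes, thresholds):
    """(threshold, model net benefit, treat-all net benefit) for each threshold."""
    return [
        (t, net_benefit(probs, outcomes, t), treat_all_net_benefit(outcomes, t))
        for t in thresholds
    ]


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.3, 0.2]
    tophi = [1, 0, 1, 0]
    for row in decision_curve(probs, tophi, [0.1, 0.25, 0.5]):
        print(row)
```

The model is clinically useful at a given threshold when its curve lies above both the treat-all curve and the treat-none line (net benefit of zero).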