Model Development and Validation
Because it performs both variable selection and shrinkage, the least absolute shrinkage and selection operator (LASSO) method was employed to select the optimal predictive factors of tophi formation.15,16 All 22 variables were entered into the LASSO model. Variables with regression coefficients close to zero were excluded, and the remaining ones were considered to be related to tophi formation. In addition, a multivariable logistic regression model was used to analyse the variables associated with tophi formation, and those with a P-value less than 0.05 were taken as possible predictors. Based on these analyses and the results of previous clinical research,2,11,12 nine candidate variables were eventually selected as risk factors and used to establish a model for predicting the risk of tophi formation.17,18 The model’s discrimination and calibration were then assessed using Harrell’s concordance index (C-index) and a calibration curve, respectively. The C-index reflects how well the model discriminates between patients with and without the outcome: it is the probability that a patient who developed tophi receives a higher predicted probability than one who did not. It typically ranges from 0.5 (no better than chance) to 1.0 (perfect discrimination), and a higher C-index denotes better predictive accuracy. The calibration curve plots the predicted probability of tophi (x-axis) against the observed probability (y-axis) to assess the agreement between the two.
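To make the two performance measures concrete, the following minimal Python sketch computes a C-index for a binary outcome and the grouped points that a calibration curve would plot. It is an illustration only and not the code used in this study, which was carried out in R; the function names, the grouping rule, and the toy data are all assumptions introduced here.

```python
def c_index(y_true, y_prob):
    """Concordance index for a binary outcome (equivalent to the ROC AUC).

    Every patient with the outcome is paired with every patient without it;
    a pair is concordant when the patient with the outcome has the higher
    predicted probability. Tied predictions count as half a concordant pair.
    """
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need patients both with and without the outcome")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def calibration_points(y_true, y_prob, n_groups=2):
    """Return (mean predicted, observed proportion) for each risk group.

    Patients are sorted by predicted probability and split into n_groups
    groups of roughly equal size (a hypothetical grouping rule).
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_pred, observed))
    return points


# Hypothetical toy data: 1 = tophi present, 0 = tophi absent.
outcome = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]

print(c_index(outcome, predicted))  # 0.75: three of four pairs are concordant
print(calibration_points(outcome, predicted))
```

In a well-calibrated model the points returned by `calibration_points` would lie close to the diagonal on which the predicted and observed probabilities are equal.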
The model was validated by a bootstrap procedure, which generates a large number of similar, but not identical, datasets from the original dataset through random resampling with replacement.19 A total of 1,000 bootstrap resamples were utilised to calculate a corrected C-index. If the corrected C-index is similar to the original value, the model is considered reliable, as it is likely to perform similarly in different datasets. The clinical practicability of the model was determined by a decision curve analysis, which quantifies the net benefit at different threshold probabilities in the included patients’ data.20 The net benefit was calculated by subtracting the proportion of false-positive patients from the proportion of true-positive patients, with the false-positive proportion weighted by the relative harm of forgoing an intervention compared with the negative consequences of an unnecessary intervention.21 All statistical analyses were performed using R software, version 3.6.2 (https://www.R-project.org).
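The net benefit calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's R code: it uses the commonly applied weighting of false positives by the odds of the threshold probability, p_t / (1 - p_t), treats a patient as test-positive when the predicted probability is at or above the threshold, and uses hypothetical function names and toy data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening in patients whose predicted risk >= threshold.

    Net benefit = TP/n - FP/n * (threshold / (1 - threshold)), where the
    odds of the threshold express the harm of an unnecessary intervention
    relative to the harm of a missed one.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)
    return tp / n - weight * fp / n


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene in every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    weight = threshold / (1.0 - threshold)
    return prevalence - weight * (1.0 - prevalence)


# Hypothetical toy data: 1 = tophi present, 0 = tophi absent.
outcome = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]

# A decision curve plots these values against the threshold probability;
# intervening in no one always has a net benefit of zero.
for pt in (0.2, 0.3, 0.5):
    print(pt, net_benefit(outcome, predicted, pt),
          net_benefit_treat_all(outcome, pt))
```

A model is considered clinically useful over the range of threshold probabilities at which its net benefit exceeds that of both reference strategies, intervening in all patients and intervening in none.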