Machine learning algorithms
Univariate logistic regression models were used to assess the predictive accuracy of individual variables. Predictive performance was compared among three classifiers: random forest, logistic regression and extreme gradient boosting. The effect of variable collection time on predictive performance was studied using a logistic regression classifier. Shapley values (SHAP)17 were used to rank variables by importance for the trained classifier. Partial dependence plots (PDP)18 were used to explore how the predictions of the trained classifier depend on the values of individual variables (see the further methods in this article’s supporting information).
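
The workflow described above can be illustrated with a minimal sketch in Python using scikit-learn. Everything in it is an assumption made for illustration and not the study's actual implementation: the data are synthetic, 5-fold cross-validated ROC AUC stands in for the performance metric, scikit-learn's GradientBoostingClassifier stands in for extreme gradient boosting, and the names cv_auc, univariate_auc and multivariate_auc are hypothetical. SHAP ranking is left out because it requires a separate package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 300 samples, 6 variables, binary outcome.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)


def cv_auc(model, features):
    """Mean 5-fold cross-validated ROC AUC (metric chosen for illustration)."""
    return cross_val_score(model, features, y, cv=5, scoring="roc_auc").mean()


# 1) Univariate logistic regression: one model per variable.
univariate_auc = {
    j: cv_auc(make_pipeline(StandardScaler(), LogisticRegression()), X[:, [j]])
    for j in range(X.shape[1])
}

# 2) Compare three classifiers trained on all variables.
#    GradientBoostingClassifier is a stand-in for extreme gradient boosting.
classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
multivariate_auc = {name: cv_auc(clf, X) for name, clf in classifiers.items()}

# 3) Partial dependence of the fitted random forest on the variable with the
#    highest univariate AUC, averaged over the data on a 20-point grid.
best_var = max(univariate_auc, key=univariate_auc.get)
forest = classifiers["random_forest"].fit(X, y)
pd_result = partial_dependence(
    forest, X, features=[best_var], kind="average", grid_resolution=20
)
pd_curve = pd_result["average"][0]  # average predicted probability per grid point
```

Printing univariate_auc and multivariate_auc gives a per-variable and a per-classifier performance summary, and plotting pd_curve against the grid of the chosen variable gives the corresponding partial dependence plot.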