Model Training and Performance Comparison
A total of 400 algorithms were trained on varying subsets of the training data, generated by randomly stratified levels of over- and under-sampling of the training dataset (Figure 2). The final ensemble ML model contained all 400 underlying algorithms; for comparison, smaller ensemble models were also built from the 100 iterations of each algorithm type individually. The complete ensemble ML model performed best (Figure 3), outperforming all other models with an AUROC of 0.764 (95% CI, 0.745-0.782; p<0.001). By comparison, the single logistic regression model had an AUROC of 0.649 (95% CI, 0.628-0.670). The final ensemble ML model also improved predictive performance relative to logistic regression, with a net reclassification index of 72.9% ± 3.8% (p<0.001). Decision curve analysis showed that the final ensemble model improved risk prediction across the entire spectrum of predicted risk compared with all other models (Figure 4; p<0.001). The final ensemble ML model was well calibrated: after stratification into deciles of risk, the majority of observed risk in the validation cohort fell within the range of predicted risk derived from the training cohort (Supplemental Figure 2).
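The section does not state how each training subset was resampled or how the member algorithms were combined. The following is a minimal sketch under two loudly stated assumptions: each iteration draws a random target share of positive cases (the `frac_range` bounds are hypothetical), and the ensemble averages member probabilities (soft voting). The `fit` callable stands in for any of the algorithm types; 400 models built from 100 iterations per type implies four types, but their identities are not given here.

```python
import random
from typing import Callable, List, Sequence

# A fitted model maps one feature vector to a predicted probability of the event.
Model = Callable[[Sequence[float]], float]
# Hypothetical fitting interface: takes (X, y) and returns a fitted Model.
FitFn = Callable[[List[Sequence[float]], List[int]], Model]


def _draw(indices: List[int], k: int, rng: random.Random) -> List[int]:
    """Undersample without replacement, or oversample with replacement, to exactly k indices."""
    if k <= len(indices):
        return rng.sample(indices, k)
    return indices + rng.choices(indices, k=k - len(indices))


def resample_binary(y: Sequence[int], minority_fraction: float, rng: random.Random) -> List[int]:
    """Row indices whose positive share is about minority_fraction, keeping len(y) rows."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    n = len(y)
    n_pos = min(n - 1, max(1, round(minority_fraction * n)))
    idx = _draw(pos, n_pos, rng) + _draw(neg, n - n_pos, rng)
    rng.shuffle(idx)
    return idx


def train_resampled_ensemble(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    fit: FitFn,
    n_iter: int,
    rng: random.Random,
    frac_range=(0.2, 0.5),  # assumption: hypothetical bounds on the positive share
) -> List[Model]:
    """Fit one model per iteration, each on a differently resampled training set."""
    models: List[Model] = []
    for _ in range(n_iter):
        frac = rng.uniform(*frac_range)  # assumption: random target share per iteration
        idx = resample_binary(y, frac, rng)
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_predict(models: Sequence[Model], x: Sequence[float]) -> float:
    """Soft voting (assumed combination rule): average member probabilities."""
    return sum(m(x) for m in models) / len(models)
```

Under these assumptions, the full ensemble would be the concatenation of four 100-model lists passed to `ensemble_predict`, and each per-type comparison ensemble would be a single list on its own.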
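The reported AUROCs come with 95% confidence intervals, but the interval method is not stated. As an illustration only, this sketch computes AUROC as the probability that a randomly chosen event receives a higher score than a randomly chosen non-event (ties count one half), and it assumes a percentile bootstrap for the interval. DeLong's method, commonly used for such intervals and for comparing correlated AUROCs, is not shown here.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random event outranks a random non-event (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point AUROC plus an assumed percentile-bootstrap (1 - alpha) interval."""
    point = auroc(y_true, y_score)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing only one class
            stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi
```

The pairwise loop is quadratic in cohort size and is written for readability; a rank-based (Mann-Whitney) formulation gives the same value more efficiently on large cohorts.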
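The net reclassification index compares how a new model moves each patient's predicted risk relative to a reference model. The section does not say whether a categorical or a category-free NRI was used. This sketch assumes the category-free (continuous) version, in which any upward move counts for events and any downward move counts for non-events. The total ranges from -2 to 2, so a percentage such as the reported 72.9% would correspond to 100 times the total.

```python
from typing import Sequence, Tuple


def continuous_nri(
    y_true: Sequence[int],
    p_old: Sequence[float],
    p_new: Sequence[float],
) -> Tuple[float, float, float]:
    """Assumed category-free NRI of p_new over p_old: (total, event part, non-event part)."""
    events = [(old, new) for y, old, new in zip(y_true, p_old, p_new) if y == 1]
    nonevents = [(old, new) for y, old, new in zip(y_true, p_old, p_new) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    # Events: reward upward movement of predicted risk, penalize downward movement.
    up_e = sum(new > old for old, new in events)
    down_e = sum(new < old for old, new in events)
    nri_events = (up_e - down_e) / len(events)
    # Non-events: reward downward movement, penalize upward movement; ties count for neither.
    up_ne = sum(new > old for old, new in nonevents)
    down_ne = sum(new < old for old, new in nonevents)
    nri_nonevents = (down_ne - up_ne) / len(nonevents)
    return nri_events + nri_nonevents, nri_events, nri_nonevents
```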
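Decision curve analysis plots a model's net benefit against the threshold probability p_t at which a patient would be treated. The standard definition is NB(p_t) = TP/n - (FP/n) * p_t / (1 - p_t), compared against "treat all" and "treat none" (net benefit 0). The sketch below implements that definition; the threshold grid is an illustrative choice, not taken from the section.

```python
from typing import List, Sequence, Tuple


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def decision_curve(
    y_true: Sequence[int], y_prob: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """(threshold, model net benefit, treat-all net benefit); treat-none is always 0."""
    return [
        (t, net_benefit(y_true, y_prob, t), treat_all_net_benefit(y_true, t))
        for t in thresholds
    ]
```

A model improves on both reference strategies at a given threshold when its net benefit exceeds both the treat-all value and 0.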
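Calibration by deciles groups patients by predicted risk and compares mean predicted risk with the observed event rate in each group. This is a simplified sketch that bins a single cohort into ten equal-size groups; the procedure described above (risk ranges from the training cohort, observed risk from the validation cohort) may differ in how the decile boundaries are set.

```python
from typing import List, Sequence, Tuple


def calibration_by_decile(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> List[Tuple[int, float, float, int]]:
    """Rows of (bin number, mean predicted risk, observed event rate, bin size)."""
    pairs = sorted(zip(y_prob, y_true))  # order patients by predicted risk
    n = len(pairs)
    if n < n_bins:
        raise ValueError("need at least one patient per bin")
    table = []
    for b in range(n_bins):
        # Integer bin edges give near-equal bin sizes when n is not divisible by n_bins.
        chunk = pairs[b * n // n_bins : (b + 1) * n // n_bins]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((b + 1, mean_pred, observed, len(chunk)))
    return table
```

A well-calibrated model shows observed rates close to mean predicted risk across the bins.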