Conclusions
Prognostication of clinical outcomes after OHT has profound importance in patient selection and organ allocation. The present study demonstrates the potential utility of employing modern ML techniques to improve prognostic model performance at an individual patient level. A final ensemble ML model using only preoperative variables outperformed all other comparison algorithms in predicting one-year survival, achieving improved performance by a variety of metrics including AUROC, net reclassification index, and decision curve analysis. Further, the model demonstrated appropriate calibration.
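As an illustration of two of the metrics named above, the following minimal sketch computes AUROC and the net benefit that underlies decision curve analysis. It is not the study's analysis code: the function names and the six-patient toy data are hypothetical, and the outcome is assumed to be coded 1 for the event of interest and 0 otherwise.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Net benefit at one probability threshold (0 < threshold < 1):
    TP/N - FP/N * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: observed outcomes and predicted probabilities.
outcomes = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

print(auroc(outcomes, predicted))
print(net_benefit(outcomes, predicted, 0.5))
```

A decision curve is obtained by evaluating the net benefit over a range of thresholds and comparing it against the "treat all" and "treat none" strategies.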
Cardiac surgery as a field has historically been an early adopter of clinical prognostic models,14 most notably the widely used Society of Thoracic Surgeons (STS) Short-Term Risk Calculators.15 However, predicting one-year mortality after OHT has remained a persistent challenge. Early OHT risk models incorporated a select number of variables and achieved only modest overall performance.7,8,16 More recent models, such as IMPACT9 and IHTSA10, have incorporated an increasing number of variables but have achieved only slight improvements in discriminatory performance. In part, this may relate to the difficulty of capturing all granular and potentially predictive elements of post-transplant survival in a multicenter registry. For example, factors such as adherence to anti-rejection medication are not assessed but can have important implications for survival following transplant. There is also a trade-off in assessing longer-term outcomes: the event rate will be higher, but the impact of preoperative risk factors on the outcome will likely diminish as longer-term factors weigh more heavily in outcome prediction.
Machine learning techniques have demonstrated clinical utility in a number of different fields.17-21 Within OHT, the IHTSA score itself employs an artificial neural network approach and has consistently demonstrated some of the highest discriminatory values among current models in recent studies.22 Moreover, a recent study recalibrated both the IMPACT and IHTSA models to use the same subset of variables and found the deep learning approach to be superior.23 The recent Trees of Predictors model is also an innovative approach that identifies clusters of patients with similar characteristics and develops machine learning predictive models specifically for each cluster.24 The success of this approach demonstrates the potential for developing highly individualized prognostic scores, at the risk of overfitting the model to specific retrospective cohorts in ways that may not translate to prospective clinical practice.
The final ensemble ML model we developed in the present study is an example of using both clinical acumen and automated machine learning to develop a robust model from a large clinical registry. The statistical adage of “garbage in produces garbage out” remains especially true for machine learning approaches.25 It is particularly relevant for black box algorithms used clinically, as clinicians have little insight into how the algorithm arrives at its final prognosis. Moreover, registry data are particularly prone to reporting inaccuracies and missing values, resulting in poor prognostic ability if machine learning approaches are applied without sufficient data preparation.26 Our approach was to combine manual review of the variables by expert clinicians with automated feature selection techniques in order to arrive at the final set of variables. While time-consuming, we believe this collaborative approach is necessary in order to derive utility from registry-level data. Moreover, while computationally more expensive, the ensemble machine learning approach allows for the integration of multiple types of algorithms into one cohesive model, which has been suggested to produce a more robust final product.27
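The idea of integrating multiple algorithms into one model can be sketched with the simplest possible combination rule, a weighted average of each model's predicted probabilities. This is an assumption made for illustration only: the text above does not specify how the study's ensemble combines its component models, and the function name, weights, and probabilities below are hypothetical.

```python
def ensemble_predict(model_probs, weights=None):
    """Combine per-model predicted probabilities by weighted averaging.

    model_probs: one list of probabilities per model, all of equal length
                 (one entry per patient).
    weights:     optional non-negative weight per model; equal weights
                 are used when omitted.
    """
    if not model_probs:
        raise ValueError("at least one model is required")
    n_patients = len(model_probs[0])
    if any(len(p) != n_patients for p in model_probs):
        raise ValueError("all models must score the same patients")
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs) or sum(weights) <= 0:
        raise ValueError("need one weight per model with a positive total")
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_patients)
    ]


# Hypothetical predicted probabilities from two component models
# for the same two patients.
model_a = [0.2, 0.8]
model_b = [0.4, 0.6]

equal_weighted = ensemble_predict([model_a, model_b])
favoring_a = ensemble_predict([model_a, model_b], weights=[3.0, 1.0])
print(equal_weighted, favoring_a)
```

More elaborate schemes, such as fitting a second-stage model on the component predictions, follow the same pattern of feeding several models' outputs into a single final prediction.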
While this is not the first study to employ machine learning techniques for outcome prediction in OHT, it describes the use of more robust feature selection techniques and the development of a larger-scale ensemble ML model than has been previously reported. This example of applying modern techniques may help to overcome the registry-level data limitations that have hindered prior studies.
This study has several limitations that need to be considered when interpreting the results. First, it is retrospective in nature and subject to all inherent limitations of such studies. Most notably, there were a number of substantial changes in the allocation system and clinical management of OHT patients over the study period. This introduces bias, as risk models, including the one developed in the current study, cannot account for individual provider or transplant program decision-making. Second, the UNOS database, similar to other multicenter registries, has a number of limitations, including variability in data reporting and quality. As a result, assumptions made for missing data may introduce bias, and there may be clinically important variables not captured in the available dataset. Finally, while we created a randomly selected validation cohort at the outset of the study, an independent validation cohort separate from the UNOS database was not available for testing. Further study on independent, prospective data not present in the current dataset is warranted in order to provide more comprehensive validation testing of the final model.
In conclusion, an ensemble ML model achieved greater predictive performance than individual ML models and logistic regression in predicting one-year survival after OHT. This analysis demonstrates the potential of modern ML techniques in risk prediction for OHT. These approaches may have important implications for patient selection, programmatic evaluation, policy-making, and patient counseling in OHT.