Conclusions
Prognostication of clinical outcomes after OHT is of profound importance
for patient selection and organ allocation. The present study
demonstrates the potential utility of modern ML techniques for
improving prognostic model performance at the individual patient level.
The final ensemble ML model, which used only preoperative variables,
outperformed all comparison algorithms in predicting one-year survival,
with better performance across several metrics, including AUROC, net
reclassification index, and decision curve analysis. The model also
demonstrated appropriate calibration.
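Decision curve analysis, one of the metrics listed above, compares models by their net benefit across a range of threshold probabilities. The Python sketch below is illustrative only and is not the study's analysis code. It assumes the standard net-benefit formula, NB = TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability at which a patient would be classified as high risk. The function names, outcome labels (1 = death within one year), and toy predictions are hypothetical.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk is >= threshold.

    Uses NB = TP/n - FP/n * (pt / (1 - pt)), so false positives are
    penalized by the odds implied by the threshold probability pt.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among patients flagged as high risk.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * (threshold / (1.0 - threshold))


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the reference strategy that flags every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


if __name__ == "__main__":
    # Hypothetical outcomes and predicted risks for ten patients.
    y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
    y_prob = [0.8, 0.1, 0.3, 0.6, 0.2, 0.05, 0.4, 0.7, 0.15, 0.25]
    for pt in (0.1, 0.2, 0.3, 0.5):
        print(pt, net_benefit(y_true, y_prob, pt), treat_all_net_benefit(y_true, pt))
```

Plotting the model's net benefit against the "treat all" and "treat none" (net benefit of zero) strategies across thresholds yields the decision curve. A model adds clinical value over the range of thresholds where its curve lies above both reference strategies.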
Cardiac surgery as a field has historically been an early adopter of
clinical prognostic models,14 most notably the widely
used Society of Thoracic Surgeons (STS) Short-Term Risk
Calculators.15 However, predicting one-year mortality
after OHT has remained a persistent challenge. Early OHT risk models
incorporated a limited number of variables and achieved only modest
overall performance.7,8,16 More recent models, such as
IMPACT9 and IHTSA10, incorporate a larger number of variables
but have achieved only slight improvements in discriminatory performance.
This may partly reflect the difficulty of capturing, in a multicenter
registry, every granular factor that may predict post-transplant
survival. For example, adherence to anti-rejection medication is not
assessed in the registry but can substantially affect survival after
transplant. There is also a trade-off in assessing longer-term outcomes:
the event rate is higher, but the influence of preoperative risk factors
on the outcome likely diminishes as longer-term factors weigh more
heavily in outcome prediction.
Machine learning techniques have demonstrated clinical utility in a
number of different fields.17-21 Within OHT, the IHTSA
score itself uses an artificial neural network and has consistently
shown some of the highest discrimination among current models in recent
studies.22 Moreover, a recent study recalibrated both the
IMPACT and IHTSA models to use the same subset of variables and found
the deep learning approach to be superior.23 The recent Trees
of Predictors model takes another innovative approach: it identifies
clusters of patients with similar characteristics and develops a
separate machine learning model for each cluster.24 Its
success demonstrates the potential for highly individualized prognostic
scores, albeit at the risk of overfitting to specific retrospective
cohorts in ways that may not translate to prospective clinical practice.
The final ensemble ML model developed in the present study illustrates
how clinical acumen and automated machine learning can be combined to
build a robust model from a large clinical registry. The statistical
adage "garbage in, garbage out" holds especially true for machine
learning approaches.25 It is particularly relevant for black
box algorithms used clinically, because clinicians have little insight
into how the algorithm arrives at its final prognosis. Moreover,
registry data are particularly prone to reporting inaccuracies and
missing values, which can result in poor prognostic performance if
machine learning approaches are applied without sufficient data
preparation.26 Our approach combined manual review of the
variables by expert clinicians with automated feature selection
techniques to arrive at the final set of variables. Although
time-consuming, we believe this collaborative approach is necessary to
derive utility from registry-level data. In addition, although
computationally more expensive, the ensemble approach integrates
multiple types of algorithms into one cohesive model, which has been
suggested to yield a more robust final product.27
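This section does not specify how the ensemble combines its component algorithms. As one simple illustration of integrating several algorithm types into a single risk estimate, the Python sketch below implements soft voting, a (weighted) average of each base model's predicted probabilities for the same patients. The model names, weights, and probabilities are hypothetical, and soft voting is a stand-in technique, not necessarily the method used in this study.

```python
from typing import Dict, List, Optional


def soft_vote(predictions: Dict[str, List[float]],
              weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Combine per-model predicted probabilities by a weighted average.

    `predictions` maps a (hypothetical) model name to that model's
    predicted risks for the same ordered list of patients.
    """
    if not predictions:
        raise ValueError("need at least one model")
    names = list(predictions)
    n = len(predictions[names[0]])
    if any(len(predictions[m]) != n for m in names):
        raise ValueError("all models must score the same patients")
    if weights is None:
        weights = {m: 1.0 for m in names}  # equal weights by default
    total = sum(weights[m] for m in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Ensemble risk for patient i = weighted mean of the models' risks.
    return [sum(weights[m] * predictions[m][i] for m in names) / total
            for i in range(n)]


if __name__ == "__main__":
    preds = {
        "logistic": [0.2, 0.6],
        "gbm": [0.4, 0.8],
        "neural_net": [0.3, 0.7],
    }
    print(soft_vote(preds))
    print(soft_vote(preds, {"logistic": 1.0, "gbm": 2.0, "neural_net": 1.0}))
```

Larger ensemble frameworks often replace the fixed weights with a meta-learner fit on out-of-fold base-model predictions (stacking). The extra computational cost noted above comes largely from training and cross-validating every base model.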
Although this is not the first study to apply machine learning to
outcome prediction after OHT, it uses more robust feature selection
techniques and a larger-scale ensemble ML model than have been
previously reported. Applying such modern techniques may help overcome
the registry-level data limitations that have hindered prior studies.
This study has several limitations that need to be considered when
interpreting the results. First, it is retrospective and therefore
subject to the inherent limitations of such studies. Most notably, the
allocation system and clinical management of OHT patients changed
substantially over the study period. These changes introduce potential
bias, and risk models, including the one developed in the current
study, cannot account for individual provider or transplant program
decision-making. Second, like other multicenter registries, the UNOS
database has limitations, including variability in data reporting and
quality. Consequently, the assumptions made for missing data may
introduce bias, and clinically important variables may not be captured
in the available dataset. Finally, although we set aside a randomly
selected validation cohort at the outset of the study, no independent
validation cohort external to the UNOS database was available for
testing. Further study on independent, prospective data is warranted to
validate the final model more comprehensively.
In conclusion, an ensemble ML model achieved greater predictive
performance than individual ML models and logistic regression in
predicting one-year survival after OHT. This analysis demonstrates
the potential of modern ML techniques in risk prediction for OHT. These
approaches may have important implications for patient selection,
programmatic evaluation, policy-making, and patient counseling in OHT.