Data Preparation and Feature Selection
All 525 variables in the UNOS database available for the OHT patients
were manually reviewed by two independent clinicians.
Variables were excluded if they were redundant, free text, or would not
be available in the preoperative setting. Variables with more than 20%
missing data were also excluded. The distribution of each remaining
categorical variable was then manually reviewed by two independent
physicians, and its values were grouped into clinically meaningful
categories. This step decreased data sparsity by merging low-incidence
characteristics into fewer, clinically meaningful categories. Missing
continuous data were imputed with the feature median, and missing
categorical data were imputed with the feature mode.
Continuous variables were standardized to have a mean of zero and
standard deviation of one. Categorical variables were one-hot encoded
in a way that ensured no linear dependencies between columns.
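
The automated steps above (missingness filter, imputation, standardization, and one-hot encoding) can be illustrated with a minimal sketch. This is not the study's code. The column names and values are hypothetical, the function name `prepare` is invented for illustration, and two details are assumptions the text does not specify: linear dependence is avoided here by dropping the first level of each categorical variable as a reference, and the standard deviation is the population form.

```python
from statistics import mean, median, mode, pstdev

MISSING_THRESHOLD = 0.20  # from the text: exclude variables with >20% missing


def prepare(columns, categorical):
    """Return a dict of model-ready columns.

    columns: dict mapping variable name -> list of values (None = missing).
    categorical: set of variable names to treat as categorical.
    """
    out = {}
    for name, values in columns.items():
        # Exclude variables with more than 20% missing data.
        n_missing = sum(v is None for v in values)
        if n_missing / len(values) > MISSING_THRESHOLD:
            continue
        observed = [v for v in values if v is not None]

        if name in categorical:
            # Impute missing categorical data with the feature mode.
            fill = mode(observed)
            filled = [fill if v is None else v for v in values]
            # One-hot encode. ASSUMPTION: the first level (sorted order) is
            # dropped as a reference so the indicator columns cannot sum to
            # a constant, which would be a linear dependency.
            levels = sorted(set(filled))
            for level in levels[1:]:
                out[f"{name}={level}"] = [
                    1.0 if v == level else 0.0 for v in filled
                ]
        else:
            # Impute missing continuous data with the feature median.
            fill = median(observed)
            filled = [fill if v is None else v for v in values]
            # Standardize to mean 0 and standard deviation 1.
            # ASSUMPTION: population standard deviation.
            mu, sd = mean(filled), pstdev(filled)
            out[name] = [(v - mu) / sd if sd > 0 else 0.0 for v in filled]
    return out


# Hypothetical toy data, not drawn from the UNOS database.
raw = {
    "age": [50, 60, None, 70, 60, 40, 55, 65, 45, 60],
    "blood_type": ["A", "O", "O", None, "B", "O", "A", "O", "B", "O"],
    "sparse_lab": [1.0, None, None, None, 2.0, None, 1.5, 2.5, 3.0, 1.0],
}
features = prepare(raw, categorical={"blood_type"})
```

In this toy input, `sparse_lab` is 40% missing and is therefore excluded, while `blood_type` yields two indicator columns rather than three because one level serves as the reference. In a real analysis, the imputation and scaling statistics would normally be estimated on the training data only and then applied to held-out data; the text does not say how this was handled.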