AI-Enabled ECG Interpretation Model:
There is much room for improvement in current automated ECG
interpretation.30,31 Improvement here could likely
translate to better outcomes, as incorrect labels are associated with
incorrect physician overreads.27,32,33 End-to-end DNN
models for ECG interpretation have recently shown great promise as
replacements for the ‘feature-based’ computer algorithms currently in use.
In one study, a DNN model was trained on a dataset of 91,232 ECGs to
detect 12 rhythm classes from a single-lead, patch-based ambulatory
monitor.6 Results showed an AUC >0.91 and
model performance superior to separate annotations made by 6
cardiologists. The AUC of the model for AF was as high as 0.96, but the
small testing dataset of only 328 ECGs limits the reliability of results
for any individual class.34 For external validation, the
model was trained and tested using 2017 PhysioNet Challenge data with a
relatively larger testing dataset (F1 score for AF diagnosis, 0.84).
However, the testing dataset was not randomly selected, and
rarer diagnoses were purposefully included, which makes the results
less generalizable.6
In another proof-of-concept study, a DNN model was trained using over 2
million ECGs to detect 6 abnormalities from 12 SL
ECGs.7 However, individual diagnostic accuracy for the
6 selected abnormalities is hard to comment on because a small testing
dataset was used to assess several distinct diagnoses; for instance, the
testing dataset included only 13 AF cases.7
The PhysioNet/Computing in Cardiology Challenge 2017,35 allowed an opportunity for external validation of algorithms from 75
teams, which were trained using a common training dataset of 8,528 single-lead
ECGs. These then competed head-to-head on a hidden dataset to
diagnose AF among the 4 labeled outputs (normal, AF, other, noisy).
Winning algorithms (F1 score 0.83) ranged from hand-crafted feature models
using random forest and extreme gradient boosting to convolutional and
recurrent neural networks. However, the authors concluded that the training
set was not sufficient to confer an advantage on more complex algorithms
that require enormous data for parameter and hyperparameter tuning.
Furthermore, although the hidden testing set was relatively large (3,658 ECGs
with 311 AF cases), only 27.3% of it was used to rank the
algorithms.35
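The F1 metric used to rank challenge entries is the harmonic mean of precision and recall for a class. A minimal Python sketch, using invented labels (not challenge data) over the four challenge classes:

```python
def f1(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Invented reference and predicted labels (hypothetical, for illustration only)
truth = ["AF", "normal", "AF", "other", "noisy", "AF"]
preds = ["AF", "normal", "normal", "other", "AF", "AF"]
print(f"F1 for AF: {f1(truth, preds, 'AF'):.2f}")  # precision 2/3, recall 2/3
```

The challenge's overall score averaged the per-class F1 over the normal, AF, and other rhythm classes, so a single strong class could not carry a weak entry.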
A preliminary study demonstrated triaging of patients in the emergency
care setting based on interpretation of 12 SL ECGs using a
CNN.36 The results were also compared to the
conventional ECG algorithm currently in use. Output was mapped to 16
pre-specified groups to provide actionable information. Results showed
slightly better performance than the traditional interpretation, and the
difference was statistically significant. One limitation was again a small
testing dataset, which included few emergency readings (60 in
total).36
Our group has developed a DNN model that makes a comprehensive 12-lead ECG
interpretation, trained using 2.5 million standard 12 SL-ECGs from 720,000
patients.9,37 A ‘transformer model’ was also
incorporated to translate the output into 66 discrete readable ECG
diagnostic codes and make a multilabel prediction comparable to current
computer-automated programs. We previously showed the performance of
this model for each of the 66 individual codes using a testing
dataset of 499,917 ECGs.37 The overall performance was
an AUC of ≥0.98 for 62 of the 66 reported codes. Recently, the
performance was evaluated head-to-head against the traditional ECG
interpretation software currently in use at our institute and the final
cardiologist over-read diagnosis.9 Results showed an
average rate of ideal or acceptable diagnoses of 91.8% (AI-enabled
interpretation) vs 86.6% (computer-generated interpretation) vs 94%
(final clinical diagnosis). Together, these studies show the
potential of DNN models to provide a level of ECG interpretation
previously confined to the realm of field experts.
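The per-code AUC reporting described above follows from the rank interpretation of AUC: the probability that a randomly chosen positive ECG is scored above a randomly chosen negative one. A minimal Python sketch with invented scores and code names (not the study's data), where a multilabel model emits one probability per diagnostic code and each code is scored separately:

```python
def auc(scores, labels):
    """Probability a random positive outranks a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented per-code probabilities for 4 hypothetical ECGs
scores = {"AF": [0.9, 0.4, 0.6, 0.2], "sinus bradycardia": [0.1, 0.8, 0.3, 0.7]}
labels = {"AF": [1, 1, 0, 0], "sinus bradycardia": [0, 1, 0, 1]}
for code in scores:
    print(code, auc(scores[code], labels[code]))
```

Because each code has its own positive and negative sets, a code with very few positives (such as a rare diagnosis in a small test set) yields an AUC estimate with wide uncertainty, which is the concern raised about the smaller testing datasets above.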