AI-Enabled ECG Interpretation Model:
There is much room for improvement in current automated ECG interpretation.30,31 Improvement here could plausibly translate to better outcomes, as incorrect automated labels are associated with incorrect physician overreads.27,32,33 End-to-end DNN models for ECG interpretation have recently shown great promise to replace the ‘feature-based’ computer algorithms currently in use.
In one study, a DNN model was trained on a dataset of 91,232 ECGs to detect 12 rhythm classes from a single-lead, patch-based ambulatory monitor.6 Results showed an AUC >0.91 and model performance superior to separate annotations made by 6 cardiologists. The AUC of the model for AF was as high as 0.96, but the small testing dataset of only 328 ECGs limits the reliability of the results for any individual class.34 For external validation, the model was trained and tested using 2017 PhysioNet Challenge data with a relatively larger testing dataset (F1 score for AF diagnosis, 0.84). However, the testing dataset was not randomly selected, and rarer diagnoses were purposefully included, which makes the results less generalizable.6
In another proof-of-concept study, a DNN model was trained using over 2 million ECGs to detect 6 abnormalities from the 12 SL ECG.7 However, the individual diagnostic accuracy for the 6 selected abnormalities is hard to assess because a small testing dataset was used to evaluate several distinct diagnoses; for instance, the testing dataset included only 13 AF cases.7
The PhysioNet/Computing in Cardiology Challenge 2017,35 allowed an opportunity for external validation of algorithms from 75 teams, each trained using a common training dataset of 8,528 single-lead ECGs. These then competed head-to-head on a hidden dataset to diagnose AF among the 4 labeled outputs (normal, AF, other, noisy). Winning algorithms (F1 score 0.83) ranged from hand-crafted feature models using random forests and extreme gradient boosting to CNNs and recurrent neural networks. However, the authors concluded that the training set was not large enough to give an advantage to more complex algorithms, which require enormous data for parameter and hyperparameter tuning. Furthermore, although the hidden testing set was relatively large (3,658 ECGs with 311 AF cases), only 27.3% of it was used to rank the algorithms.35
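The challenge ranked entries by an F1 score averaged over the labeled classes. As a reminder of what that metric rewards, here is a minimal sketch of per-class F1 computed from predicted and true labels; the toy labels and the choice to average over the normal, AF, and other classes are illustrative, not taken from the challenge data.

```python
def per_class_f1(y_true, y_pred, label):
    """One-vs-rest F1 for a single class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy records over the challenge's four output labels:
# N = normal, AF = atrial fibrillation, O = other, ~ = noisy
y_true = ["N", "AF", "N", "O", "AF", "~", "N", "AF"]
y_pred = ["N", "AF", "O", "O", "N",  "~", "N", "AF"]

f1_af = per_class_f1(y_true, y_pred, "AF")  # TP=2, FP=0, FN=1 -> F1 = 0.8
# Challenge-style summary score: mean F1 over the N, AF and O classes
score = sum(per_class_f1(y_true, y_pred, c) for c in ("N", "AF", "O")) / 3
```

Because F1 is computed per class, a small number of AF cases in the test set (as noted for several studies above) makes the AF term of this average highly sensitive to a handful of misclassifications.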
A preliminary study demonstrated triaging of patients in the emergency care setting based on interpretation of the 12 SL ECG by a CNN.36 The results were also compared to those of the conventional ECG algorithm currently in use. The output was mapped to 16 pre-specified groups to provide actionable information. Results showed slightly better performance than the traditional interpretation, a difference that was statistically significant. One limitation was again a small testing dataset, which included few emergency readings (60 in total).36
Our group has developed a DNN model to provide a comprehensive 12-lead ECG interpretation using 2.5 million standard 12 SL-ECGs from 720,000 patients.9,37 A ‘transformer model’ was also incorporated to translate the output into 66 discrete, readable ECG diagnostic codes and make a multilabel prediction comparable to current computer-automated programs. We previously reported the performance of this model for all 66 individual codes using a testing dataset of 499,917 ECGs.37 The overall performance was an AUC ≥0.98 for 62 of the 66 reported codes. Recently, the performance was evaluated head-to-head against the traditional ECG interpretation software currently in use at our institution and against the final cardiologist over-read diagnosis.9 Results showed an average ideal or acceptable diagnosis rate of 91.8% (AI-enabled interpretation) vs 86.6% (computer-generated interpretation) vs 94% (final clinical diagnosis). In some ways these studies show the potential of DNN models to provide a level of ECG interpretation previously confined to the realm of field experts.
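Reporting a per-code AUC for a multilabel model amounts to treating each diagnostic code as its own binary task. A minimal sketch of that evaluation, using the rank (Mann-Whitney) interpretation of AUC, is shown below; the code names, labels, and scores are illustrative and not drawn from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the probability a random positive outranks a random negative
    (Mann-Whitney statistic); ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy multilabel evaluation: one binary label vector and one score vector
# per diagnostic code (code names here are hypothetical examples)
labels = {"AF":   [1, 0, 1, 0, 0],
          "LBBB": [0, 1, 0, 0, 1]}
scores = {"AF":   [0.9, 0.6, 0.3, 0.4, 0.1],
          "LBBB": [0.2, 0.8, 0.4, 0.3, 0.7]}

per_code_auc = {code: roc_auc(labels[code], scores[code]) for code in labels}
```

Evaluating each code independently is what makes a summary like "AUC ≥0.98 for 62 of 66 codes" possible, but it also means codes with very few positive cases carry wide uncertainty around their individual AUC estimates.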