2.3 Outcome Measures
The automatic detection of laryngeal carcinoma was performed with Faster R-CNN, a widely used two-stage object detection network that has been applied to many medical image detection tasks. Faster R-CNN consists of two modules, a region proposal network (RPN) and a Fast R-CNN detector (Figure 1). The RPN is a deep fully convolutional network that simultaneously predicts object bounds and objectness scores at each position of the feature map to generate high-quality region proposals. Importantly, the RPN shares its convolutional layers with the Fast R-CNN detector, which makes the computation of the whole detection network more efficient.15 The Fast R-CNN detector includes region of interest (RoI) pooling, which converts proposals of non-uniform size into fixed-size feature maps by max-pooling; the detector then uses these pooled features to refine the proposals.20
Results
After training as described above, the CNN-based classifier was used to detect laryngeal lesions in the testing dataset. To evaluate the diagnostic potential of the classification system, the classifier's output for each laryngoscopic image was compared with the corresponding pathological result.
Of the 89 cases in the malignant group, the classifier correctly identified laryngeal carcinoma in 66 (sensitivity, 74.16%). Of the 640 cases in the benign group, it correctly classified 503 as benign (specificity, 78.59%). Overall, the CNN-based classifier achieved an accuracy of 78.05% (569 of 729 cases) on the testing dataset. The results are summarized in Table 2 and Table 3.
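The reported metrics follow directly from the per-case comparison with pathology. The sketch below is a hypothetical helper, not the study's analysis code: `ConfusionCounts`, `counts_from_labels`, and `diagnostic_metrics` are illustrative names, and labels are assumed to be booleans with `True` meaning malignant. It tallies the confusion counts with malignancy as the positive class and recomputes sensitivity, specificity, and accuracy from the counts given in the text; the 23 false negatives and 137 false positives are simply 89 - 66 and 640 - 503.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # malignant by pathology, predicted malignant
    fn: int  # malignant by pathology, predicted benign
    tn: int  # benign by pathology, predicted benign
    fp: int  # benign by pathology, predicted malignant


def counts_from_labels(predicted: Sequence[bool], pathology: Sequence[bool]) -> ConfusionCounts:
    """Tally per-case agreement; True means malignant, pathology is the reference standard."""
    if len(predicted) != len(pathology):
        raise ValueError("predicted and pathology must have the same length")
    tp = fn = tn = fp = 0
    for pred, truth in zip(predicted, pathology):
        if truth and pred:
            tp += 1
        elif truth:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fn=fn, tn=tn, fp=fp)


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # detected malignant / all malignant
        "specificity": c.tn / (c.tn + c.fp),  # correctly cleared benign / all benign
        "accuracy": (c.tp + c.tn) / total,    # all correct / all cases
    }


# Counts from the Results text: 66 of 89 malignant and 503 of 640 benign cases correct.
reported = ConfusionCounts(tp=66, fn=89 - 66, tn=503, fp=640 - 503)
for name, value in diagnostic_metrics(reported).items():
    print(f"{name}: {value:.2%}")
# sensitivity: 74.16%
# specificity: 78.59%
# accuracy: 78.05%
```

Because benign cases make up 640 of the 729 test cases, the overall accuracy (78.05%) lies close to the specificity (78.59%); reporting sensitivity and specificity separately keeps this class imbalance from masking performance on the malignant group.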
Discussion
For years, the laryngoscope has been recommended as an essential tool for assessing vocal fold lesions. However, laryngoscopic diagnosis is limited because differences between mucosal lesions can be subtle. Distinguishing different mucosal lesions can be difficult, especially in early-stage laryngeal carcinoma. Insufficient experience and excessive workload can also lead to missed diagnoses and misdiagnoses. Therefore, an assistive system for identifying laryngeal carcinoma is needed to improve and standardize clinicians' diagnostic capacity.
The Faster R-CNN detection system can learn the characteristic features of laryngoscopic images, extract the features that images of the same class have in common, and then classify the images. In our study, it achieved an accuracy of 78.05%. Our results indicate that the model we established could offer valuable clinical assistance in preliminarily screening patients for laryngeal carcinoma and identifying patients with benign lesions. Such a system could help clinicians make more efficient and accurate diagnoses and enable standardized evaluation.
In this study, 2179 laryngoscopic images were acquired with different endoscopic systems at six hospitals. We imposed no special restrictions on equipment or images, nor on patient age or gender. This approach preserved the natural diversity of laryngoscopic appearances across various conditions, allowing us to assess whether AI technology could cope with the complexity of real-world practice.
Our results indicate that the automatic classifier has a sensitivity of 74.16%, a specificity of 78.59%, and an accuracy of 78.05% for detecting laryngeal carcinoma, which compares favorably with the performance of physicians in recognizing laryngeal carcinoma reported in several studies. In Ren's study, twelve human experts achieved an overall accuracy of only 54% for malignancy detection on a test set of 706 laryngoscopic images.19 Additionally, a recent meta-analysis showed that, even with videostroboscopy, ENT doctors correctly identified only 65% of the tested population as not having cancer.8 More generally, an automated system could help physicians rule out malignancy with greater confidence, reducing endoscopists' workload and patients' waiting time; conversely, a positive result could raise a red flag, prompting further clinical investigation to determine whether the patient is truly positive.
However, two recent studies using CNN-based techniques reported accuracies above 90% for detecting malignant lesions, exceeding that of our classifier. One possible explanation is that our laryngoscopic images came from different endoscopic systems at several hospitals. The diversity of laryngoscopic equipment may make it harder to extract features common to each class, thereby limiting classification performance. Such heterogeneity is common in clinical practice and is an obstacle that future research needs to overcome. Increasing the size of the training dataset may also improve the classifier's sensitivity and specificity.
Further research should include high-quality narrow-band imaging (NBI) laryngoscopic images from multiple centers. Because NBI has been found to be more sensitive than white-light endoscopy,21 the CNN-based classifier may provide more efficient and accurate information when applied to NBI images.
Conclusion
Fiber-optic laryngoscopy is a routine method for diagnosing laryngeal lesions; however, physicians still have difficulty identifying early-stage cancer. In this study, an automatic classifier for laryngeal carcinoma based on Faster R-CNN was developed using images from multiple clinics and multiple laryngoscopic systems, and it achieved an accuracy of 78.05%. This diagnostic system shows promise for daily medical practice, where it may improve the quality of clinical diagnosis and benefit patients.