2.3 Outcome Measures
The automatic detection of laryngeal carcinoma was carried out using
Faster R-CNN, one of the most popular two-stage detection networks, which
is widely used for medical image detection tasks. The architecture of
Faster R-CNN consists of a region proposal network (RPN) and a Fast R-CNN
detector, as shown in Figure 1. The RPN is a deep fully convolutional
network that simultaneously predicts object boundaries and object scores
at each position to generate high-quality region proposals. Notably, the
RPN shares convolutional layers with the Fast R-CNN detector, which makes
the computation of the detection network more efficient (Figure
1).15 Fast R-CNN includes region of interest (RoI)
pooling, which applies max-pooling to inputs of non-uniform size to
produce fixed-size feature maps, so that the proposals can be refined and
the sketch map processed.20
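As an illustration of the RoI pooling step described above, the following is a minimal sketch in plain Python, not the implementation used in this study. It assumes a single-channel feature map stored as a list of rows and a region given in feature-map cells; the function name `roi_max_pool` and its parameters are hypothetical. Each region, whatever its size, is divided into a fixed grid of bins, and the maximum of each bin is kept.

```python
from math import ceil, floor


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one rectangular region of a 2D feature map to out_h x out_w.

    feature_map: list of equal-length rows of numbers (one channel only).
    roi: (row_start, col_start, row_end, col_end) in feature-map cells,
         with the end indices exclusive. Hypothetical convention.
    """
    r0, c0, r1, c1 = roi
    roi_h, roi_w = r1 - r0, c1 - c0
    if roi_h <= 0 or roi_w <= 0:
        raise ValueError("the region must cover at least one cell")

    pooled = []
    for i in range(out_h):
        # floor/ceil bin edges guarantee that every bin holds >= 1 cell,
        # even when the region is smaller than the output grid.
        top = r0 + floor(i * roi_h / out_h)
        bottom = r0 + ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            left = c0 + floor(j * roi_w / out_w)
            right = c0 + ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(top, bottom)
                           for c in range(left, right)))
        pooled.append(row)
    return pooled


# Toy 4 x 6 feature map with values 1..24.
fm = [[6 * r + c + 1 for c in range(6)] for r in range(4)]

# Two regions of different sizes both yield a 2 x 2 output.
full = roi_max_pool(fm, (0, 0, 4, 6), 2, 2)
part = roi_max_pool(fm, (1, 1, 4, 6), 2, 2)
```

In this toy example the 4 x 6 region and the 3 x 5 region are both reduced to 2 x 2 grids, which is what allows the later layers of the detector to accept proposals of arbitrary size.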
Results
After the training procedure described above, the CNN-based classifier
was used to detect laryngeal lesions. The classifier's predictions for
the laryngoscopic images in the testing dataset were compared with the
pathological results to determine the diagnostic potential of the
classification system.
Of the 89 cases in the malignant group, the classifier correctly
identified laryngeal carcinoma in 66 (sensitivity, 74.16%). Of the 640
cases in the benign group, the classifier correctly identified 503 as
benign (specificity, 78.59%). Overall, the CNN-based classifier achieved
an accuracy of 78.05% on the testing dataset. The results for the
classifier are summarized in Table 2 and Table 3.
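The reported percentages follow directly from the case counts given above. The short sketch below recomputes them; the function name `diagnostic_metrics` is hypothetical, and the false-negative and false-positive counts are not stated in the text but are derived here as 89 - 66 and 640 - 503.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy as fractions."""
    return {
        # Malignant cases correctly flagged, out of all malignant cases.
        "sensitivity": tp / (tp + fn),
        # Benign cases correctly cleared, out of all benign cases.
        "specificity": tn / (tn + fp),
        # All correct calls, out of all cases.
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts from the testing dataset; fn and fp are derived, not reported.
metrics = diagnostic_metrics(tp=66, fn=89 - 66, tn=503, fp=640 - 503)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these counts the script prints a sensitivity of 74.16%, a specificity of 78.59%, and an accuracy of 78.05% (569 of 729 cases), in agreement with the values reported above.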
Discussion
For years, the laryngoscope has been recommended as an essential tool
for assessing vocal fold lesions. However, laryngoscopes have
limitations in making diagnoses owing to subtle differences in mucosal
tissue. Distinguishing different mucosa can be difficult, especially for
early-stage laryngeal carcinoma. Insufficient experience and excessive
workload can also lead to missed diagnosis and misdiagnosis. Therefore,
an assistive system for identifying laryngeal carcinoma is needed to
improve and standardize clinicians’ diagnostic capacity.
The Faster R-CNN detection system has the potential to analyze image
characteristics, extract the images’ commonalities, and then classify
the data. In our study, an accuracy of 78.05% was achieved. Our
results confirmed that the model we have established could offer
valuable clinical assistance to preliminarily screen patients with
laryngeal carcinoma and identify patients with benign lesions. This
system would be able to help clinicians make more efficient and accurate
diagnoses and enable standardized evaluation.
In this study, 2179 laryngoscope images were taken with different
endoscopic systems at six hospitals. We did not impose special
restrictions on equipment or images, nor on patient age or gender. This
approach maximized the natural diversity of laryngoscopic appearances
under various conditions, allowing us to assess whether AI technology
could cope with complex real-world situations.
Our results indicate that the automatic classifier has a sensitivity of
74.16%, a specificity of 78.59%, and an accuracy of 78.05% for detecting
laryngeal carcinoma, which exceeds the physician performance in
laryngeal carcinoma recognition reported in several articles. In Ren’s
study, twelve human experts achieved an overall accuracy of only 54% in
malignancy detection in a set of 706 laryngoscopic testing
images.19 Additionally, a recent meta-analysis showed
that ENT doctors correctly identified only 65% of the tested population
as not having cancer, even with
videostroboscopy.8 Generally speaking, an automated
system could help physicians exclude malignant lesions with more
confidence, reducing the burden on endoscopists and patients’ waiting
time. It could also raise a red flag when a patient tests positive,
reminding doctors to complete further clinical investigation to
determine whether the patient is truly positive.
However, two recent studies using CNN-based techniques obtained over
90% accuracy for malignant lesion detection, outperforming our
classifier. A possible explanation is that our laryngoscope images came
from different endoscopic systems in several hospitals. The diversity of
laryngoscope equipment could complicate the extraction of commonalities
and thereby limit the quality of data classification. This situation is
likely to be a common problem in clinical application and is an obstacle
that needs to be overcome in future research. Additionally, increasing
the training dataset size may be another approach to improving our
classifier’s sensitivity and specificity.
Further research is necessary to include high-quality laryngoscope
images with NBI from multiple centers. Since the sensitivity of NBI has
been found to be superior to white light endoscopy,21 the CNN-based
classifier may provide more efficient and accurate information with NBI.
Conclusion
The fiber-optic laryngoscope has been a routine method for diagnosing
laryngeal lesions; however, physicians still have difficulty
distinguishing early-stage cancer. In this study, an automatic
classifier for laryngeal carcinoma based on Faster R-CNN was established
using images from multiple clinics and multiple laryngoscopic systems.
An accuracy of 78.05% was obtained. This diagnostic system shows
promising applicability to daily medical practice and may improve the
quality of clinical diagnosis and benefit patients.