2.3 Model Architecture
Mask Region-based Convolutional Neural Network (Mask R-CNN)[15] is a
two-stage object detection method. In the first stage, a Region Proposal
Network (RPN) proposes candidate object bounding boxes. In the second
stage, an R-CNN head classifies each proposal and refines its bounding
box, while a parallel mask branch (MASK) segments the object inside it.
We used Mask R-CNN to perform image segmentation and classification.
First, the input image was rescaled to 512×512 px and fed into the region
proposal network, which used ResNet-50 as the backbone network. For each
Region of Interest (ROI) generated by the region proposal network, the
network cropped the corresponding area of the feature map. We then
applied the RoIAlign[16] operation to the cropped area and passed the
aligned features to the Fully Convolutional Network (FCN)[17]
segmentation sub-network and to the classification sub-network, which
produced the final results (Figure 2).
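The RoIAlign step can be illustrated with a minimal pure-Python sketch. It is a simplified illustration of the idea (bilinear sampling at unrounded coordinates, averaged per output bin), not the code used in this study or in the Matterport implementation. The function names, the `out_size` and `samples` parameters, and the convention that pixel centers sit at integer coordinates are assumptions made for this example.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at real-valued (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the sampling point to the valid coordinate range.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, box, out_size=2, samples=2):
    """Pool box=(y1, x1, y2, x2), given in feature-map coordinates,
    to an out_size x out_size grid without rounding any coordinate.

    Each output bin is the mean of samples x samples interpolated points.
    """
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    pooled = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sampling points inside bin (i, j).
                    yy = y1 + (i + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear_sample(feature, yy, xx)
            row.append(total / (samples * samples))
        pooled.append(row)
    return pooled
```

For example, on a 4×4 feature map whose value equals its column index, `roi_align(feature, (0.0, 0.5, 2.0, 2.5))` returns `[[1.0, 2.0], [1.0, 2.0]]`: the fractional box edges are preserved rather than snapped to the pixel grid, which is the property that distinguishes RoIAlign from quantized RoI pooling.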
2.4 Development of the model
We fine-tuned the Mask R-CNN implementation released by Matterport, Inc.
(Mountain View, CA, USA) (https://github.com/matterport/Mask-RCNN) with a
ResNet-50 backbone[18]. The DL model learned from laryngoscopic images of
vocal cord leukoplakia in which the lesion regions had been labeled
beforehand, and could then detect the lesions autonomously.
Because training the model requires a sufficient number of images, data
augmentation was used to expand the training set to more than 25,000
images. The specific methods were: 1) horizontal flipping; 2) vertical
flipping; and 3) image rotation (45°, 90°, 135°, 180°, 225°, 270°,
315°). These methods were implemented using the imgaug library
(https://github.com/aleju/imgaug).
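The augmentation set described above could be expressed with imgaug roughly as follows. This is a sketch under assumptions, not the authors' script: the helper names `augmentation_specs` and `build_augmenters` are hypothetical, and applying each transform deterministically (probability 1.0), one transform per augmented copy, is an assumed design. imgaug is imported lazily so that the list of transforms can be inspected without the library installed.

```python
# Rotation angles in degrees, as listed in the text.
ROTATION_ANGLES = [45, 90, 135, 180, 225, 270, 315]


def augmentation_specs():
    """Return one (name, angle) entry per augmentation listed in the text."""
    specs = [("fliplr", None), ("flipud", None)]
    specs += [("rotate", angle) for angle in ROTATION_ANGLES]
    return specs


def build_augmenters():
    """Map each spec to an imgaug augmenter (requires imgaug)."""
    import imgaug.augmenters as iaa  # imported here so the specs work without imgaug

    augmenters = []
    for name, angle in augmentation_specs():
        if name == "fliplr":
            augmenters.append(iaa.Fliplr(1.0))  # always flip horizontally
        elif name == "flipud":
            augmenters.append(iaa.Flipud(1.0))  # always flip vertically
        else:
            augmenters.append(iaa.Affine(rotate=angle))  # rotate about the image center
    return augmenters
```

Under these assumptions each source image yields nine augmented variants (two flips and seven rotations). Because the task is segmentation, the labeled regions would need to receive the same geometric transform as the image they belong to.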
The weights of the model were initialized from a model with the same
network architecture that had been pre-trained on the Microsoft Common
Objects in Context (MS-COCO) dataset[19]. The model was developed using
Python 3.5 and the TensorFlow neural network framework. Training,
validation, and testing were conducted on two Nvidia GeForce GTX 1080
GPUs with 8 GB of memory each and an Intel(R) i7-5930K 3.50 GHz CPU
running the CentOS 7 operating system.
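A configuration and weight-initialization setup of this kind might look like the sketch below when written against the Matterport code base. The backbone, the 512×512 input size, and the GPU count come from the text. Everything else is an assumption for illustration: the experiment name, `IMAGES_PER_GPU`, `NUM_CLASSES`, the `"square"` resize mode, the helper names, and the choice to skip the class-dependent head layers when loading MS-COCO weights (a common fine-tuning practice with this library, not something the text states).

```python
# Head layers whose shapes depend on the number of classes. Skipping them
# when loading MS-COCO weights is an assumption, not stated in the text.
HEAD_LAYERS = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]

CONFIG_OVERRIDES = {
    "NAME": "leukoplakia",          # hypothetical experiment name
    "BACKBONE": "resnet50",         # ResNet-50 backbone, per the text
    "IMAGE_RESIZE_MODE": "square",  # assumption: fixed square input
    "IMAGE_MIN_DIM": 512,           # 512x512 px input, per the text
    "IMAGE_MAX_DIM": 512,
    "GPU_COUNT": 2,                 # two GTX 1080 GPUs, per the text
    "IMAGES_PER_GPU": 1,            # assumption
    "NUM_CLASSES": 1 + 1,           # assumption: background + one lesion class
}


def make_config():
    """Build a Matterport Config subclass from the overrides (requires mrcnn)."""
    from mrcnn.config import Config  # imported lazily; needs the Matterport package

    return type("LeukoplakiaConfig", (Config,), dict(CONFIG_OVERRIDES))()


def load_coco_weights(model, weights_path):
    """Initialize a model from pre-trained weights, leaving the heads untouched."""
    model.load_weights(weights_path, by_name=True, exclude=HEAD_LAYERS)
```

With the Matterport package installed, a model would typically be created with `mrcnn.model.MaskRCNN(mode="training", config=make_config(), model_dir=...)` and then passed to `load_coco_weights` together with the path to the downloaded MS-COCO weights file.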