2.3 Model Architecture
Mask Region-based Convolutional Neural Network (Mask R-CNN)[15] is a 2-stage object detection method: the first stage, called Region Proposal Network (RPN), proposes candidate object bounding boxes before being followed by R-CNN and a semantic segmentation model (MASK). We used Mask R-CNN to perform the image segmentation and classification. First, the input image was rescaled to a size of 512×512px and input into the region proposal network with ResNet-50 as the backbone network. With several targeted regions being generated by the region proposal network, the network cropped the corresponding area of each Region of Interest (ROI) in the feature map. Then, we performed the RoIAlign[16] operation on the cropped area, input the aligned results into the Full Convolution Network (FCN)[17] segmentation sub-network and the classification sub-network respectively, and finally output the results (Figure 2 ).
2.4 Development of the model
We fine-tuned the model of Mask R-CNN implemented by Matterport, Inc (Mountain View, CA, USA)(https://github.com/matterport/Mask-RCNN) with a ResNet-50 backbone[18]. The DL model could learn the laryngoscopic images of vocal cord leukoplakia that contained the pre-labeled regions, and then detect the lesions autonomously.
Since the training of the model requires adequate images, data augmentation were used to expand the images in the training set to 25000+ images. The specific methods included: 1) horizontal flipping; 2) vertical flipping; 3) picture rotation (45°, 90°, 135°, 180°, 225°, 270°, 315°). These methods were implemented using the imgaug library (https://github.com/aleju/imgaug ).
The weights of the model were initialized using a pre-trained model, which was the same network as those trained with the Microsoft Common Objects in Context(MS-COCO) dataset[19]. The model was developed using Python 3.5 and a TensorFlow neural network framework. Training, validation, and testing were conducted by two Nvidia GeForce GTX 1080 GPUs with 8 GB of memory each and an Intel (R) i7-5930k 3.50 GHz CPU with a CentOS 7 operating system.