A Study on Label Noise and Image Size for Improving the Performance of Artificial Intelligence Classification Models Using Medical Images
- Deep learning, a cutting-edge paradigm for machine learning, has accelerated the development of medical artificial intelligence across imaging modalities. Today, many studies on various imaging modalities are based on deep learning algorithms. Among these algorithms, the convolutional neural network (CNN) is a major tool for analyzing images and videos.
Medicine is a distinctive domain in which to apply deep learning methods. Medical images differ from common images in that they are stored in the Digital Imaging and Communications in Medicine (DICOM) format. Common images use 8-bit formats such as Portable Network Graphics (PNG) or Joint Photographic Experts Group (JPEG), whereas medical DICOM images use a bit depth equal to or greater than 8 bits, for example 12 or 16 bits. Their acquisition protocols and image contrast mechanisms also differ from those of natural images. In addition, objects in natural images usually lie near the center of the image, whereas in DICOM images the region of interest (ROI) can appear at any location and at any size. For example, a lung nodule on a chest X-ray (CXR) can be located in the upper, middle, or lower lung zone, that is, literally anywhere. It can also have a sharp, spiculated, or vague margin. Therefore, deep learning training strategies for medical images may, or should, differ from those for natural images. In this study, we examined how to train medical artificial intelligence efficiently from the perspectives of robust learning and image size in CXR.
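To make the bit-depth difference concrete, the following is a minimal sketch, not the study's preprocessing, of linearly windowing a 12-bit pixel array down to 8 bits with NumPy. The function `to_8bit` and its window parameters are hypothetical; in practice the array might come from a DICOM reader such as pydicom's `Dataset.pixel_array`, and real pipelines often also apply the rescale and window settings stored in the file. The example shows 4,096 gray levels collapsing into 256 after conversion.

```python
import numpy as np


def to_8bit(pixels: np.ndarray, window_min: float, window_max: float) -> np.ndarray:
    """Linearly map a high-bit-depth pixel array into the 8-bit range [0, 255].

    Hypothetical helper: values below window_min clip to 0 and values
    above window_max clip to 255.
    """
    if window_max <= window_min:
        raise ValueError("window_max must be greater than window_min")
    scaled = (pixels.astype(np.float64) - window_min) / (window_max - window_min)
    scaled = np.clip(scaled, 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8)


# Hypothetical 12-bit image containing every possible gray level (0..4095).
raw = np.arange(4096, dtype=np.uint16).reshape(64, 64)
img8 = to_8bit(raw, window_min=0, window_max=4095)

# Count distinct gray levels before and after the 8-bit conversion.
print(len(np.unique(raw)), len(np.unique(img8)))
```

Sixteen neighboring 12-bit levels map onto each 8-bit level, which is the information a naive conversion to PNG or JPEG would discard.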
Numerous factors affect model performance. From label accuracy, matrix size, and model selection to dataset size, every factor contributes to model performance. However, in this paper we examined only label noise and matrix size, which are considered the most basic factors when constructing a dataset and feeding image data to a network.
From the perspective of robust learning, it is common sense among artificial intelligence researchers that clean and accurate labels should be acquired. In many fields there is even a proverb, “garbage in, garbage out”, abbreviated as GIGO. We therefore investigated how the accuracy of a deep learning model depends on the degree of dataset refinement, that is, the accuracy of its labels. We collected CT-confirmed CXR datasets in which the interval between each CT scan and its corresponding CXR was within 7 days. Because the CXR labels are confirmed by CT, they can be considered highly credible. To analyze the effect of label accuracy, we randomly converted labels from normal to abnormal and from abnormal to normal at rates of 0%, 1%, 2%, 4%, 8%, 16%, and 32%, and analyzed the area under the receiver operating characteristic curve (AUROC). On our collected dataset, the AUROC at noise rates from 2% to 32% was significantly different from that at 0%, which means the CNN model is highly sensitive to label noise. We also ran the same experiment on public datasets, the National Institutes of Health (NIH) dataset and the Stanford CheXpert dataset, and the results showed that these public datasets tolerated label noise up to 16%. This result has two possible interpretations: (1) CNNs are sensitive to label noise, and the public datasets tolerate added noise because they already contain label noise to some extent; or (2) CNNs are inherently robust to label noise, yet for some reason the CNN model trained on our dataset appears sensitive to it. To distinguish between these possibilities, we randomly selected images from each public dataset, and a radiologist with more than 10 years of experience visually checked whether each image was correctly labeled. This visual review found approximately 20% to 30% incorrect labels. Therefore, we concluded that interpretation (1) is correct.
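The noise-injection step and the evaluation metric can be sketched as below. This assumes that a given rate means the same fraction of each class is flipped (normal to abnormal and abnormal to normal); the abstract does not say exactly how flips were allocated, so `flip_labels` is a hypothetical helper rather than the authors' code. `auroc` is also a hypothetical helper that computes AUROC in its rank-based (Mann-Whitney) form, the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
import numpy as np


def flip_labels(labels, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of binary labels with a fraction `rate` of each class flipped.

    Assumption: `rate` of the normal (0) labels become abnormal (1), and the
    same fraction of the abnormal (1) labels become normal (0).
    """
    labels = np.asarray(labels)
    noisy = labels.copy()
    for cls in (0, 1):
        # Indices are taken from the original labels, so no label is flipped twice.
        idx = np.flatnonzero(labels == cls)
        n_flip = int(round(rate * idx.size))
        chosen = rng.choice(idx, size=n_flip, replace=False)
        noisy[chosen] = 1 - cls
    return noisy


def auroc(labels, scores) -> float:
    """AUROC as P(score of a random positive > score of a random negative), ties = 0.5."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every positive score against every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


rng = np.random.default_rng(0)
clean = np.array([0] * 500 + [1] * 500)  # hypothetical balanced label set
for rate in (0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32):
    noisy = flip_labels(clean, rate, rng)
    print(f"{rate:.0%}: {(noisy != clean).mean():.1%} of labels flipped")

print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With balanced classes as in this example, flipping the same count in each direction leaves the positive prevalence unchanged while the fraction of corrupted labels matches the nominal rate.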
Regarding matrix size, we investigated the optimal input matrix size for deep learning-based computer-aided diagnosis (CAD) of nodules and masses on chest radiographs. Detection and classification models were trained with various matrix sizes (256, 448, 896, 1344, and 1792) to find the optimal one.
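In practice, "matrix size" is the square resolution to which each radiograph is resampled before it enters the network. The sketch below uses a hypothetical `resize_nearest` helper with nearest-neighbor resampling purely for brevity (real pipelines usually prefer area or bilinear interpolation, for example OpenCV's `cv2.resize`). It resizes a hypothetical 2048 by 2048 array to each size compared in the study and reports the pixel count. Because pixel count grows with the square of the matrix size, a 1792 input carries 49 times as many pixels as a 256 input, which is one reason larger inputs cost more memory and computation.

```python
import numpy as np


def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Resize a 2-D image to size x size using nearest-neighbor sampling (hypothetical helper)."""
    h, w = image.shape
    # Map each output row/column back to a source row/column index.
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    return image[rows[:, None], cols[None, :]]


MATRIX_SIZES = (256, 448, 896, 1344, 1792)  # sizes compared in the study

cxr = np.zeros((2048, 2048), dtype=np.uint16)  # hypothetical stand-in for a radiograph
for size in MATRIX_SIZES:
    resized = resize_nearest(cxr, size)
    ratio = resized.size / (256 * 256)
    print(f"{size}x{size}: {resized.size:,} pixels ({ratio:.2f}x the 256 input)")
```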
We experimented with two detection networks and one classification network. For the detection networks, the optimal matrix sizes were 896 and 1344 for the first model and 896 for the second. For the classification network, the optimal matrix size was 896. Thus, we conclude that a matrix size of around 1000 is optimal for training on medical image data. This is consistent with the fact that many deep learning studies use a matrix size of around 1024.
To summarize, in this paper we analyzed two factors that increase model performance in medical artificial intelligence on imaging modalities. The first is label noise: the more accurate the labels in the dataset, the higher the performance. The second is matrix size: a matrix size of around 1000 performed best for both detection and classification tasks.