Ulsan Univ. Repository Thesis General Graduate School Medicine 2. Theses (Ph.D)

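Evaluation of the Diagnostic Performance of a Trained Deep Neural Network for Skin Neoplasms and Onychomycosis and of Its Contribution to Improving the Diagnostic Performance of Trainee Doctors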

Abstract: Background: Although deep neural networks have shown promising results in the diagnosis of skin cancer and onychomycosis, a prospective evaluation in a real-world setting is needed to confirm these results. This study aimed to evaluate whether an algorithm (http://b2019.modelderm.com, http://nail.modelderm.com) improves the accuracy of non-dermatologists in diagnosing skin neoplasms and onychomycosis.

Methods: For onychomycosis, a prospective observational study was performed in patients presenting with dystrophic features in the toenails. Five board-certified dermatologists made a diagnosis of onychomycosis from the clinical photographs. To evaluate the diagnostic ability of a deep neural network (http://nail.modelderm.com) for onychomycosis, the diagnosis was also made using the algorithm and using dermoscopic examination. For skin neoplasms, a random series of cases with skin lesions suspected of malignancy by either physicians or patients was recruited at two tertiary care centers in South Korea. In the artificial intelligence (AI) group, diagnoses were made via routine examination followed by photographic review with assistance from the algorithm, whereas in the control group diagnoses were made via routine examination followed by photographic review alone. The accuracy of the non-dermatologists before and after the interventions was compared. A randomized trial (KCT0005614) was also conducted to validate whether AI could augment the accuracy of non-expert physicians in a real-world setting that included diverse out-of-distribution conditions. Intern doctors and dermatology residents examined randomly allocated patients with suspicious skin lesions, with or without the real-time assistance of an AI algorithm (https://b2020.modelderm.com#world). We compared the change in accuracy, sensitivity, and specificity before and after the assistance of the algorithm to confirm the performance of augmented intelligence.

Results: In the onychomycosis study, a total of 90 patients (mean age, 55.3 years; male, 43.3%) assessed between September 2018 and July 2019 were included. The detection of onychomycosis by the algorithm (AUC, 0.751; 95% CI, 0.646–0.856) and by dermoscopy (AUC, 0.755; 95% CI, 0.654–0.855) was comparable (DeLong's test; p = 0.952). The sensitivity and specificity of the algorithm at the operating point were 70.2% and 72.7%, respectively. The sensitivity and specificity of diagnosis by the five dermatologists were 73.0% and 49.7%, respectively. The Youden index of the algorithm (0.429) was also comparable to that of the dermatologists' diagnoses (0.230 ± 0.176; Wilcoxon rank-sum test; p = 0.667).
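As a check on the figures above, the Youden index can be recomputed from the reported sensitivity and specificity using the standard definition J = sensitivity + specificity - 1. The following is a minimal sketch; the function name `youden_index` is hypothetical. The dermatologists' value computed here uses the pooled sensitivity and specificity, so it need not equal the reported 0.230, which presumably averages the five readers' individual indices.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Operating-point values for the algorithm, as reported in the abstract.
algorithm_j = youden_index(0.702, 0.727)

# Pooled values for the five dermatologists, as reported in the abstract.
# Assumption: this pooled figure only approximates the reported
# per-reader mean (0.230 +/- 0.176).
dermatologists_pooled_j = youden_index(0.730, 0.497)

print(f"Algorithm J: {algorithm_j:.3f}")
print(f"Dermatologists (pooled) J: {dermatologists_pooled_j:.3f}")
```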
For skin neoplasms, in the AI group, the accuracy of the first impression (Top-1 accuracy) was higher after the assistance of AI (58.3%) than before it (46.5%; p = 0.008). The number of differential diagnoses made by the participants increased from 1.9 ± 0.5 to 2.2 ± 0.6 after the assistance (p < 0.001). In the control group, the difference in Top-1 accuracy before and after reviewing photographs was not significant (before, 46.1%; after, 51.8%; p = 0.19), and the number of differential diagnoses did not significantly increase (before, 2.0 ± 0.4; after, 2.1 ± 0.5; p = 0.57).
In the randomized controlled study, which used 576 consecutive cases with suspicious lesions, the accuracy of the AI group (n = 295, 52.5%) was significantly higher than that of the Unaided group (n = 281, 43.4%; p = 0.035). The augmentation was more pronounced among the intern doctors, who had the least experience in dermatology (Unaided, 29.7%, n = 138; AI group, 53.3%, n = 150; p < 0.0001), whereas it was minimal among dermatology residents. The algorithm also helped trainees in the AI group consider more differential diagnoses than those in the Unaided group (2.09 versus 1.95; p = 0.0005).
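The group comparison in the randomized study can be illustrated with a two-proportion test. The abstract does not name the test used, so the sketch below is an assumption: a pooled two-proportion z-test with an optional Yates-style continuity correction. The correct-diagnosis counts (155 of 295 and 122 of 281) are reconstructed from the rounded percentages and are likewise assumptions, and the function name `two_proportion_p_value` is hypothetical.

```python
import math


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int,
                           continuity: bool = True) -> float:
    """Two-sided p-value for H0: p1 == p2, using a pooled z-test.

    With continuity=True this matches a Yates-corrected chi-square
    test on the corresponding 2x2 table.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    diff = abs(p1 - p2)
    if continuity:
        diff = max(0.0, diff - 0.5 * (1.0 / n1 + 1.0 / n2))
    z = diff / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(z / math.sqrt(2.0))


# Assumed counts, reconstructed from 52.5% of 295 and 43.4% of 281.
ai_correct, ai_total = 155, 295
unaided_correct, unaided_total = 122, 281

p_corrected = two_proportion_p_value(ai_correct, ai_total,
                                     unaided_correct, unaided_total)
p_uncorrected = two_proportion_p_value(ai_correct, ai_total,
                                       unaided_correct, unaided_total,
                                       continuity=False)
print(f"with continuity correction:    p = {p_corrected:.4f}")
print(f"without continuity correction: p = {p_uncorrected:.4f}")
```

Under these assumptions both variants fall below the 0.05 threshold, which is consistent with the significant difference reported above. The authors' actual test may differ.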

Conclusion: As a standalone method, the algorithm analyzed photographs taken by non-physicians and showed accuracy for the diagnosis of onychomycosis comparable to that of experienced dermatologists and of dermoscopic examination. For the diagnosis of skin neoplasms, AI augmented the diagnostic accuracy of trainee doctors in real-world settings. This result was also confirmed in a single-center, unmasked, parallel-group, randomized controlled trial.
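Background: Although trained deep neural networks have shown promising results in the diagnosis of skin neoplasms and onychomycosis, how accurate their diagnoses are in actual clinical practice still needs to be evaluated. This study aimed to evaluate, through prospective comparative studies, the diagnostic performance (sensitivity and specificity) of deep neural network algorithms (http://b2019.modelderm.com, http://nail.modelderm.com) for skin diseases (lesions suspected of being skin neoplasms or onychomycosis), and the extent to which the algorithms help improve the diagnostic performance of trainees (intern doctors and dermatology residents).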

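Materials and Methods: To evaluate the diagnosis of onychomycosis, a prospective observational study was performed at Asan Medical Center from September 2018 to July 2019 in patients with toenail deformities. Five board-certified dermatologists made a diagnosis of onychomycosis from clinical photographs. For comparison, diagnoses were also made using the deep neural network algorithm (http://nail.modelderm.com) and using dermoscopy. To evaluate the diagnosis of skin neoplasms, patients with skin lesions suspected of malignancy by either physicians or patients were recruited as a random prospective series at two tertiary care centers in South Korea (Asan Medical Center and Seoul National University Bundang Hospital) from February 2020 to November 2020. After the first diagnosis, the artificial intelligence (AI) group made a second diagnosis with reference to a review of the clinical photographs and the algorithm's diagnostic results. The control group made a second diagnosis based on a review of the clinical photographs alone. The diagnostic accuracy of the trainees before and after the intervention was compared. Building on these results, a randomized controlled trial (KCT0005614) was conducted at Asan Medical Center from November 2020 to September 2021 to validate the ability of AI to augment diagnosis in a real-world setting that included random allocation and diverse out-of-distribution conditions. Patients with suspected skin neoplasms were randomly allocated and diagnosed by trainees with or without the real-time assistance of the AI algorithm (https://b2020.modelderm.com#world). As before, changes in accuracy, sensitivity, and specificity before and after referring to the algorithm's results were compared.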

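Results: In the onychomycosis study, a total of 90 patients (mean age, 55.3 years; male, 43.3%) were included. Diagnosis of onychomycosis using the algorithm (AUC, 0.751; 95% CI, 0.646–0.856) and using dermoscopy (AUC, 0.755; 95% CI, 0.654–0.855) showed similar detection performance (p = 0.952). At the operating point of the receiver operating characteristic (ROC) curve, the sensitivity and specificity of the algorithm were 70.2% and 72.7%, respectively. The sensitivity and specificity of diagnosis by the five dermatologists were 73.0% and 49.7%, respectively. The Youden index of the algorithm (0.429) was also similar to that of the dermatologists' diagnoses (0.230 ± 0.176; p = 0.667).
In the skin neoplasm study, the accuracy of the first-choice diagnosis in the AI group (Top-1 accuracy, 58.3%) was higher than before referring to the AI results (46.5%; p = 0.008). After referring to the results, the number of differential diagnoses made by the trainees increased from 1.9 ± 0.5 to 2.2 ± 0.6 (p < 0.001). In the control group, the difference in Top-1 accuracy before and after reviewing the clinical photographs was not significant (before, 46.1%; after, 51.8%; p = 0.19), and the number of differential diagnoses did not significantly increase (before, 2.0 ± 0.4; after, 2.1 ± 0.5; p = 0.57). In the randomized controlled study, which included 576 consecutive cases, the diagnostic accuracy of the AI group (n = 295, 52.5%) was significantly higher than that of the group that did not refer to the AI results (n = 281, 43.4%; p = 0.035). The augmentation was more pronounced among intern doctors, who had the least experience in dermatological diagnosis (without AI, 29.7%, n = 138; with AI, 53.3%, n = 150; p < 0.0001), whereas it was not significant among dermatology residents. In addition, trainees in the AI group considered more differential diagnoses than trainees who did not refer to the AI results (2.09 versus 1.95; p = 0.0005).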

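Conclusion: For the diagnosis of onychomycosis, the trained deep neural network analyzed clinical photographs and showed diagnostic accuracy comparable to that of experienced dermatologists and of dermoscopy. For the diagnosis of skin neoplasms, AI improved the diagnostic accuracy of trainees in a real-world clinical setting and significantly increased the number of differential diagnoses they considered. This result was also confirmed in a single-center randomized controlled trial.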