Ulsan Univ. Repository > Thesis > General Graduate School > Medicine > 1. Theses (Master)

Machine Learning-Based Survival Analysis of Head and Neck Cancer Patients Using a Single-Institution Registry

Alternative Title: Machine Learning-based Survival Analysis of Head and Neck Cancer Patients from a Single-Institution Cancer Registry

Abstract: Background Accurate prognosis estimation for patients with head and neck (H&N) cancer is crucial for clinical decision-making. The aim of this study was to identify an effective machine learning (ML) model for predicting the 5-year survival of H&N cancer patients.

Methods We reviewed the records of 3,019 patients enrolled in the Asan Medical Center H&N Cancer registry between 2007 and 2017. The feature set used to compare the performance of the ML algorithms comprised demographic characteristics, past and social history, clinical and histopathologic attributes of the primary site, and treatment modalities. We applied five ML models to classify patients by 5-year survival status: two recently developed gradient-boosting models (XGBoost and LightGBM) and three commonly used models, namely Random Forest (RF, a tree-based ensemble), Support Vector Machine (SVM), and Naïve Bayes (NB). Model performance was measured with 10-fold cross-validation.
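The abstract states that 10-fold cross-validation was used but not how the folds were drawn. As a minimal sketch, assuming stratified folds (each fold keeps roughly the same proportion of 5-year survivors as the full cohort), the evaluation loop can be written with the Python standard library alone. `stratified_kfold`, `cross_validate`, `majority_baseline`, and `accuracy` are hypothetical helpers for illustration, not code from the thesis; in practice a library splitter such as scikit-learn's `StratifiedKFold` would normally be used, and `fit_predict` would wrap one of the five models.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve the class ratio.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0  # running counter so fold sizes differ by at most one
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[position % k].append(i)
            position += 1

    splits = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, sorted(test)))
    return splits


def cross_validate(X, y, fit_predict, metric, k=10, seed=0):
    """Average a metric over k stratified folds.

    fit_predict(X_train, y_train, X_test) must return predictions for X_test;
    metric(y_true, y_pred) must return a float.
    """
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(metric([y[i] for i in test], preds))
    return sum(scores) / len(scores)


# Hypothetical stand-in for a real model: always predicts the majority class.
def majority_baseline(X_train, y_train, X_test):
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With a toy 70/30 label vector, every fold receives 7 negatives and 3 positives, so `cross_validate([[i] for i in range(100)], [0] * 70 + [1] * 30, majority_baseline, accuracy)` averages to about 0.7. Stratification matters here because a survival outcome is usually imbalanced, and unstratified folds can end up with very few events.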

Results After exclusion criteria were applied, the final study population comprised 1,287 patients. Model performance was evaluated with four metrics: sensitivity, accuracy, F1 score, and AUC-ROC. The average performance scores were SVM, 0.745; RF, 0.823; NB, 0.686; XGBoost, 0.827; and LightGBM, 0.839, so LightGBM performed best of the five models. SHapley Additive exPlanations (SHAP) values were then calculated for the LightGBM model; the six most important of the 16 features, in descending order, were age, body mass index (BMI), primary site, clinical N stage, overall stage, and clinical T stage.
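The abstract reports one "average" score per model without saying how it was formed. One plausible reading, assumed here and not confirmed by the source, is the unweighted mean of the four listed metrics. The sketch below computes all four with the standard library and then averages them. The function name `binary_metrics`, the 0.5 decision threshold, and the choice of which 5-year outcome is coded as the positive class (1) are assumptions for illustration.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, accuracy, F1 and AUC-ROC for a binary outcome.

    y_true holds 0/1 labels (1 = the class treated as positive, an assumption);
    y_score holds predicted probabilities for class 1. Both classes must occur.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)

    # AUC-ROC as the probability that a random positive outranks a random
    # negative (Mann-Whitney form); tied scores count as one half.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    scores = {"sensitivity": sensitivity, "accuracy": accuracy, "f1": f1, "auc": auc}
    # Assumed definition of the single "average" score: the unweighted mean.
    scores["average"] = sum(scores.values()) / 4
    return scores
```

For `y_true = [1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, sensitivity, accuracy, and F1 are each 0.5, AUC-ROC is 0.75, and the average is 0.5625. Note that three of the four metrics depend on the threshold while AUC-ROC does not, so an average of this kind mixes threshold-dependent and ranking-based measures.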

Conclusion In this study, we found that age, BMI, and staging-related features were the most important for predicting H&N cancer survival. The LightGBM model, combined with an appropriate dataset, can be used to construct an accurate prognostic model for patients diagnosed with H&N cancer, and such models may be applied in clinical practice in the future.

Background
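Accurate prognosis prediction for patients with head and neck cancer is very important in clinical practice. The aim of this study was to demonstrate the usefulness of machine learning (ML) algorithms for predicting the survival of head and neck cancer patients and to identify an effective model.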

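Methods
This study included 3,019 patients enrolled in the Asan Medical Center head and neck cancer registry between 2007 and 2017. The dataset used to compare the performance of the ML algorithms included demographic characteristics, past and social history, clinical and histopathologic characteristics of the primary site, and treatment modalities. We applied five ML models to the dataset to classify head and neck cancer patients by 5-year survival status. These comprised two recently developed gradient-boosting models (XGBoost and LightGBM) and three commonly used models: the tree-based Random Forest (RF), Support Vector Machines (SVM), and Naive Bayes (NB). Model performance was evaluated with 10-fold cross-validation.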

Results
The final study population comprised 1,287 patients. Four metrics were used to evaluate model performance: sensitivity, accuracy, F1 score, and AUC-ROC. Of the five models, LightGBM performed best. The average performance scores were SVM, 0.745; RF, 0.823; NB, 0.686; XGBoost, 0.827; and LightGBM, 0.839. SHapley Additive exPlanations (SHAP) values were calculated for the LightGBM model; in descending order of importance, the top six of the 16 features were age, body mass index (BMI), primary site, N stage, overall stage, and T stage.
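The feature ranking above comes from SHAP values, which attribute each prediction to the input features using Shapley values. The abstract does not say which SHAP implementation was used; libraries such as the `shap` package provide fast tree-specific algorithms for gradient-boosted models. The sketch below is a different, deliberately simple technique: it computes exact Shapley values by enumerating every feature coalition, which only scales to a handful of features. It assumes a single baseline row (features outside a coalition take the baseline value) and ranks features by mean absolute Shapley value, a common global-importance summary. `shapley_values`, `mean_abs_shap_ranking`, and `toy_linear` are hypothetical, and the toy weights are arbitrary, not results from the thesis.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline row.

    A coalition S is evaluated by keeping x's values for features in S and
    the baseline's values elsewhere. Runtime grows as 2**len(x).
    """
    n = len(x)

    def value(coalition):
        return model([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap_ranking(model, rows, baseline, names):
    """Rank features by mean |Shapley value| over rows, largest first."""
    totals = [0.0] * len(names)
    for x in rows:
        for j, v in enumerate(shapley_values(model, x, baseline)):
            totals[j] += abs(v)
    means = [t / len(rows) for t in totals]
    return sorted(zip(names, means), key=lambda item: item[1], reverse=True)


# Hypothetical toy model with arbitrary weights (not fitted to any data).
def toy_linear(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2] + 1.0
```

For a linear model the result reduces to w_i (x_i - baseline_i): for `x = [3, 4, 2]` against an all-ones baseline, `shapley_values(toy_linear, x, [1, 1, 1])` gives 4, -3, and 0.5 up to floating-point rounding, and the values sum to `toy_linear(x) - toy_linear([1, 1, 1])`. This additivity is what lets SHAP split a single patient's predicted risk across features such as age or stage.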

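Conclusion
In this study, we found that age, body mass index, and stage-related features were the most important for predicting the survival of patients with head and neck cancer. In addition, the LightGBM model, combined with an appropriate dataset, could be used to build an accurate prognostic model for patients diagnosed with head and neck cancer. We expect that ML models can be applied in clinical practice in the future.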