Disease Prediction Model Development Using Unstructured Data from EMR and Artificial Intelligence
- Alternative Title
- 비정형 전자의무기록과 인공지능을 활용한 질환 예측 모델 개발
- Abstract
- Background
Clinical research based on electronic medical records (EMR) draws on diverse forms of medical data. Recent advances in natural language processing have also increased interest in the text data embedded in these records. Such research contributes real-world evidence (RWE) from actual clinical settings and, when combined with artificial intelligence, has the potential to contribute substantially to areas such as disease prediction and medical decision support.
Objectives
First, we aim to demonstrate the benefit of early treatment by using electronic health records to study the association between early achievement of the low-density lipoprotein cholesterol (LDL-C) goal and both recurrence of major adverse cardiovascular events (MACE) and healthcare resource utilization (HRU) in patients with high-risk atherosclerotic cardiovascular disease (ASCVD). Second, we aim to use unstructured text data to develop a prediction model for disease progression in patients with idiopathic cytopenia of undetermined significance (ICUS), identify associated risk factors, and provide clinical insights into patient management.
Methods
First, patients with cardiovascular disease were defined based on clinical evidence and divided into two groups according to whether they achieved the early LDL-C reduction goal. The risk ratio of recurrent major adverse cardiovascular events (MACE) and the frequency of healthcare resource utilization were analyzed, and the findings were statistically validated. Second, we used unstructured EMR text data to construct a dataset that captures the characteristics of patients with idiopathic cytopenia of undetermined significance (ICUS). We then selected the best-performing disease prediction model through performance comparison and analyzed the relevant risk factors.
Results
We analyzed the medication history, test results, and medical records of patients with cardiovascular disease and compared patient characteristics according to whether the early LDL-C target was achieved. The results, including hazard ratios and cumulative incidence, showed that early LDL-C goal attainment was associated with a significantly lower recurrence rate of cardiovascular disease and substantially lower healthcare resource utilization.
To predict disease progression in ICUS patients, we compared three models using 10-fold cross-validation. We also integrated data from electronic medical records and evaluated the importance of the clinical information embedded in the text data. The XGBoost (XGB) model that incorporated text embedding data performed best, with an area under the receiver operating characteristic curve (AUROC) of 0.817. Using Shapley values, we confirmed that the text data contributed meaningfully to the model's predictions.
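The evaluation protocol described above (k-fold cross-validation scored by AUROC) can be illustrated with a minimal, standard-library-only sketch. This is not the dissertation's code: the function names `auroc`, `stratified_folds`, and `cross_validated_auroc` are hypothetical, the stratified split is an assumption, and `fit_predict` is a placeholder for whichever model is being compared (for example, an XGBoost classifier wrapped to return scores).

```python
import random


def auroc(labels, scores):
    """AUROC from the Mann-Whitney rank-sum statistic; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def stratified_folds(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, spreading each class evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def cross_validated_auroc(fit_predict, X, y, k=10, seed=0):
    """Mean AUROC over k folds.

    fit_predict(X_train, y_train, X_test) is a hypothetical callable that trains
    a model and returns one score per test row.
    """
    aucs = []
    for train, test in stratified_folds(y, k, seed):
        scores = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        aucs.append(auroc([y[i] for i in test], scores))
    return sum(aucs) / len(aucs)
```

Calling `cross_validated_auroc` once per candidate model with the same `seed` keeps the folds identical across models, so the mean AUROC values are directly comparable. Each class needs at least `k` samples so that every test fold contains both outcomes.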
Conclusions
First, we conducted a real-world analysis of disease prognosis and medical costs to provide real-world evidence on the impact and benefits of early LDL-C reduction in Asian populations. This research can help inform treatment guidelines and support evidence-based medical decisions.
Second, we developed a machine learning model that predicts disease progression from unstructured clinical text data. By extending this research to the diverse clinical information in electronic medical records, we aim to further support medical decision-making and improve patient prognosis.
Keywords: electronic medical records, cardiovascular diseases, low-density lipoprotein cholesterol, major adverse cardiovascular events, artificial intelligence, machine learning, natural language processing
- Author(s)
- 한지예
- Issued Date
- 2024
- Awarded Date
- 2024-02
- Type
- Dissertation
- URI
- https://oak.ulsan.ac.kr/handle/2021.oak/12993
http://ulsan.dcollection.net/common/orgView/200000734869