SUSTech KC (南方科技大学知识苑): Predicting In-hospital Mortality in ICU Patients Based on GAN and Ensemble Methods

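Title	Predicting In-hospital Mortality in ICU Patients Based on GAN and Ensemble Methods
Alternative title	基于对抗神经网络和集成学习的急诊病人死亡率预测 (Mortality Prediction for Emergency Patients Based on Adversarial Neural Networks and Ensemble Learning)
Author	魏明易
Author (pinyin)	WEI Mingyi
Student ID	12132906
Degree type	Master's
Degree discipline	0701 Mathematics
Discipline category / professional degree category	07 Science
Supervisor	杨丽丽 (YANG Lili)
Supervisor's affiliation	Department of Statistics and Data Science
Thesis defense date	2023-05-07
Thesis submission date	2023-06-28
Degree-granting institution	Southern University of Science and Technology
Place of degree conferral	Shenzhen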
Abstract	ICU patients are typically critically ill, and their clinical conditions can deteriorate rapidly. Accurate prediction of patient outcomes is therefore essential for clinical decision-making, resource allocation, and patient management. ICU survival prediction models estimate the probability of patient survival or mortality from clinical factors such as demographic data, vital signs, laboratory values, and clinical diagnoses. Traditional statistical methods, such as logistic regression, have been widely used for ICU survival prediction. However, these methods often fail to capture complex nonlinear relationships between the clinical features and the outcome, which limits their predictive accuracy. This thesis proposes an ensemble-learning model for ICU mortality prediction, the MTX-stacking model. First, the imbalanced data are processed with a modified generative adversarial network (GAN) to obtain a class-balanced data set. This approach is more interpretable and more effective than traditional data generation methods. Second, based on the patient data, the XGBoost method is used to predict the patient's status within 24 h. Third, the model is optimized with Bayesian optimization, which searches for the best model over several iterations. Fourth, multiple optimal models are integrated using the stacking framework to obtain the final MTX-stacking model. Finally, the SHAP algorithm is used to explain the importance of the variables, making the model more interpretable. We compare the MTX-stacking model with commonly used, state-of-the-art models on two prediction sets. The results show that the MTX-stacking model successfully predicts the survival status of patients and improves prediction accuracy. The stacking framework also allowed us to verify the robustness and accuracy of the final model.
The proposed model can help emergency rooms categorize incoming patients effectively and accurately identify the more critical ones, thereby saving more lives and improving the allocation of medical resources.
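The stacking step named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the thesis's implementation: the toy data, the two threshold "stump" base learners (standing in for the tuned XGBoost models), the logistic-regression meta-learner, and every function name below are assumptions made for illustration. GAN-based balancing, Bayesian optimization, and SHAP are omitted.

```python
import math
import random

def make_data(n, rng):
    """Toy binary data (assumption): label is 1 when the two features sum to more than 1."""
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1 if a + b > 1.0 else 0 for a, b in X]
    return X, y

def fit_stump(X, y, j):
    """Pick the threshold on feature j that maximises training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(row[j] for row in X):
        acc = sum((row[j] > t) == (label == 1) for row, label in zip(X, y)) / len(y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def stump_predict(t, j, row):
    """Base-learner output: 1.0 if feature j exceeds the threshold, else 0.0."""
    return 1.0 if row[j] > t else 0.0

def out_of_fold(X, y, k):
    """Level-1 features: each row is predicted by stumps fitted on the other folds."""
    n = len(X)
    oof = [[0.0, 0.0] for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held = [i for i in range(n) if i % k == fold]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        for j in range(2):
            t = fit_stump(Xt, yt, j)
            for i in held:
                oof[i][j] = stump_predict(t, j, X[i])
    return oof

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression on level-1 features via batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, label in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / len(y) for wi, g in zip(w, gw)]
        b -= lr * gb / len(y)
    return w, b

rng = random.Random(0)
X_train, y_train = make_data(400, rng)
X_test, y_test = make_data(200, rng)

# Train the meta-learner only on out-of-fold base predictions to avoid leakage.
oof = out_of_fold(X_train, y_train, k=5)
w, b = fit_meta(oof, y_train)

# Refit the base learners on all training data before scoring new rows.
stumps = [fit_stump(X_train, y_train, j) for j in range(2)]
probs = [
    sigmoid(sum(wi * stump_predict(t, j, row) for j, (wi, t) in enumerate(zip(w, stumps))) + b)
    for row in X_test
]
accuracy = sum((p > 0.5) == (label == 1) for p, label in zip(probs, y_test)) / len(y_test)
```

The design point this sketch is meant to show is that the meta-learner sees only predictions made on rows the base learners were not fitted on; a real pipeline would swap in stronger base models and a proper validation scheme.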
Keywords	ICU Survival Prediction; Ensemble Learning; Generative Adversarial Nets; Bayesian Optimization
Language	English
Training category	Independent training
Year of enrollment	2021
Year of degree conferral	2023-06
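Degree evaluation subcommittee	Mathematics
Chinese Library Classification number	O212.2
Source database	Manual submission
Output type	Degree thesis
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/544486
Collection	College of Science_Department of Statistics and Data Science
Recommended citation (GB/T 7714)	Wei MY. Predicting In-hospital Mortality in ICU Patients Based on GAN and Ensemble Methods[D]. Shenzhen: Southern University of Science and Technology, 2023.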

Files in this item
File name/size	Document type	Version type	Access type	License
12132906-魏明易-统计与数据科学 (1821 KB)	Degree thesis	--	Restricted access	CC BY-NC-SA