Research on Classification and Prediction Models for Heart Disease
Abstract: Heart disease does great harm to the human body and can be life-threatening. Compared with hospital testing, predicting heart disease with machine learning methods can save considerable time. Using the 1,025 real patient records in the Kaggle heart disease dataset as an example, this paper analyzes the factors that contribute to heart disease and builds four classification models to predict it: K-nearest neighbor, decision tree, random forest, and logistic regression. The models are evaluated with the confusion matrix, accuracy, recall, precision, the ROC curve, and the AUC value. K-nearest neighbor and random forest give the better predictions, which provides a useful scientific basis for the prediction and diagnosis of heart disease.
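The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = heart disease present, 0 = absent) and a 0.5 decision threshold, neither of which is stated in the source. The function names are hypothetical, and the toy labels and scores are invented for illustration; they are not drawn from the Kaggle dataset or from the paper's results.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count the four confusion-matrix cells for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def accuracy(c: Dict[str, int]) -> float:
    """Share of all samples that were classified correctly."""
    return _ratio(c["tp"] + c["tn"], sum(c.values()))


def recall(c: Dict[str, int]) -> float:
    """Share of true positives among all actual positives."""
    return _ratio(c["tp"], c["tp"] + c["fn"])


def precision(c: Dict[str, int]) -> float:
    """Share of true positives among all predicted positives."""
    return _ratio(c["tp"], c["tp"] + c["fp"])


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (pairwise formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
# Assumed threshold: predict "disease" when the score is at least 0.5.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

cells = confusion_counts(toy_labels, toy_preds)
print(cells, accuracy(cells), recall(cells), precision(cells), auc(toy_labels, toy_scores))
```

On this toy data the confusion matrix has 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so accuracy, recall, and precision are each 0.75, and the AUC is 0.9375.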