Identification of Cardiovascular Disease Populations in Chinese Communities Based on Machine Learning
DOI: 10.12677/acm.2025.151273
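Authors: Tan Yan (谈炎), Xu Banglong (许邦龙)*: Department of Cardiology, The Second Affiliated Hospital of Anhui Medical University, Hefei, Anhui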
Keywords: Cardiovascular Disease, Machine Learning, Predictive Model
Abstract: Objective: Cardiovascular disease is the leading cause of death among both urban and rural residents in China. Patients typically seek medical attention only after symptoms appear, and traditional diagnostic methods for cardiovascular disease are complex and costly. This study therefore aimed to identify patients with cardiovascular disease from general demographic characteristics, comorbidities, and routine physical-examination blood test indicators. Methods: The sample was drawn from 13,420 participants in the CHARLS database. After records with missing values were removed, models were built using logistic regression, decision tree, K-nearest neighbor, random forest, and neural network algorithms. The best-performing model was selected by comparing the area under the receiver operating characteristic curve (ROC_AUC) and was then used to build a model for each cardiovascular disease subgroup. The SHAP algorithm was used to interpret the models. Results: The logistic regression model performed best, with an ROC_AUC of 0.7644 (95% CI: 0.7397 to 0.7890); among the subgroups, identification of heart disease was comparatively good, with an ROC_AUC of 0.7747. SHAP interpretation showed that age, body mass index, diabetes, and smoking history made important contributions to identifying cardiovascular disease. Conclusion: Machine learning methods can identify patients with cardiovascular disease, so results from simple examinations can be used to identify high-risk individuals early and to intervene.
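The abstract reports model performance as an ROC_AUC with a 95% confidence interval but does not say how that interval was obtained. The following is a minimal sketch, not the authors' procedure: it computes ROC AUC through the rank (Mann-Whitney) formulation and attaches a percentile bootstrap interval, which is an assumption made here for illustration. The data are synthetic rather than CHARLS records, and the function names `roc_auc`, `bootstrap_ci`, and `make_synthetic` are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the rank (Mann-Whitney U) formulation; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ROC AUC (an assumed method, not taken from the paper)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[round((alpha / 2) * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def make_synthetic(n, seed=0):
    """Synthetic binary labels and risk scores; positives score higher on average."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    return labels, scores


labels, scores = make_synthetic(500, seed=1)
auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores, n_boot=500, seed=2)
print(f"ROC_AUC = {auc:.4f} (95% CI: {ci_low:.4f} to {ci_high:.4f})")
```

In practice the scores would be predicted probabilities from a fitted classifier evaluated on held-out data, and the same two functions could be applied to each candidate model before comparing them.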
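Citation: Tan, Y. and Xu, B. Identification of Cardiovascular Disease Populations in Chinese Communities Based on Machine Learning [J]. Advances in Clinical Medicine, 2025, 15(1): 2059-2069. https://doi.org/10.12677/acm.2025.151273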
