Research on Diabetes Prediction Based on Machine Learning
DOI: 10.12677/airr.2026.152037
Authors: Xue Chen (陈雪), Ruijuan Gao (高瑞娟), Liting Zhao (赵丽婷): School of Financial Technology, Hebei Finance University, Baoding, Hebei
Keywords: Diabetes Prediction, Machine Learning, Random Forest, Model Evaluation
Abstract: As the global prevalence of diabetes continues to rise, early screening and intervention have become key measures for reducing the disease burden. This paper proposes a machine learning-based model for diabetes prediction. First, a diabetes dataset of 1006 clinical samples was cleaned and standardized, and 8 core predictive indicators were selected through correlation analysis and recursive feature elimination. Next, five predictive models (logistic regression, decision tree, random forest, support vector machine, and XGBoost) were built, with hyperparameters tuned by grid search combined with 5-fold cross-validation. Finally, a multi-metric evaluation identified the random forest as the best-performing model (accuracy 84.16%, AUC = 0.9096). The model offers efficient technical support for early diabetes screening and can assist primary care institutions in risk assessment, giving it both clinical and social value.
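The abstract summarizes model quality with two metrics, accuracy and AUC. As a minimal sketch of what those two numbers measure, the snippet below computes both in plain Python. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; they are not the authors' code or data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not taken from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]

# Hypothetical decision threshold of 0.5 turns scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.4f}, AUC = {auc:.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is one reason the two are usually reported together.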
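Cite this article: Chen, X., Gao, R.J. and Zhao, L.T. (2026) Research on Diabetes Prediction Based on Machine Learning. Artificial Intelligence and Robotics Research, 15(2), 387-394. https://doi.org/10.12677/airr.2026.152037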
