Feature Selection for Breast Cancer Diagnosis Based on the Logistic Regression Model
Abstract: This paper applies Logistic regression to the problem of breast cancer diagnosis. The data are first cleaned, treated for missing values, and standardized. Two feature selection methods, the point-biserial correlation coefficient and recursive feature elimination (RFE) based on Logistic regression, are then each used to select 15 key features. Logistic regression classifiers built on the two feature sets are compared: the model using the RFE-selected features reaches an accuracy of 97.89%, higher than that of the model using the features selected by the point-biserial correlation coefficient. The results support the effectiveness of machine learning for breast cancer prediction and provide a reference for future model optimization and etiological research, helping clinicians pursue the "three earlies" (early detection, early diagnosis, and early treatment).
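Cite this article: Chang Fangxin, Yang Shuangyang, Feng Aifen, Cao Junjie, Jiang Zhihan, Wang Shijie. Feature Selection for Breast Cancer Diagnosis Based on the Logistic Regression Model[J]. Modeling and Simulation, 2025, 14(7): 228-237. https://doi.org/10.12677/mos.2025.147531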
