Application of Machine Learning Algorithms in Breast Cancer Prediction
DOI: 10.12677/ORF.2023.135546; Funding: supported by the National Natural Science Foundation of China
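Author: Guo Yujun (郭昱君), School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China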
Keywords: Machine Learning; Lasso; Random Forest; ROC Curve
Abstract: Breast cancer is the most common malignant tumor among women worldwide, and early diagnosis and treatment are the key to curing it. Because timely diagnosis matters greatly for clinical treatment, an algorithm that can accurately identify the tumor type, so that treatment can begin as early as possible, is particularly valuable. This article applies the Lasso algorithm for feature selection on the Breast Cancer Wisconsin (Diagnostic) dataset and then trains a random forest classifier on the selected features to predict whether a breast tumor is benign or malignant. The resulting model achieved an accuracy of 95.32%, a recall of 92.06%, and an F1 score of 93.55%. Taken together, these metrics indicate that the approach can effectively distinguish benign from malignant breast tumors and has potential practical value. More broadly, the article offers a workflow for making predictions from cancer data and improving classifier performance, which could help physicians diagnose breast cancer more accurately, support better treatment and prevention, and contribute to breast cancer research.
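The abstract does not give implementation details, so the following is only a minimal sketch of one plausible reading of the Lasso-then-random-forest workflow. It assumes scikit-learn and its bundled copy of the WDBC data (`load_breast_cancer`), a stratified 70/30 split, `LassoCV` fitted to the 0/1 label on standardized features (keeping features with nonzero coefficients), and a 300-tree `RandomForestClassifier`. The split ratio, seed, tree count, and the choice of `LassoCV` rather than, for example, L1-penalized logistic regression are all assumptions rather than the author's settings, so the scores it prints will not reproduce the published figures exactly. Malignant is treated as the positive class, which is also an assumption about how the reported recall was computed.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def lasso_rf_pipeline(test_size=0.3, seed=42):
    """Lasso feature selection, then a random forest (ASSUMED settings, not the paper's)."""
    data = load_breast_cancer()  # scikit-learn's copy of WDBC: 569 samples, 30 features
    X = data.data
    # scikit-learn encodes 0 = malignant, 1 = benign; flip so malignant is the positive class.
    y = 1 - data.target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    # Standardize with training statistics only, so one L1 penalty treats all features alike.
    scaler = StandardScaler().fit(X_train)
    lasso = LassoCV(cv=5, random_state=seed).fit(scaler.transform(X_train), y_train)

    # Keep the features whose Lasso coefficient was not shrunk exactly to zero.
    selected = np.flatnonzero(lasso.coef_)
    if selected.size == 0:  # defensive fallback: keep everything if Lasso drops all features
        selected = np.arange(X.shape[1])

    # Tree splits are unaffected by per-feature scaling, so the forest uses the raw columns.
    forest = RandomForestClassifier(n_estimators=300, random_state=seed)
    forest.fit(X_train[:, selected], y_train)

    y_pred = forest.predict(X_test[:, selected])
    y_score = forest.predict_proba(X_test[:, selected])[:, 1]  # estimated P(malignant)

    return {
        "selected_features": [str(name) for name in data.feature_names[selected]],
        "n_test": len(y_test),
        "accuracy": accuracy_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),  # area under the ROC curve
    }


if __name__ == "__main__":
    for key, value in lasso_rf_pipeline().items():
        print(key, value)
```

Standardizing before the Lasso step matters because the L1 penalty is applied to raw coefficient magnitudes; without it, features measured on large scales (such as area) would be penalized differently from small-scale ones (such as smoothness). The ROC AUC is computed from predicted probabilities rather than hard labels, since the ROC curve sweeps over all decision thresholds.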
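The reported percentages can be cross-checked against the standard definitions: accuracy = (TP + TN) / N, recall = TP / (TP + FN), precision = TP / (TP + FP), and F1 = 2PR / (P + R), taking malignant as the positive class (an assumption). The abstract gives neither a confusion matrix nor the test-set size, so the counts used below are hypothetical: they were picked only because, on a 171-sample test set (30% of the 569 WDBC cases, itself an assumed split), they reproduce all three published figures. The hypothetical helper `implied_precision` inverts the F1 formula, P = F1 · R / (2R − F1), to recover the precision implied by the published recall and F1, a value the abstract does not state.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1, with malignant as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for the precision P, given recall R and F1."""
    return f1 * recall / (2 * recall - f1)


if __name__ == "__main__":
    # HYPOTHETICAL counts: not reported in the abstract; chosen only because they
    # are consistent with the published percentages on a 171-sample test set.
    m = classification_metrics(tp=58, fp=3, fn=5, tn=105)
    print({k: round(v * 100, 2) for k, v in m.items()})
    # Precision implied by the published recall (92.06%) and F1 (93.55%).
    print(round(implied_precision(0.9206, 0.9355) * 100, 2))
```

With these counts the script prints accuracy 95.32, precision 95.08, recall 92.06, and F1 93.55, followed by an implied precision of 95.09; the small gap between 95.08 and 95.09 comes from the published recall and F1 being rounded to two decimal places.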
Cite this article: Guo, Y.J. (2023) Application of Machine Learning Algorithms in Breast Cancer Prediction. Operations Research and Fuzziology, 13(5), 5464-5475. https://doi.org/10.12677/ORF.2023.135546
