Analysis of Lung Cancer Incidence Factors Based on a Bayesian Spatiotemporal Model and a Stacking Ensemble Learning Model
DOI: 10.12677/aam.2025.146337
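Authors: Qingqun Zhu (朱青群), Xin Huang (黄欣), Yunyi Yang (杨云翼): School of Statistics and Data Science, Guangdong University of Finance and Economics, Guangzhou, Guangdong; Li Xia (夏莉)*: School of Statistics and Data Science, Guangdong University of Finance and Economics, Guangzhou, Guangdong; Laboratory of Big Data and Educational Applied Statistics, Guangdong University of Finance and Economics, Guangzhou, Guangdong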
Keywords: Pathogenic Factors of Lung Cancer, Bayesian Spatiotemporal Model, Stacking Ensemble Learning Model, Prevention and Control
Abstract: Cancer is one of the leading diseases by incidence and mortality in China and worldwide. In recent years, lung cancer has ranked near the top of all malignant tumors in both incidence and mortality among Chinese men and women. The situation is severe: diagnosing lung cancer in time, so that patients can be treated early and survive longer, has become the greatest challenge in China's response to cancer. Because both incidence and mortality are high, two things are needed. The first is to examine how the main risk factors affect lung cancer incidence, so that the disease can be prevented and controlled at its source. The second is to help patients obtain early diagnosis and early treatment, which raises post-treatment survival and is an effective way to control mortality. To help people detect lung cancer in time at lower cost and with greater convenience, and thereby reduce mortality, this paper builds a model that predicts an individual's probability of developing lung cancer from the main risk factors. We first study the main factors that induce lung cancer: smoking, chronic lung disease, alcohol consumption, sex, and age. Using the collected data and the R language, we then fit a logistic regression model, an SVM model, and an XGBoost model and evaluate their performance. Finally, a Stacking ensemble learning model combines the three into a single, more accurate individual lung cancer prediction model.
Existing lung cancer prediction, in China and abroad, generally relies on medical examinations that patients have already undergone, in particular the analysis of medical images. By predicting from risk factors instead, the model proposed here can be used by a wider range of people, which is the innovation of this paper. The experimental results show that the main risk factors for lung cancer are correlated to some degree. The final Stacking model, which integrates the SVM, logistic regression, and XGBoost models, combines the strengths of all three and reaches an accuracy of 83.12%, making it more accurate and closer to real conditions.
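The stacking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper used R, whereas the sketch below uses Python with scikit-learn. Several things are assumptions made only for illustration: the data are synthetic (random stand-ins for the five risk factors, with made-up coefficients), scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost, and the logistic regression meta-learner and 5-fold setting are guesses, since the abstract does not state them.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-ins for the five risk factors named in the abstract
# (NOT the paper's data; all values are randomly generated).
smoking = rng.integers(0, 2, n)
chronic_lung_disease = rng.integers(0, 2, n)
alcohol = rng.integers(0, 2, n)
sex = rng.integers(0, 2, n)
age = rng.uniform(30, 80, n)
X = np.column_stack([smoking, chronic_lung_disease, alcohol, sex, age])

# Made-up coefficients, used only to give the synthetic labels some signal.
logit = (-4.0 + 1.5 * smoking + 1.2 * chronic_lung_disease
         + 0.4 * alcohol + 0.3 * sex + 0.04 * age)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three base learners; gradient boosting is a stand-in for XGBoost.
base_learners = [
    ("logistic", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("boosting", GradientBoostingClassifier(random_state=0)),
]

# The meta-learner is trained on out-of-fold predicted probabilities
# from the base learners (5-fold cross-validation, an assumed setting).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

proba = stack.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
accuracy = accuracy_score(y_test, stack.predict(X_test))
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Because the data are synthetic, the printed accuracy says nothing about the 83.12% reported in the paper. The sketch only shows the mechanics: each base learner produces out-of-fold probabilities, and the meta-learner combines them into one individual risk estimate.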
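Cite this article: Zhu, Q., Huang, X., Yang, Y. and Xia, L. (2025) Analysis of Lung Cancer Incidence Factors Based on Bayesian Spatiotemporal Model and Stacking Ensemble Learning Model. Advances in Applied Mathematics (应用数学进展), 14(6), 486-497. https://doi.org/10.12677/aam.2025.146337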
