Prognostic Analysis of Intrahepatic Cholangiocarcinoma Based on Data Mining
Abstract: This paper uses patient data from the U.S. SEER (Surveillance, Epidemiology, and End Results) cancer database to analyze the clinicopathological data of patients with intrahepatic cholangiocarcinoma (ICC) with three prediction models, predicting both survival time and survival status. First, variables including age, sex, race, T stage, N stage, and M stage are used to plot survival curves; univariate and multivariate analyses then determine the variables and coefficients of a Cox proportional hazards regression model, from which a Cox prediction model is built. Next, two machine learning models are built. The first is a gradient boosting model: 12 variables are selected in total, and the 5 most important independent variables are retained for prediction. The second is a BP (backpropagation) neural network, which excludes unimportant categorical variables such as race and adds variables with a more pronounced effect on ICC prognosis. The fit of the three models is compared to draw conclusions, and recommendations are made based on the results.
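As an illustration of the survival-curve step, the following is a minimal sketch of a Kaplan-Meier estimator. It assumes the survival curves are Kaplan-Meier estimates, which the abstract does not state explicitly. The function name `kaplan_meier` and the toy data are hypothetical and are not taken from the paper or from SEER.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  : follow-up durations (for example, months)
    events : 1 if death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # every patient leaves the risk set at their own time
    at_risk = len(times)        # patients still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit update: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]     # remove deaths and censored patients at time t
    return curve


# Hypothetical toy cohort (assumption for illustration, not SEER records)
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored patients shrink the risk set without lowering the curve, which is why the estimate only steps down at times with an observed death. Stratifying by a variable such as T stage would amount to calling the function once per group.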
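Citation: Han Yuanquan, Lei Jie, Chen Lang. Prognostic Analysis of Intrahepatic Cholangiocarcinoma Based on Data Mining [J]. Statistics and Application, 2022, 11(4): 892-908. https://doi.org/10.12677/SA.2022.114093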
