Comprehensive Analysis of Machine Learning Ensemble Frameworks for Cancer Prognosis Prediction
DOI: 10.12677/aam.2026.154200
Author: Gong Qiqi (巩琪琪), School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong
Keywords: Data Dimensionality Reduction; Feature Selection; Machine Learning; TCGA
Abstract: Using multi-omics data from The Cancer Genome Atlas (TCGA) project, including gene copy number variation, RNA-seq gene expression, and DNA methylation, this study combines machine learning algorithms to build a survival prediction model for the prognosis of cancer patients. First, the omics data are preprocessed and each patient's survival time is extracted. Next, the dimensionality of the data is reduced with principal component analysis (PCA), non-negative matrix factorization (NMF), and partial least squares (PLS), and the minimum redundancy maximum relevance (mRMR) algorithm is used to select feature subsets with low redundancy and high biological relevance. Finally, classification models are built with support vector machines (SVM), random forests (RF), logistic regression (LR), and other algorithms, and their performance is evaluated by cross-validation using multiple metrics. The experimental results show that model performance depends strongly on the choice of both the dimensionality reduction method and the classifier. Models based on PLS dimensionality reduction perform best, which confirms the importance of patient label information for extracting key features: unlike PCA and NMF, PLS uses the class labels when constructing its components. Kaplan-Meier curves further validate the model's effectiveness. The proposed prediction model can provide a scientific basis for clinical decision-making, support the development of precision oncology, and help improve patients' prognosis and quality of life, giving it both theoretical significance and potential clinical value.
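The abstract attributes the advantage of PLS to its use of patient labels. The toy sketch below is not the author's implementation; it assumes a single binary outcome label and uses hypothetical data to contrast the two ideas. The first PCA direction maximizes variance and never sees y, while for centered data the first PLS1 weight vector is proportional to X^T y. NMF is omitted, and in practice library routines such as scikit-learn's `PCA` and `PLSRegression` would normally be used instead of this pure-Python version.

```python
import math
from typing import List

def center(X: List[List[float]]) -> List[List[float]]:
    """Subtract each column's mean so every feature has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def pca_first_direction(X: List[List[float]], iters: int = 200) -> List[float]:
    """Leading eigenvector of the sample covariance matrix via power iteration.
    Unsupervised: the labels are never used."""
    Xc = center(X)
    n, p = len(Xc), len(Xc[0])
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]
    v = normalize([1.0] * p)
    for _ in range(iters):
        v = normalize([sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)])
    return v

def pls_first_weight(X: List[List[float]], y: List[float]) -> List[float]:
    """First PLS1 weight vector, proportional to Xc^T yc. Supervised: uses y."""
    Xc = center(X)
    y_mean = sum(y) / len(y)
    yc = [t - y_mean for t in y]
    p = len(Xc[0])
    return normalize([sum(row[j] * t for row, t in zip(Xc, yc)) for j in range(p)])

def scores(X: List[List[float]], w: List[float]) -> List[float]:
    """Project the centered samples onto direction w (one score per patient)."""
    return [sum(a * b for a, b in zip(row, w)) for row in center(X)]

def abs_correlation(a: List[float], b: List[float]) -> float:
    """Absolute Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((z - mb) ** 2 for z in b)
    return abs(cov) / math.sqrt(var_a * var_b)

# Hypothetical toy data: feature 0 has large variance but is unrelated to the label;
# feature 1 has small variance but tracks the label exactly.
X = [[10.0, 0.0], [-10.0, 0.0], [0.0, 0.0], [10.0, 1.0], [-10.0, 1.0], [0.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

pca_w = pca_first_direction(X)
pls_w = pls_first_weight(X, y)
print("PCA direction:", pca_w, "|corr(score, y)| =", abs_correlation(scores(X, pca_w), y))
print("PLS direction:", pls_w, "|corr(score, y)| =", abs_correlation(scores(X, pls_w), y))
```

On this toy data the PCA direction follows the high-variance noise feature and its scores carry no label information, whereas the PLS direction aligns with the label-related feature. This is the mechanism behind the claim that label information helps extract key features.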
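The mRMR step keeps features that are informative about the label but not redundant with one another. A minimal sketch of greedy mRMR using the difference criterion follows: at each step it picks the feature maximizing the relevance I(f; y) minus the mean redundancy I(f; s) over the already-selected features s. It assumes the features have already been discretized (for example, expression binned into low/high). The abstract does not state the author's exact mRMR variant or discretization scheme, and the gene names and values below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete variables."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mi = 0.0
    for (x, z), c in joint.items():
        p_xz = c / n
        mi += p_xz * math.log(p_xz / ((count_a[x] / n) * (count_b[z] / n)))
    return mi

def mrmr(features: Dict[str, List[int]], labels: List[int], k: int) -> List[str]:
    """Greedy mRMR with the difference criterion:
    relevance I(f; y) minus mean redundancy I(f; s) over selected features s."""
    relevance = {name: mutual_information(vals, labels) for name, vals in features.items()}
    selected: List[str] = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical discretized features (0 = low, 1 = high) for 8 patients.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
genes = {
    "gene_A":      [0, 0, 0, 0, 1, 1, 1, 1],  # strongly related to the label
    "gene_A_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of gene_A (redundant)
    "gene_B":      [0, 0, 1, 1, 0, 0, 1, 1],  # weaker, but independent of gene_A
}
by_relevance = sorted(genes, key=lambda g: mutual_information(genes[g], labels), reverse=True)
print("top-2 by relevance only:", by_relevance[:2])
print("top-2 by mRMR:          ", mrmr(genes, labels, k=2))
```

Ranking by relevance alone keeps the duplicated gene. mRMR replaces it with the weaker but non-redundant `gene_B`, which is the "low-redundancy" property the abstract refers to.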
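Kaplan-Meier curves are used to validate the model. The product-limit estimator is S(t) = product over death times t_i <= t of (1 - d_i / n_i), where d_i is the number of deaths at t_i and n_i is the number of patients still at risk just before t_i. The following is a minimal pure-Python sketch. It assumes patients have been split into predicted high-risk and low-risk groups, and the follow-up times are invented for illustration. In practice a library such as lifelines (`KaplanMeierFitter`) would typically be used, and a log-rank test (not shown) would compare the curves.

```python
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of the survival function S(t).

    times[i]: follow-up time of patient i
    events[i]: 1 if death was observed at times[i], 0 if the patient was censored
    Returns (t, S(t)) at every time where at least one death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)  # patients still under observation just before t
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve

# Hypothetical follow-up data (months) for two groups predicted by a classifier.
low_risk = kaplan_meier([12, 20, 25, 30, 36, 40], [0, 1, 0, 1, 0, 0])
high_risk = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print("low-risk group: ", low_risk)
print("high-risk group:", high_risk)
```

Censored patients do not produce a step in the curve, but they still shrink the risk set. A clear separation between the two groups' curves is what would indicate that the predicted labels carry prognostic information.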
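Citation: Gong, Q.Q. (2026) Comprehensive Analysis of Machine Learning Ensemble Frameworks for Cancer Prognosis Prediction. Advances in Applied Mathematics, 15(4), 777-790. https://doi.org/10.12677/aam.2026.154200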
