Machine Learning Predicts Five-Year Cancer Survival Rates
DOI: 10.12677/AAM.2023.125254
Author: Xinhui Yang (杨心蕙): School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong
Keywords: Five-Year Survival Rate, Machine Learning, Multi-Omics
Abstract: Survival rate is of great importance to the clinical management of cancer patients. This paper aims to identify machine learning methods that can accurately predict the five-year survival of cancer patients. The features are multi-omics data downloaded from the TCGA website. We find that combining mRMR feature selection with either a logistic regression classifier or an SVM classifier yields a five-year survival prediction accuracy above 0.85, and in some cases above 0.9. Because classification was evaluated with five-fold cross-validation, the results appear fairly robust. These two combinations also achieve relatively high AUC and F1 values, which further supports their advantage.
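The mRMR (max-relevance, min-redundancy) selection step named in the abstract can be illustrated with a minimal sketch. The abstract does not state which mRMR variant, discretization, or feature count was used, so everything here is an assumption for illustration. The sketch uses the mutual-information difference criterion (relevance minus mean redundancy), purely discrete features, and made-up toy data rather than TCGA data. The function names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedy mRMR with the difference criterion; returns k feature indices.

    columns: list of feature columns, each a list of discrete values.
    labels:  list of class labels (e.g. 1 = survived five years, 0 = did not).
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:

        def score(j):
            # First pick: relevance only. Later picks: relevance minus the
            # mean mutual information with the features already selected.
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[i]) for i in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, made-up data (assumption: NOT taken from the paper or from TCGA).
labels = [0, 0, 0, 1, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # f0: strongly related to the label
    [0, 0, 0, 0, 1, 1, 1, 1],  # f1: exact copy of f0 (fully redundant)
    [0, 0, 1, 1, 0, 0, 1, 1],  # f2: weakly related, independent of f0
    [0, 0, 0, 0, 0, 0, 0, 0],  # f3: constant, carries no information
]

print(mrmr_select(columns, labels, k=2))  # prints [0, 2]
```

In this toy case the duplicate feature f1 is as relevant as f0 but is skipped on the second pick, because its redundancy with f0 outweighs its relevance. In a cross-validated setting such as the five-fold scheme described in the abstract, a selection step like this would normally be fitted on the training folds only, so that the held-out fold does not influence which features are chosen.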
Cite this article: Yang Xinhui. Machine Learning Predicts Five-Year Cancer Survival Rates [J]. Advances in Applied Mathematics, 2023, 12(5): 2532-2545. https://doi.org/10.12677/AAM.2023.125254
