Screening and Disease Prediction of Prostate Cancer-Causing Genes Based on Machine Learning
DOI: 10.12677/sa.2025.148218
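Authors: Wang Bingji*, Haide College, Ocean University of China, Qingdao, Shandong; Gao Xiang, School of Mathematical Sciences, Ocean University of China, Qingdao, Shandong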
Keywords: Prostate Cancer; Differential Expression Analysis; Machine Learning; Diagnostic Predictive Modeling
Abstract: To identify genetic features associated with prostate cancer, this article proposes an integrated machine learning approach that screens for key prostate cancer genes, analyzes the biological significance of these target genes, and uses them to build an efficient diagnostic prediction model. Transcriptome data from 151 prostate cancer tissues and 152 normal tissues were collected from the UCSC Xena database, and batch effects were corrected using PCA and other methods. Differential expression analysis identified 2586 up- and down-regulated genes, and GO and KEGG enrichment analyses revealed key pathogenic pathways associated with prostate cancer. Three machine learning algorithms, Random Forest, LASSO regression, and Gradient Boosting Machine (GBM), were then integrated for a secondary screening, which identified 12 key genes with an important influence on prostate cancer. Based on these genes, eight diagnostic prediction models were constructed and evaluated using confusion matrices and ROC curves. The Extreme Gradient Boosting (XGBoost) model achieved an accuracy of 93% and an AUC of 97%, supporting its potential for application in prostate cancer diagnosis.
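The two evaluation tools named in the abstract, the confusion matrix and the ROC curve, can be illustrated with a minimal Python sketch. It is not the authors' code. The labels and scores are hypothetical toy values, the 0.5 decision threshold is an assumption, and the function names are invented for this example. AUC is computed as the probability that a randomly chosen tumor sample receives a higher score than a randomly chosen normal sample, which equals the area under the ROC curve.

```python
from typing import List, Tuple


def confusion_matrix(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels, with 1 = tumor and 0 = normal."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of samples classified correctly."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 tumor samples (1) and 4 normal samples (0),
# with made-up predicted tumor probabilities from some classifier.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(confusion_matrix(y_true, y_pred))  # (3, 1, 1, 3)
print(accuracy(y_true, y_pred))          # 0.75
print(roc_auc(y_true, scores))           # 0.875
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can report different values for the two metrics.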
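Cite this article: Wang, B. and Gao, X. (2025) Screening and Disease Prediction of Prostate Cancer-Causing Genes Based on Machine Learning. Statistics and Application, 14(8), 85-96. https://doi.org/10.12677/sa.2025.148218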