Study on Multi-Omics Data-Driven Prediction of Lung Cancer Stages
DOI: 10.12677/hjcb.2026.161003. Supported by research project funding.
Author: Hu Siqin (胡思亲), School of Big Data, Jiangxi Institute of Fashion Technology, Nanchang, Jiangxi
Keywords: Multi-Omics Data, Lung Cancer, Deep Learning, Staging Model
Abstract: Cancer is a class of malignant diseases driven by genetic alterations; its high incidence and mortality make it a serious threat to human health. Regulation of gene expression is essential for organismal development. In tumor initiation and progression this regulation is frequently disrupted, through aberrant activation of normally silenced genes or suppression of normally active genes, which is regarded as one of the key mechanisms promoting tumor development. In addition, the human microbiota helps regulate many physiological processes, and structural or functional dysbiosis can raise the risk of carcinogenesis. This study focuses on lung cancer and integrates gene expression and microbiome data to develop a computational model for tumor stage prediction. The workflow has four steps. First, differential analysis is performed on the gene expression and microbiome data to identify significantly altered genes and microbial species. Second, a deep neural network incorporating an attention mechanism is constructed. Third, the network is trained on key features selected by an elastic net model to predict lung cancer stage. Finally, model performance is evaluated with five-fold cross-validation. In the reported experiments, the model predicts lung cancer stage with an accuracy exceeding 80%.
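The evaluation step named in the abstract, five-fold cross-validation, can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the data are synthetic, the four stage labels and six features are hypothetical, and a nearest-centroid classifier stands in for the attention-based deep neural network. The helper names (`stratified_folds`, `fit_centroids`, `predict`, `cross_validate`) are invented here and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar stage proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each stage's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def fit_centroids(X, y):
    """Stand-in classifier (NOT the paper's network): one mean vector per stage."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the stage whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist)


def cross_validate(X, y, k=5, seed=0):
    """Return the accuracy on each held-out fold."""
    folds = stratified_folds(y, k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Synthetic data: 4 hypothetical stages, 25 samples each, 6 selected features.
rng = random.Random(42)
y = [stage for stage in range(4) for _ in range(25)]
X = [[rng.gauss(stage, 1.0) for _ in range(6)] for stage in y]

fold_acc = cross_validate(X, y)
print("per-fold accuracy:", [round(a, 2) for a in fold_acc])
print("mean accuracy:", round(sum(fold_acc) / len(fold_acc), 2))
```

Stratifying the folds keeps each stage represented in every held-out set, which matters when some stages have few samples. In a real pipeline, feature selection (here, the elastic net step) would normally be refit inside each training fold so that the held-out samples do not influence which features are chosen.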
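Cite this article: Hu, S. (2026) Study on Multi-Omics Data-Driven Prediction of Lung Cancer Stages. Hans Journal of Computational Biology (计算生物学), 16(1), 31-39. https://doi.org/10.12677/hjcb.2026.161003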
