A Survival Analysis Model Based on Multi-Omics Integration and Adversarial Autoencoder
Abstract: Multi-omics integration exploits the complementary information carried by different omics layers and supports a more systematic and comprehensive understanding of the molecular mechanisms of cancer. However, multi-omics data are high-dimensional with small sample sizes, which causes severe overfitting in traditional survival analysis models. Deep learning models can extract features automatically from high-dimensional data and therefore have clear advantages in handling complex multi-omics data. To integrate multi-omics data effectively, this paper proposes a multi-omics feature extraction network based on an adversarial autoencoder. Combining this network with the 1D-CNNCox survival analysis model yields the GAN-1DCCox model, which is built on multi-omics integration and generative adversarial networks. Ablation and comparative experiments were conducted on TCGA datasets covering 8 cancer types, and GAN-1DCCox achieved higher C-index values than popular survival analysis baseline models. The results indicate that GAN-1DCCox can effectively integrate multi-omics data and screen out important prognostic signature genes, thereby improving the predictive performance and robustness of survival analysis.
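The C-index used as the evaluation metric can be illustrated with a minimal sketch of Harrell's concordance index. The function name `concordance_index` and the toy data are illustrative assumptions and are not taken from the paper. For simplicity, pairs with tied survival times are ignored.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs that are ranked correctly.

    times       : observed follow-up times
    events      : 1 if the event was observed, 0 if the subject was censored
    risk_scores : model output, where a higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event has the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.1, 0.3, 0.7]
print(concordance_index(times, events, risk_scores))
```

A value of 1.0 means every comparable pair is ordered correctly, while 0.5 corresponds to random ranking, so a higher C-index indicates better discrimination between patients.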
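Cite this article: Miao Xinyu, Yin Qingyan, Zhang Lili. A Survival Analysis Model Based on Multi-Omics Integration and Adversarial Autoencoder [J]. Advances in Applied Mathematics, 2024, 13(6): 2627-2640. https://doi.org/10.12677/aam.2024.136251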
