Integration of Multi-Omics Data and Prognostic Prediction Model Construction Based on Deep Multi-View Contrastive Learning Methods
Abstract: In cancer research, accurately identifying cancer subtypes and assessing patient prognosis are crucial for optimizing treatment strategies. The large volume of multi-omics data generated by high-throughput sequencing technologies is a valuable resource for cancer prognosis studies, and deep learning methods can integrate these data effectively and identify more cancer subtypes accurately. In this study, we analyzed multi-omics datasets from 12 cancer types and used them as model input. We propose dmCLCAE, a deep multi-view contrastive learning model based on a convolutional autoencoder, designed to predict survival-related cancer subtypes from multi-omics data. To validate the model, we compared it with Multi-Omics Factor Analysis v2 (MOFA+) and with ProgCAE, a deep learning prognostic model based on a convolutional autoencoder, across the different cancer types. The results show that dmCLCAE has a significant advantage in distinguishing survival subtypes and yields more consistent predictions.
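The abstract does not state which contrastive objective dmCLCAE uses. As a rough illustration of multi-view contrastive learning in general, the sketch below assumes an InfoNCE-style loss, a common choice in this family of methods, computed between two per-patient embeddings that stand in for two omics views. The function name `info_nce`, the temperature value, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def _normalize(v):
    # Scale a vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _row_loss(logits, target):
    # Cross-entropy for one row, using a numerically stable log-sum-exp.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]


def info_nce(view_a, view_b, temperature=0.1):
    """Symmetric InfoNCE loss (an assumed objective, not the paper's).

    view_a[i] and view_b[i] are embeddings of the same patient i taken
    from two different views; every other pairing is a negative.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    n = len(a)
    sim = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Match each row of view A to its partner in view B, and vice versa.
    a_to_b = sum(_row_loss(sim[i], i) for i in range(n)) / n
    b_to_a = sum(_row_loss([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return 0.5 * (a_to_b + b_to_a)


random.seed(0)
# Hypothetical toy data: 8 patients, 16-dimensional embeddings.
shared = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
noisy = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in shared]

aligned = info_nce(shared, noisy)          # views paired with the right patient
shuffled = info_nce(shared, shared[::-1])  # views paired with the wrong patient
```

In this toy setting the loss is lower when the two views of each patient agree than when the pairing is scrambled, which is the property a contrastive objective exploits to align representations across omics views.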
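Cite this article: Gao Xinfeng (高新凤). Integration of Multi-Omics Data and Prognostic Prediction Model Construction Based on Deep Multi-View Contrastive Learning Methods [J]. Advances in Applied Mathematics (应用数学进展), 2024, 13(9): 4182-4190. https://doi.org/10.12677/aam.2024.139399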
