Cancer Subtyping Algorithm Using Multi-Omics Data and Convolutional Autoencoders
Abstract: Integrating multi-omics data to subtype cancer patients is essential for improving diagnosis, treatment, and prognosis. Traditional statistical methods, such as principal component analysis, have limited capacity to handle high-dimensional multi-omics datasets. To integrate multi-omics data effectively, a convolutional neural network-based autoencoder framework, MCAEI (Multi-Omics Convolutional Autoencoder Integration), is proposed. The convolutional autoencoder consists of three convolutional layers, three corresponding deconvolutional layers, and a fully connected autoencoder, which together compress the multi-omics data and reduce its dimensionality. MCAEI is applied to three types of cancer to identify subtypes. The method is also compared with vanilla, sparse, and denoising autoencoders, and the experimental results show that MCAEI performs better. For the subtypes with the best survival separation, differential gene expression analysis and pathway enrichment analysis were also performed.
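The layer layout described in the abstract (three convolutional layers, a fully connected bottleneck, three deconvolutional layers) can be illustrated by tracing how the feature length changes through such a network. The sketch below is only an illustration under assumed settings: the kernel size, stride, padding, output padding, input length of 1024, and latent size of 32 are all hypothetical values, not taken from the paper, and the helper names are made up for this example.

```python
def conv_out_len(length, kernel, stride, padding):
    """Output length of a 1-D convolution (standard floor formula)."""
    return (length + 2 * padding - kernel) // stride + 1


def deconv_out_len(length, kernel, stride, padding, output_padding):
    """Output length of a 1-D transposed convolution."""
    return (length - 1) * stride - 2 * padding + kernel + output_padding


def trace_shapes(n_features, latent_dim=32, n_layers=3,
                 kernel=3, stride=2, padding=1, output_padding=1):
    """Trace the feature length through a conv autoencoder.

    All hyperparameters are assumptions chosen for illustration only.
    """
    shapes = [("input", n_features)]
    length = n_features

    # Encoder: each convolution roughly halves the feature length.
    for i in range(n_layers):
        length = conv_out_len(length, kernel, stride, padding)
        shapes.append((f"conv{i + 1}", length))

    # Fully connected autoencoder: compress to the latent code,
    # then expand back to the length the decoder expects.
    shapes.append(("fc_bottleneck", latent_dim))
    shapes.append(("fc_expand", length))

    # Decoder: each transposed convolution doubles the feature length.
    for i in range(n_layers):
        length = deconv_out_len(length, kernel, stride, padding, output_padding)
        shapes.append((f"deconv{i + 1}", length))

    return shapes


if __name__ == "__main__":
    for name, size in trace_shapes(1024):
        print(f"{name:>14}: {size}")
```

In a setup like this, the bottleneck output is the low-dimensional representation that would be passed to a downstream clustering step to obtain subtypes; the decoder exists only to provide a reconstruction target during training.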
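Cite this article: Guo Mengke (郭梦柯). Cancer Subtyping Algorithm Using Multi-Omics Data and Convolutional Autoencoders [J]. Advances in Applied Mathematics (应用数学进展), 2023, 12(12): 5210-5217. https://doi.org/10.12677/AAM.2023.1212512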
