IMMO: An Integrated Analysis Framework for Incomplete Multi-Omic Data
Abstract: With advances in multi-omics integration methods, microbiome multi-omics data are playing an increasingly important role in revealing the mechanisms of complex diseases. In practice, however, such data commonly suffer from incompletely matched samples, widespread missing modalities, and high-dimensional sparsity, which severely limit effective integration across omics layers. To address these challenges, we propose IMMO (Integration Model for Incomplete Multi-Omics), a multi-omics integration model built on a joint autoencoder architecture. IMMO introduces a dynamic masking strategy that adaptively handles missing data during training, enabling both representation learning and data reconstruction for incomplete multi-omics data. Experiments on inflammatory bowel disease (IBD) and diabetes datasets show that IMMO performs well in data reconstruction and disease classification, and that the learned latent features capture key disease-related microbial patterns. IMMO thus offers a stable, efficient, and interpretable approach to the integrative analysis of incomplete multi-omics data.
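The abstract does not spell out how the dynamic masking is implemented, so the following is only a minimal sketch of one common way such a scheme can work, not the authors' method. It assumes each sample carries a 0/1 availability vector, that a random subset of the observed entries is hidden at every training step, and that the reconstruction error is averaged over observed entries only. The function names `dynamic_mask` and `masked_mse` and the parameter `drop_prob` are hypothetical.

```python
import random


def dynamic_mask(observed, drop_prob, rng):
    """Hide a random subset of the observed entries for one training step.

    observed  : list of 0/1 flags (1 = value was actually measured)
    drop_prob : assumed probability of hiding each observed entry
    rng       : a random.Random instance, so the masking is reproducible
    Entries that are already missing (0) always stay 0.
    """
    return [o if rng.random() >= drop_prob else 0 for o in observed]


def masked_mse(target, recon, observed):
    """Mean squared error computed over observed entries only.

    Missing entries (flag 0) contribute nothing, so the model is not
    penalised for whatever it predicts where no ground truth exists.
    """
    n = sum(observed)
    if n == 0:
        return 0.0
    total = sum(o * (t - r) ** 2 for t, r, o in zip(target, recon, observed))
    return total / n


# Toy sample: two features from one omics layer are observed,
# two features from a second layer are missing for this sample.
target = [1.0, 2.0, 0.0, 0.0]
observed = [1, 1, 0, 0]
recon = [1.5, 2.0, 9.0, 9.0]  # stand-in for a decoder output

rng = random.Random(0)
step_mask = dynamic_mask(observed, drop_prob=0.5, rng=rng)  # changes every step
loss = masked_mse(target, recon, observed)  # ignores the missing layer
```

In a sketch like this, the encoder would see the input with the entries in `step_mask` zeroed out, while the loss is still evaluated against the originally observed values, so the model learns to fill in entries it could not see.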
Cite this article: Li Jiahui. IMMO: An Integrated Analysis Framework for Incomplete Multi-Omic Data [J]. Advances in Applied Mathematics, 2025, 14(12): 48-58. https://doi.org/10.12677/aam.2025.1412484
