Research on Decision-Level Fusion Methods and Their Applications in Multimodal Deep Learning
Abstract: Thanks to its powerful learning ability, single-modal deep learning has achieved excellent results in many application areas. However, a model trained on a single modality cannot capture complete information about a phenomenon. Multimodal deep learning, which aims to build models that relate information across multiple modalities, was proposed to alleviate this problem and has received widespread attention. This paper first introduces multimodal deep learning and, from it, multimodal decision-level fusion. It then describes in detail the decision algorithms used in multimodal decision-level fusion and reviews their current applications in multimodal deep learning. Finally, it summarizes multimodal decision-level fusion and discusses future directions.
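Cite this article: Tian Aoxiang, Zhang Changlun. Research on Decision-Level Fusion Methods and Their Applications in Multimodal Deep Learning [J]. Journal of Image and Signal Processing, 2025, 14(4): 412-421. https://doi.org/10.12677/jisp.2025.144038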
