Infrared and Visible Image Fusion Network Based on Transformer-CNN
Abstract: Infrared and visible image fusion is an important research direction in multimodal fusion, with applications in medicine, industry, autonomous driving, and other fields. This paper proposes a fusion algorithm based on a hybrid Transformer-CNN network. First, a dual-branch feature extraction network extracts features from each modality. Then, in the fusion stage, complementary features are obtained by computing the difference between the features of the two modalities. A parameter-adaptive Swish activation function dynamically generates channel weights, which are combined with global average pooling for feature compression and with a feature concatenation strategy to achieve cross-scale fusion of the infrared and visible features. Finally, a mask loss is proposed to improve fusion quality. Experiments on public datasets show that the fused images retain clear background texture details, and that the method achieves better or comparable results relative to the compared methods in both subjective and objective evaluation.
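The fusion stage described in the abstract (cross-modal feature difference, global average pooling, Swish-generated channel weights, concatenation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `swish` and `differential_fusion`, the fixed scalar `beta` (learned in a parameter-adaptive Swish), the residual form of the update, and the single-scale input are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x).

    In a parameter-adaptive variant beta would be learned; here it is a
    fixed scalar (assumption for illustration).
    """
    return x / (1.0 + np.exp(-beta * x))


def differential_fusion(f_ir, f_vis, beta=1.0):
    """Hypothetical differential channel-weighted fusion.

    f_ir, f_vis: feature maps of shape (C, H, W) from the infrared and
    visible branches. Returns a fused tensor of shape (2C, H, W).
    """
    # Complementary information: what each modality has that the other lacks.
    diff_vis = f_vis - f_ir
    diff_ir = f_ir - f_vis

    # Global average pooling compresses each channel to one scalar, shape (C,).
    gap_vis = diff_vis.mean(axis=(1, 2))
    gap_ir = diff_ir.mean(axis=(1, 2))

    # Swish turns the pooled descriptors into per-channel weights, shape (C, 1, 1).
    w_vis = swish(gap_vis, beta)[:, None, None]
    w_ir = swish(gap_ir, beta)[:, None, None]

    # Inject the weighted complementary features into each branch (assumed residual form).
    ir_enhanced = f_ir + w_vis * diff_vis
    vis_enhanced = f_vis + w_ir * diff_ir

    # Concatenate the two enhanced branches along the channel axis.
    return np.concatenate([ir_enhanced, vis_enhanced], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir = rng.standard_normal((8, 16, 16))
    vis = rng.standard_normal((8, 16, 16))
    fused = differential_fusion(ir, vis)
    print(fused.shape)
```

In this sketch, identical inputs yield a zero difference and therefore zero weights, so each branch passes through unchanged; the exchange of information grows with the per-channel discrepancy between the two modalities.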
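Cite this article: Li Yu, Zhang Zhichao, Qi Yanjie. Infrared and Visible Image Fusion Network Based on Transformer-CNN [J]. Journal of Image and Signal Processing, 2025, 14(4): 377-386. https://doi.org/10.12677/jisp.2025.144035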
