Multi-Spectral Object Detection Based on Multi-Scale Spatial-Spectral Interaction Network

DOI: 10.12677/mos.2025.144279
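Authors: Zhaoyang Lu (陆召阳), Rongfu Zhang (张荣福), Li Jing (景李), Huiguang Wei (魏辉光): School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai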
Keywords: Deep Learning; Multi-Spectral Object Detection; Feature Fusion

Abstract: In recent years, multispectral object detection that combines visible and thermal infrared images has been widely adopted because the two modalities are complementary. However, most existing multispectral object detection models focus mainly on local image characteristics and neglect the extraction of global features. They also tend to lose key information, such as texture and edge details, during feature extraction, so the extracted features carry insufficient information. To address these problems, this paper proposes SSIDet, a multispectral object detection model. The model first builds a multi-scale encoding network that extracts local-global features at different scales from the thermal infrared and visible images separately. It then uses a spatial-spectral interactive attention network to fuse spatial and spectral features thoroughly while enhancing their complementarity by reducing redundancy between features. Finally, a multi-scale reconstruction network further realizes the synergistic enhancement of spatial and spectral features. Extensive experiments on the FLIR and LLVIP datasets show that the proposed method outperforms existing methods.

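Citation: Zhaoyang Lu, Rongfu Zhang, Li Jing, Huiguang Wei. Multi-Spectral Object Detection Based on Multi-Scale Spatial-Spectral Interaction Network [J]. Modeling and Simulation (建模与仿真), 2025, 14(4): 205-216. https://doi.org/10.12677/mos.2025.144279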

