Research on Remote Sensing Image Target Detection Algorithm Based on Transformer

doi:10.12677/csa.2024.144081

Authors: Yumei Wei, Tao Jiang*, Jinyan Bai: School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan
Keywords: Remote Sensing Image; Target Detection; Transformer; SE Attention Mechanism

Abstract: Targets in remote sensing images often have indistinct features, which leads to low detection accuracy and poor performance. To address this, we propose a remote sensing image target detection model based on an improved Transformer. First, the model is loaded via transfer learning and the original backbone is replaced with ResNet101. Second, in the feature extraction stage, the SE attention mechanism is introduced into the bottleneck layers of the backbone network. Finally, the original loss function is replaced with a combination of L1 loss and CIoU loss. Experimental results show that the improved model achieves somewhat higher accuracy and better performance than the baseline.
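The combined box regression loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the (x1, y1, x2, y2) box format, the function names, and the weights w_l1 and w_ciou are assumptions chosen for illustration, and the CIoU term follows the standard definition (1 - IoU, plus a normalized centre-distance penalty, plus an aspect-ratio consistency term).

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss between two boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between box centres.
    cdist2 = ((x1 + x2) - (gx1 + gx2)) ** 2 / 4 + ((y1 + y2) - (gy1 + gy2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    diag2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + cdist2 / diag2 + alpha * v


def l1_loss(box, gt):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(box, gt))


def box_loss(box, gt, w_l1=5.0, w_ciou=2.0):
    """Weighted sum of L1 and CIoU losses; the weights are placeholders."""
    return w_l1 * l1_loss(box, gt) + w_ciou * ciou_loss(box, gt)
```

For two identical boxes both terms vanish (up to the eps guard). For disjoint boxes the IoU is zero, so the CIoU term still provides a gradient signal through the centre-distance penalty, which is one common motivation for pairing it with L1.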

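Cite this article: Wei, Y.M., Jiang, T. and Bai, J.Y. (2024) Research on Remote Sensing Image Target Detection Algorithm Based on Transformer. Computer Science and Application, 14(4), 105-114. https://doi.org/10.12677/csa.2024.144081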

