Vehicle Detection Algorithm in UAV Images Based on Cross-Scale Dynamic Feature Pyramid

doi:10.12677/mos.2025.142138


Authors: 何佳桥, 李朝阳: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai
Keywords: Small Object Detection; IoU; Attention Mechanism; Dynamic Sampling

Abstract: In recent years, unmanned aerial vehicles (UAVs) have been widely used in fields such as traffic monitoring and smart parking, where real-time vehicle detection and classification is a key task. Vehicle detection from UAVs is difficult: vehicles are often small, and changes in flight angle cause large variations in target scale, which makes the detection network harder to optimize. In addition, small targets in high-altitude aerial images offer few extractable features, which further lowers detection accuracy. To address these problems, this paper proposes an efficient, real-time vehicle detection network based on YOLOv8, with three main improvements: (1) the CPCA attention module is introduced into the backbone to strengthen the model's attention to small targets and thereby improve feature extraction; (2) the YOLOv8 neck is redesigned following the GFPN idea from DAMO-YOLO, which improves detection accuracy at a small parameter cost, and the conventional bilinear-interpolation upsampling is replaced with DySample dynamic upsampling so that the model adapts better to changes in target scale, yielding the Cross-Scale Dynamic Feature Pyramid Network (CS-DyFPN); (3) an Inner-Focaler-IoU loss is proposed that combines the strengths of Inner-IoU and Focaler-IoU, adaptively focuses on hard samples, and improves detection accuracy over CIoU. Experiments on the VisDrone2019 dataset show that the proposed method significantly improves on the original YOLOv8 in both real-time performance and accuracy, and performs particularly well on small-target detection.
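The abstract does not give the formula for the Inner-Focaler-IoU loss, so the following is only a minimal sketch of one plausible combination, under stated assumptions: the Inner-IoU term is computed on auxiliary boxes rescaled about their centers by a factor `ratio`, and the Focaler-IoU term is a linear interval mapping of that value onto [0, 1] between thresholds `d` and `u`. The function names, the default values of `ratio`, `d`, and `u`, and the final form `1 - focaler(inner_iou)` are illustrative assumptions, not the authors' implementation.

```python
def inner_iou(box_a, box_b, ratio=0.75):
    """IoU of auxiliary boxes: each (x1, y1, x2, y2) box is rescaled
    about its own center by `ratio` before the IoU is computed."""
    def scale(box):
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        half_w = (x2 - x1) * ratio / 2.0
        half_h = (y2 - y1) * ratio / 2.0
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h

    ax1, ay1, ax2, ay2 = scale(box_a)
    bx1, by1, bx2, by2 = scale(box_b)
    # Overlap of the two auxiliary boxes (zero if they do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def focaler(iou, d=0.0, u=0.95):
    """Linear interval mapping: 0 below d, 1 above u, linear in between."""
    return min(1.0, max(0.0, (iou - d) / (u - d)))


def inner_focaler_iou_loss(pred, target, ratio=0.75, d=0.0, u=0.95):
    """Assumed combination: apply the Focaler mapping to the Inner-IoU value."""
    return 1.0 - focaler(inner_iou(pred, target, ratio), d, u)


if __name__ == "__main__":
    # Hypothetical boxes chosen only to exercise the functions.
    print(inner_focaler_iou_loss((0, 0, 4, 4), (1, 1, 5, 5)))
```

In this sketch, a `ratio` below 1 shrinks the auxiliary boxes and a `ratio` above 1 enlarges them, while `d` and `u` select the IoU interval over which the loss varies, which is how the loss can be steered toward easier or harder samples.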

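Cite this article: 何佳桥, 李朝阳. Vehicle Detection Algorithm in UAV Images Based on Cross-Scale Dynamic Feature Pyramid [J]. Modeling and Simulation (建模与仿真), 2025, 14(2): 127-141. https://doi.org/10.12677/mos.2025.142138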

