Small Target Detection of UAVs in Complex Environments Based on Multi-Strategy YOLOv5s
DOI: 10.12677/airr.2025.143058. Supported by research project funding.
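Authors: 罗茜, 王宇泽, 王芳*: School of Science, Yanshan University, Qinhuangdao, Hebei; 陆荣灿, 吴晓梅, 葛嘉玄: School of Electrical Engineering, Yanshan University, Qinhuangdao, Hebei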
Keywords: UAV Aerial Photography; YOLOv5s; Multi-Head Self-Attention Mechanism (MHSA); BiFPN Network; SimAM Attention Mechanism
Abstract: To address the subjectivity and limitations of feature extraction when detecting small targets in UAV aerial imagery of complex scenes, this paper proposes three improvement strategies: 1) to improve the detection of targets at different scales, a Multi-Head Self-Attention (MHSA) mechanism is integrated into the last layer of the YOLOv5s backbone network; 2) to make better use of feature information, a Bi-directional Feature Pyramid Network (BiFPN) is constructed for feature fusion; 3) the SimAM module is integrated into the YOLOv5s model to improve the matching of semantic and positional information. Combining these strategies in pairs yields three multi-strategy YOLOv5s detection models: the first combines MHSA with the BiFPN feature fusion network; the second combines MHSA with the SimAM attention mechanism; the third combines the SimAM attention mechanism with the BiFPN feature fusion network. Comparative experiments on the VisDrone2019 dataset show that the second multi-strategy model outperforms the other two, raising the mean Average Precision (mAP) to 38.9%, which is 4.8% higher than the original model.
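To make the SimAM strategy more concrete, the following is a minimal pure-Python sketch of a SimAM-style, parameter-free attention weighting applied to a single feature channel. It follows the formulation commonly seen in public SimAM implementations (weight each position by a sigmoid of its inverse energy); it is not taken from this paper, and where the authors place the module inside YOLOv5s is not stated in the abstract. The function name `simam` and the default `e_lambda` value are assumptions made for illustration.

```python
import math


def simam(channel, e_lambda=1e-4):
    """Apply SimAM-style attention to one channel (a 2D list of floats).

    Illustrative sketch only: `simam` and `e_lambda` are hypothetical names,
    not identifiers from the paper.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" positions in the channel
    if n < 1:
        raise ValueError("channel needs at least two elements")

    mean = sum(values) / len(values)
    # Squared deviation of every position from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        new_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for positions that stand out from the mean.
            inv_energy = d / (4 * (variance + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            new_row.append(v * weight)
        out.append(new_row)
    return out


if __name__ == "__main__":
    # A single bright position on a flat background keeps most of its value.
    feature = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    for row in simam(feature):
        print(row)
```

Because the weighting is computed from channel statistics alone, a module of this kind adds no learnable parameters, which is one reason it is often paired with heavier components such as MHSA.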
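Cite this article: 罗茜, 陆荣灿, 王宇泽, 吴晓梅, 葛嘉玄, 王芳. Small Target Detection of UAVs in Complex Environments Based on Multi-Strategy YOLOv5s [J]. Artificial Intelligence and Robotics Research, 2025, 14(3): 590-604. https://doi.org/10.12677/airr.2025.143058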
