Detecting Occluded Pedestrian Targets by Integrating Global and Local Features
Abstract: In occluded scenes, pedestrians are often partially or completely hidden by other objects, which leaves their appearance features incomplete, blurs their edges, and can cause them to be confused with the background or with the occluder. Detecting occluded pedestrians therefore requires an algorithm that can still recognize and localize targets accurately when features are missing. To address this challenge, this paper proposes an improved YOLOv10 method with multi-scale perception that integrates a multi-scale self-attention mechanism (Efficient Multi-directional Self-Attention, EMSA). First, the MSDA attention mechanism is fused into the C2f module of YOLOv10, which strengthens feature capture at multiple scales and improves the detection of occluded targets of different sizes; by adaptively weighting the features of different channels, it also increases the attention paid to occluded-target features. Second, a new loss function, Focaler-IoU, is introduced based on a dynamic focusing mechanism; it dynamically adjusts the focus of the loss, improving the detection of targets at different scales and speeding up the convergence of the bounding-box regression loss. A small-object detection head is then added to strengthen feature extraction for small occluded targets. Finally, ablation experiments are conducted on the public CityPersons dataset. The model incorporating the MSDA attention mechanism reaches a mean average precision (mAP@0.5) of 62.3%, which is 2.2% higher than the official YOLOv10n. These results indicate that the EMSA attention mechanism effectively improves the detection of occluded pedestrians and meets the detection requirements of occluded-pedestrian scenarios in applications such as autonomous driving and surveillance.
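The loss function named in the abstract appears to correspond to Focaler-IoU, which remaps the IoU through a linear interval before turning it into a loss. The abstract gives no formula, so the following is only a minimal sketch under that assumption: the function names, the `(x1, y1, x2, y2)` box format, and the interval bounds `d` and `u` (with their default values) are illustrative choices, not the authors' implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_iou(iou_value, d=0.0, u=0.95):
    """Linear interval mapping of an IoU value (assumed form).

    Values below d map to 0, values above u map to 1, and values in
    between are rescaled linearly. d and u are hypothetical defaults.
    """
    if not d < u:
        raise ValueError("expected d < u")
    if iou_value < d:
        return 0.0
    if iou_value > u:
        return 1.0
    return (iou_value - d) / (u - d)


def focaler_iou_loss(pred_box, target_box, d=0.0, u=0.95):
    """Regression loss for one predicted box: 1 - remapped IoU."""
    return 1.0 - focaler_iou(iou(pred_box, target_box), d, u)


if __name__ == "__main__":
    pred = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 1.0, 3.0, 3.0)
    print("IoU:", iou(pred, target))
    print("loss:", focaler_iou_loss(pred, target, d=0.0, u=0.95))
```

Moving `d` and `u` changes which samples contribute gradient: a higher `d` ignores poorly overlapping (hard) boxes, while a lower `u` stops rewarding boxes that already overlap well. The same `iou` helper also illustrates the 0.5 overlap threshold behind the mAP@0.5 metric reported above.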
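Cite this article: Xu Sheng, Su Qinghua, Wan Kaizheng, Qi Xiangyu, Zhang Zhichao. Detecting Occluded Pedestrian Targets by Integrating Global and Local Features [J]. Computer Science and Application, 2025, 15(1): 28-36. https://doi.org/10.12677/csa.2025.151004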
