Pedestrian Recognition Model for Complex Environment Based on Improved YOLOv8
DOI: 10.12677/airr.2025.146139
Author: Wei Jianfeng (韦剑锋): School of Physics and Optoelectronic Engineering, Guangdong University of Technology, Guangzhou, Guangdong
Keywords: Pedestrian Image Recognition, Multi-Scale, Attention, YOLO
Abstract: Existing pedestrian recognition models suffer from high false-detection and missed-detection rates for small and densely packed targets in complex scenes. To address this, we propose Swin-YOLO, a pedestrian detection model improved from YOLOv8. First, we design an SE-Conv module that fuses convolution with an attention mechanism, strengthening the model's feature selection capability. Second, we introduce a Swin Transformer (ST) module, whose large receptive field improves long-range modeling. Finally, at the output end we construct a multi-scale convolution module based on depthwise and dilated convolutions, intended to achieve efficient multi-scale feature fusion at low computational cost. Experiments on TinyPerson, a dataset of small-object scenes, show that Swin-YOLO improves on YOLOv8 by 6.5 and 2.5 percentage points in mAP@0.5 and mAP@0.5:0.95, respectively, reducing false and missed detections. The model offers an effective improved-YOLOv8 pedestrian recognition approach for intelligent driver assistance and road-condition warning.
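The abstract's claim that depthwise and dilated convolutions give multi-scale coverage at low cost can be illustrated with simple arithmetic. The sketch below is not the paper's module: the channel width (256), the 3 x 3 kernel, and the dilation rates (1, 2, 3) are assumed values chosen only for illustration, and the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_params(channels, k):
    """Weight count of a depthwise k x k convolution: one filter per channel."""
    return k * k * channels


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel at the given dilation rate."""
    return dilation * (k - 1) + 1


channels = 256  # assumed channel width, not taken from the paper
standard = conv_params(channels, channels, 3)
depthwise = depthwise_params(channels, 3)

# A depthwise 3 x 3 layer uses 1/channels of the weights of a standard one.
print(standard, depthwise, standard // depthwise)

# Dilation widens the covered region without adding weights.
print([effective_kernel(3, d) for d in (1, 2, 3)])
```

Under these assumptions, the depthwise layer needs 2,304 weights against 589,824 for the standard layer, while dilation rates 1, 2, and 3 cover 3 x 3, 5 x 5, and 7 x 7 regions with the same nine weights per filter.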
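Citation: Wei Jianfeng. Pedestrian Recognition Model for Complex Environment Based on Improved YOLOv8 [J]. Artificial Intelligence and Robotics Research, 2025, 14(6): 1489-1498. https://doi.org/10.12677/airr.2025.146139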
