基于改进YOLOv9的工业安全帽检测应用
Industrial Helmet Detection Application Based on Improved YOLOv9
DOI: 10.12677/airr.2026.153068
作者: 马宇航，北京建筑大学理学院，北京
关键词: 目标检测，YOLOv9，DualConv，NAM，安全帽检测，轻量化
Keywords: Object Detection, YOLOv9, DualConv, NAM, Helmet Detection, Lightweight
摘要: 针对建筑工地等高风险作业场景中安全帽佩戴检测的实时性与准确性需求，本文提出一种融合双卷积核(DualConv)与归一化注意力模块(NAM)的轻量化YOLOv9改进模型。首先，在YOLOv9主干网络第六层引入DualConv结构，通过并行融合3 × 3与1 × 1卷积核，在保持多尺度特征提取能力的同时有效降低模型参数量与计算复杂度；其次，在网络输出端嵌入NAM注意力机制，利用Batch Normalization缩放因子动态评估通道重要性，增强关键特征响应并抑制冗余信息，进一步提升模型的特征表征效率。在自建安全帽数据集(约2000张图像)上的实验结果表明：改进模型参数量由6940.9万降至5988.2万(减少约13.7%)，网络层数由1475层精简至933层，训练效率显著提升；检测精度方面，Precision达0.9383(略优于原模型的0.9365)，mAP为0.9056，虽较原模型略有下降，但整体在精度与效率之间取得了良好平衡。本研究为资源受限的嵌入式部署场景提供了一种可行的轻量化安全帽检测方案，后续将通过扩大数据集、优化训练策略及开展消融实验进一步提升模型综合性能。
Abstract: To address the real-time and accuracy requirements of helmet-wearing detection in high-risk work scenarios such as construction sites, this paper proposes an improved lightweight YOLOv9 model that integrates dual convolutional kernels (DualConv) with a Normalization-based Attention Module (NAM). First, the DualConv structure is introduced into the sixth layer of the YOLOv9 backbone; by fusing 3 × 3 and 1 × 1 convolutional kernels in parallel, it effectively reduces the model's parameter count and computational complexity while preserving multi-scale feature extraction capability. Second, the NAM attention mechanism is embedded at the network output; it uses the Batch Normalization scaling factors to dynamically assess channel importance, enhancing key feature responses and suppressing redundant information to further improve feature representation efficiency. Experimental results on a self-built helmet dataset (approximately 2000 images) show that the parameter count of the improved model is reduced from 69.409 million to 59.882 million (a decrease of approximately 13.7%), and the number of network layers is streamlined from 1475 to 933, significantly improving training efficiency. In terms of detection accuracy, Precision reaches 0.9383 (slightly better than the original model's 0.9365) and mAP is 0.9056; although mAP decreases slightly relative to the original model, the improved model achieves a good balance between accuracy and efficiency overall. This study provides a feasible lightweight helmet detection solution for resource-constrained embedded deployment scenarios. Future work will focus on expanding the dataset, optimizing training strategies, and conducting ablation experiments to further improve the model's overall performance.
文章引用:马宇航. 基于改进YOLOv9的工业安全帽检测应用[J]. 人工智能与机器人研究, 2026, 15(3): 722-730. https://doi.org/10.12677/airr.2026.153068
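The DualConv idea summarized in the abstract — fusing a 3 × 3 and a 1 × 1 kernel over the same input in parallel — can be sketched in plain NumPy. This is an illustrative single-channel simplification, not the paper's implementation: the actual DualConv of Zhong et al. [10] combines grouped 3 × 3 convolution with pointwise 1 × 1 convolution across channel groups, which is where the parameter savings come from.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D cross-correlation of one channel x with kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def dualconv(x, k3, k1):
    """Parallel 3x3 and 1x1 branches fused by addition (single-channel sketch)."""
    return conv2d_same(x, k3) + conv2d_same(x, k1)
```

With a zero 3 × 3 kernel and an identity 1 × 1 kernel, the input passes through unchanged, showing that the cheap 1 × 1 branch alone can carry information: per output unit this layout costs 9 + 1 weights versus 18 for two parallel 3 × 3 kernels.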
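The NAM channel attention described in the abstract weights each channel by its Batch Normalization scaling factor. A minimal NumPy sketch of the channel branch follows the formula in the NAM paper [11] (per-channel weight w_i = |γ_i| / Σ|γ_j|, followed by a sigmoid gate); the (C, H, W) tensor layout and the function name are illustrative assumptions, not the authors' code.

```python
import numpy as np

def nam_channel_attention(x, gamma):
    """NAM-style channel attention (sketch).

    x:     feature map of shape (C, H, W)
    gamma: per-channel Batch Normalization scale factors, shape (C,)
    """
    w = np.abs(gamma) / np.abs(gamma).sum()              # per-channel importance
    gated = 1.0 / (1.0 + np.exp(-w[:, None, None] * x))  # sigmoid gate
    return x * gated                                     # rescale features
```

Because the weights come from BN parameters the network already learns, this attention adds no extra trainable parameters: channels with larger |γ| (higher variance, more informative) are amplified, while low-|γ| channels are suppressed.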

参考文献

[1] Wang, C.Y., Yeh, I.H. and Liao, H.Y.M. (2024) YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. European Conference on Computer Vision, Milan, 29 September-4 October 2024, 1-21.
[2] Bochkovskiy, A., Wang, C.Y. and Liao, H.Y.M. (2020) YOLOv4: Optimal Speed and Accuracy of Object Detection.
https://arxiv.org/abs/2004.10934
[3] Ge, Z., Liu, S., Wang, F., et al. (2021) YOLOX: Exceeding YOLO Series in 2021.
https://arxiv.org/abs/2107.08430
[4] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A. and Zagoruyko, S. (2020) End-to-End Object Detection with Transformers. In: Lecture Notes in Computer Science, Springer, 213-229.
[5] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale.
https://arxiv.org/abs/2010.11929
[6] Bao, H., Dong, L., Piao, S., et al. (2021) BEiT: BERT Pre-Training of Image Transformers.
https://arxiv.org/abs/2106.08254
[7] Feng, C., Zhong, Y., Gao, Y., Scott, M.R. and Huang, W. (2021) TOOD: Task-Aligned One-Stage Object Detection. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 3490-3499.
[8] Chen, Y., Yuan, X., Wang, J., Wu, R., Li, X., Hou, Q., et al. (2025) YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-Time Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47, 4240-4252.
[9] Gao, S.H., Cheng, M.M., Zhao, K., et al. (2019) Res2Net: A New Multi-Scale Backbone Architecture. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 652-662.
[10] Zhong, J., Chen, J. and Mian, A. (2023) DualConv: Dual Convolutional Kernels for Lightweight Deep Neural Networks. IEEE Transactions on Neural Networks and Learning Systems, 34, 9528-9535.
[11] Liu, Y.C., Shao, Z.R., Teng, Y.Y., et al. (2021) NAM: Normalization-Based Attention Module.
https://arxiv.org/abs/2111.12419
[12] Ding, M., Xiao, B., Codella, N., Luo, P., Wang, J. and Yuan, L. (2022) DaViT: Dual Attention Vision Transformers. In: Lecture Notes in Computer Science, Springer, 74-92.
[13] Chen, K., Lin, W., Li, J., See, J., Wang, J. and Zou, J. (2020) AP-Loss for Accurate One-Stage Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 3782-3798.
[14] Ge, Z., Liu, S., Li, Z., Yoshie, O. and Sun, J. (2021) OTA: Optimal Transport Assignment for Object Detection. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 303-312.