YOLOv8n-CCNet: A Lightweight Crowd Counting Network with Progressive Ghost Convolution
DOI: 10.12677/csa.2026.162042 (supported by research project funding)
Authors: Tian Xueqing, Zhang Dongming*, Guo Yihan, Zhao Wenhui, Chen Lijia: School of Physics and Electronics, Henan University, Kaifeng, Henan
Keywords: Crowd Counting, YOLOv8n, Attention Mechanism, Lightweight Network
Abstract: Crowd counting has important applications in public safety, smart cities, and traffic management. However, real-world crowd images exhibit drastic scale variation, severe occlusion, and complex backgrounds, which makes it difficult for existing methods to balance accuracy and efficiency. To address these challenges, this paper proposes YOLOv8n-CCNet, a crowd counting network based on an improved YOLOv8n architecture. The network makes three core contributions. First, a progressive GhostConv replacement strategy is introduced into the backbone, together with a lightweight feature extraction module, which reduces the parameter count by 27.3% while preserving multi-scale perception. Second, a channel and position attention mechanism (CPAM) is added to the feature fusion layer; it uses local cross-channel interaction and direction-aware positional encoding to improve the localization of dense small targets. Third, the WIoUv3 bounding box regression loss is adopted, whose dynamic non-monotonic focusing mechanism optimizes the gradient behavior and improves regression stability under occlusion. To verify the method, experiments were conducted on a self-built dataset of 1500 high-density, multi-scale crowd images. The results show that YOLOv8n-CCNet achieves an mAP50 of 65.3%, an mAP50:95 of 35.6%, and a recall of 56.4%. Compared with the baseline model, it shows significant improvements in both counting accuracy and inference speed, demonstrating its effectiveness in complex real-world scenes.
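To make the source of the parameter savings concrete, the following minimal sketch counts convolution weights for a standard layer and for a Ghost convolution of the kind described in GhostNet [8]. It is an illustration under stated assumptions, not the authors' implementation: the function names `conv_params` and `ghost_conv_params`, the ratio-2 split, the 5x5 depthwise kernel, and the example channel sizes are all hypothetical choices, and biases and normalization layers are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a bias-free 2D convolution with a k x k kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int, cheap_k: int = 5) -> int:
    """Weight count of a ratio-2 Ghost convolution (assumed layout).

    A primary k x k convolution produces half of the output channels.
    A depthwise cheap_k x cheap_k convolution then derives the other
    half from those primary feature maps, and the two are concatenated.
    """
    if c_out % 2:
        raise ValueError("c_out must be even for a ratio-2 split")
    c_half = c_out // 2
    primary = conv_params(c_in, c_half, k)
    # Depthwise: one cheap_k x cheap_k filter per primary channel.
    cheap = conv_params(c_half, c_half, cheap_k, groups=c_half)
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
    standard = conv_params(64, 128, 3)
    ghost = ghost_conv_params(64, 128, 3)
    print(f"standard: {standard}, ghost: {ghost}, "
          f"saving: {1 - ghost / standard:.1%}")
```

A single replaced layer saves close to half of its weights in this setting. The 27.3% figure reported in the abstract is a whole-network number and depends on which layers the progressive strategy replaces, so this sketch does not reproduce it.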
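Cite this article: Tian, X., Zhang, D., Guo, Y., Zhao, W. and Chen, L. (2026) YOLOv8n-CCNet: A Lightweight Crowd Counting Network with Progressive Ghost Convolution. Computer Science and Application, 16(2), 102-110. https://doi.org/10.12677/csa.2026.162042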
