Research on Regional Consistency Semi-Supervised Crowd Counting Based on Multi-Level Semantic Information Fusion
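DOI: 10.12677/csa.2025.1510262. Supported by research project funding.
Authors: Yuchen Guo (郭禹辰), Manman Song (宋蔓蔓): Hebei Key Laboratory of Science and Technology Finance, Hebei Finance University, Baoding, Hebei
Keywords: Crowd Counting, Multi-Level Semantic Enhancement, Semi-Supervised Framework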
Abstract: To address crowd counting in smart-city scenes, this paper proposes a semi-supervised crowd counting framework built on a multi-level semantic feature extraction network. The method uses ResNet-34 as the backbone network and adds a top-down multi-level semantic enhancement module: gating between levels enhances semantic information and performs lightweight feature fusion, which suppresses background noise and highlights foreground crowd information. A regional consistency constraint on unlabeled samples forms the semi-supervised framework and improves the model's generalization ability. Training starts with a supervised warm-up phase and then proceeds by alternating iterative updates. The framework is trained and tested on the ShanghaiTech and UCF_CC_50 datasets. It achieves a mean absolute error (MAE) of 65.4 on ShanghaiTech Part A, 9.2 on ShanghaiTech Part B, and 201.2 on UCF_CC_50. The algorithm shows good accuracy across different crowd densities and complex backgrounds, and can provide an efficient solution for vision-based crowd counting tasks.
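The two quantities named in the abstract, MAE and a regional consistency constraint, can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything here is an assumption made for illustration: the function names (`mae`, `region_counts`, `region_consistency`), the grid partition, and the choice to compare per-region counts of two predicted density maps for the same unlabeled image.

```python
from typing import List, Sequence

Density = List[List[float]]  # a 2D density map; its sum is the estimated count


def mae(pred_counts: Sequence[float], true_counts: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth image counts."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(pred_counts, true_counts))
    return total / len(pred_counts)


def region_counts(density: Density, grid: int) -> List[float]:
    """Sum a density map over a grid x grid partition, in row-major order."""
    h, w = len(density), len(density[0])
    counts = []
    for gy in range(grid):
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        for gx in range(grid):
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            counts.append(
                sum(density[y][x] for y in range(y0, y1) for x in range(x0, x1))
            )
    return counts


def region_consistency(pred_a: Density, pred_b: Density, grid: int = 2) -> float:
    """Hypothetical consistency penalty: mean absolute difference between
    the per-region counts of two predictions for the same unlabeled image."""
    a, b = region_counts(pred_a, grid), region_counts(pred_b, grid)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

In this sketch the penalty is zero when both predictions assign the same count to every region, so it needs no ground-truth annotation, which is what makes a term of this kind usable on unlabeled samples.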
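Citation: Guo, Y. and Song, M. Research on Regional Consistency Semi-Supervised Crowd Counting Based on Multi-Level Semantic Information Fusion [J]. Computer Science and Application, 2025, 15(10): 221-231. https://doi.org/10.12677/csa.2025.1510262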
