Multi-Task Learning-Based Crowd Counting for Smart-City Scenes
DOI: 10.12677/csa.2025.1511288. Supported by research project funding.
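Authors: Guo Yuchen (郭禹辰): Hebei Key Laboratory of Science and Technology Finance, Hebei Finance University, Baoding, Hebei; Liu Meng (刘梦): Finance Museum, Hebei Finance University, Baoding, Hebei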
Keywords: Crowd Counting; Perspective Difference; Multi-Scale Features; Foreground Segmentation; Multi-Task Learning Framework
Abstract: Differences in camera height and viewing angle across urban scenes cause strong perspective distortion and large scale variation in crowd images. To address this, we propose a multi-scale crowd counting algorithm built on a multi-task learning (MTL) framework. On top of a point-supervised framework, we introduce an integrated multi-scale pyramid module that improves feature extraction for heads of different sizes. In parallel, point annotations are converted automatically into crowd-foreground segmentation labels through multi-scale Gaussian diffusion followed by adaptive thresholding. Segmentation is trained jointly with counting as an auxiliary task, and the multi-task objective combines a counting loss with a segmentation loss, which suppresses background interference and supports recognition of crowds across scales. Experiments on ShanghaiTech Part A, ShanghaiTech Part B, and UCF-QNRF, chosen as representative urban-scene datasets, give MAEs of 57.8, 7.6, and 86.2, respectively. The method performs well under strong perspective and uneven density, indicating good robustness. Because the segmentation labels are derived from the existing point annotations, the approach improves the accuracy and deployability of crowd counting in smart-city scenes at no additional annotation cost, and it is suitable for urban surveillance devices with differing viewpoints.
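The label-generation and joint-training steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the kernel widths in `sigmas`, the threshold rule (a fixed fraction `ratio` of each image's peak response), the absolute count error used as a stand-in counting loss, and the weight `seg_weight` are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math


def gaussian_density(points, height, width, sigma):
    """Spread each (row, col) head point into a unit-mass 2-D Gaussian.

    Assumes every point lies inside the image, so the kernel mass is nonzero.
    """
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        kernel = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        mass = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / mass
    return density


def foreground_mask(points, height, width, sigmas=(1.0, 2.0, 4.0), ratio=0.1):
    """Turn point annotations into a binary foreground mask.

    Multi-scale diffusion: average the density maps over several kernel widths.
    Adaptive threshold (assumed rule): keep pixels at or above `ratio` times
    the peak of the fused map, so the cut-off follows each image's own scale.
    """
    if not points:
        return [[0] * width for _ in range(height)]
    fused = [[0.0] * width for _ in range(height)]
    for sigma in sigmas:
        density = gaussian_density(points, height, width, sigma)
        for y in range(height):
            for x in range(width):
                fused[y][x] += density[y][x] / len(sigmas)
    threshold = ratio * max(map(max, fused))
    return [[1 if value >= threshold else 0 for value in row] for row in fused]


def multitask_loss(pred_count, true_count, pred_probs, mask, seg_weight=0.5):
    """Counting loss plus weighted segmentation loss.

    Counting term: absolute count error (a simple stand-in, not the paper's loss).
    Segmentation term: mean binary cross-entropy against the generated mask.
    """
    eps = 1e-7
    count_loss = abs(pred_count - true_count)
    bce_terms = []
    for prob_row, mask_row in zip(pred_probs, mask):
        for p, m in zip(prob_row, mask_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
            bce_terms.append(-(m * math.log(p) + (1 - m) * math.log(1.0 - p)))
    seg_loss = sum(bce_terms) / len(bce_terms)
    return count_loss + seg_weight * seg_loss


def mean_absolute_error(pred_counts, true_counts):
    """MAE over a set of test images, the metric reported in the abstract."""
    errors = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)
```

For example, `foreground_mask([(4, 4)], 9, 9)` marks a blob around the single annotated head and leaves the image corners as background. In a real pipeline the same steps would normally run on arrays with a deep-learning framework; plain lists are used here only to keep the sketch self-contained.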
Cite this article: Guo, Y. and Liu, M. (2025) Multi-Task Learning-Based Crowd Counting for Smart-City Scenes. Computer Science and Application, 15(11), 102-110. https://doi.org/10.12677/csa.2025.1511288
