Single Domain Generalization Based on Frequency Domain for Crowd Counting
DOI: 10.12677/JISP.2026.151009
Author: Li Tong (李通), School of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang, Guangdong
Keywords: Feature Decoupling, Feature Fusion, Single Domain Generalization, Crowd Counting
Abstract: In cross-domain crowd counting, discrepancies in feature distributions between the source and target domains lead to a significant drop in model generalization performance. To address this issue, we propose a single-domain generalization method for crowd counting based on frequency-domain feature decoupling. Specifically, we first apply a two-dimensional Fourier transform to decompose each input image in the frequency domain, explicitly separating its high-frequency and low-frequency components. The high-frequency components capture domain-invariant edge and texture details, while the low-frequency components reflect the global structure and density distribution of the image. Because local density varies smoothly in density regression and is mainly governed by low-frequency components, we adopt a Content Error Mask (CEM) to filter domain-specific information out of the low-frequency features, and we design a high-frequency-guided spatial attention mechanism to fuse the frequency-domain features effectively. An attention consistency constraint further ensures that the original and augmented images attend to the same spatial regions, thereby improving cross-domain robustness. Extensive experiments on multiple cross-domain crowd counting benchmarks show that our method achieves superior generalization performance, validating the effectiveness of the proposed frequency-domain feature decoupling and fusion strategy.
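The frequency decomposition step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `split_frequencies`, the circular low-pass mask, and the `radius_ratio` cutoff are all assumptions chosen for illustration, since the abstract does not state how the high and low bands are separated.

```python
import numpy as np

def split_frequencies(image, radius_ratio=0.1):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    radius_ratio is a hypothetical cutoff (fraction of the shorter side);
    the paper's actual band boundary is not given in the abstract.
    """
    h, w = image.shape
    # 2-D Fourier transform, shifted so the zero frequency sits at the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Circular mask selecting frequencies close to the centre (low band).
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius_ratio * min(h, w)
    # Invert each masked spectrum back to the spatial domain.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

# Usage: the two parts sum back to the original image.
rng = np.random.default_rng(0)
img = rng.random((32, 48))
low, high = split_frequencies(img)
print(np.allclose(low + high, img))
```

Because the two masks are complementary, the low and high parts always add up to the input, so the split loses no information. A smaller `radius_ratio` pushes more of the image content into the high-frequency part.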
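Cite this article: Li, T. (2026) Single Domain Generalization Based on Frequency Domain for Crowd Counting. Journal of Image and Signal Processing, 15(1), 102-117. https://doi.org/10.12677/JISP.2026.151009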
