Single Domain Generalization Based on Frequency Domain for Crowd Counting

DOI: 10.12677/JISP.2026.151009
Author: Tong Li (李通): College of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang, Guangdong
Keywords: Feature Decoupling; Feature Fusion; Single Domain Generalization; Crowd Counting

Abstract: In cross-domain crowd counting, differences in feature distribution between the source and target domains significantly degrade model generalization. To address this, we propose a single-domain generalization method for crowd counting based on frequency-specific feature decoupling. Specifically, we first apply a two-dimensional Fourier transform to decompose each input image in the frequency domain, explicitly separating its high-frequency and low-frequency components. The high-frequency components capture domain-invariant edge and texture information, while the low-frequency components reflect the global structure and density distribution of the image. Because local density varies smoothly in density regression and is mainly governed by low-frequency components, we introduce a Content Error Mask (CEM) to filter domain-specific information out of the low-frequency features, and we build a high-frequency-guided spatial attention mechanism to fuse the frequency-domain features effectively. An attention consistency constraint further ensures that the original and augmented images attend to the same spatial regions, which improves robustness across domains. Extensive experiments on multiple cross-domain crowd counting benchmarks show that the method achieves strong generalization performance, validating the proposed frequency-specific feature decoupling and fusion strategy.
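The frequency decomposition step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how the high and low bands are separated, so the circular cutoff, the `radius` parameter, and the function name `split_frequencies` are assumptions made here for illustration only, not the paper's implementation.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Assumption: the bands are separated by a circular mask of the given
    radius around the zero-frequency (DC) term of the shifted spectrum.
    """
    h, w = image.shape
    # 2-D Fourier transform, shifted so the DC term sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Distance of every frequency bin from the DC term.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius

    # Invert each masked spectrum back to the spatial domain.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


# Usage on a random stand-in for an input image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
low, high = split_frequencies(image, radius=8)
```

Because the two masks are complementary and the inverse transform is linear, `low + high` reproduces the input up to floating-point error. A smaller `radius` moves more of the image content into the high-frequency part.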

Citation: Li, T. Single Domain Generalization Based on Frequency Domain for Crowd Counting [J]. Journal of Image and Signal Processing, 2026, 15(1): 102-117. https://doi.org/10.12677/JISP.2026.151009

