Uncertainty-Aware Learning for Robust Facial Expression Recognition Method
DOI: 10.12677/csa.2026.162067
Author: Kaiyi Jia: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui
Keywords: Facial Expression Recognition, Label Noise Learning, Uncertainty Learning
Abstract: Facial expression recognition (FER) in real-world scenarios is severely hampered by data ambiguity and label noise. Existing methods rely predominantly on deterministic embeddings that map each facial image to a fixed point in feature space. This paradigm forces the model to overfit ambiguous samples (for example, those with occlusion, low resolution, or subtle compound emotions) to rigid categorical labels, which degrades generalization. To address this, we propose a novel Ambiguity Perception & Suppression Module (APSM). Unlike traditional approaches, APSM models facial features as Gaussian distributions, each characterized by a mean (the semantic center) and a variance (the uncertainty). We further introduce an Uncertainty-Attenuated Loss that dynamically reweights the learning process: samples with high estimated uncertainty contribute less to the gradient update, which suppresses the influence of noisy data. Extensive experiments on the RAF-DB and AffectNet datasets show that the method significantly improves robustness and achieves state-of-the-art performance without requiring complex external data or manual cleaning.
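The abstract does not give the exact form of the Uncertainty-Attenuated Loss, so the following is only a minimal sketch of one common way such a loss can be written, not the author's formulation. It assumes a per-sample scalar log-variance s, scales the cross-entropy by exp(-s), and adds a penalty lam * s so the model cannot drive every loss to zero by predicting large uncertainty. The function names, the scalar form of the uncertainty, and the lam parameter are all hypothetical and chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_embedding(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization).

    mu is the semantic center and exp(log_var) the per-dimension variance
    of the Gaussian feature distribution.
    """
    return [m + math.exp(0.5 * s) * rng.gauss(0.0, 1.0)
            for m, s in zip(mu, log_var)]


def attenuated_loss(logits, label, log_var, lam=1.0):
    """Assumed form: exp(-s) * cross_entropy + lam * s (s = log variance)."""
    ce = -math.log(softmax(logits)[label])
    return math.exp(-log_var) * ce + lam * log_var


def logit_gradient(logits, label, log_var):
    """Gradient of the assumed loss with respect to the logits.

    The usual cross-entropy gradient (softmax - one_hot) is scaled by
    exp(-s), so a high-uncertainty sample moves the classifier less.
    """
    weight = math.exp(-log_var)
    probs = softmax(logits)
    return [weight * (p - (1.0 if k == label else 0.0))
            for k, p in enumerate(probs)]


# Same logits and label, two different uncertainty estimates.
logits = [2.0, 0.5, -1.0]
confident = logit_gradient(logits, label=1, log_var=0.0)
uncertain = logit_gradient(logits, label=1, log_var=2.0)
z = sample_embedding([0.0, 1.0], [-1.0, -1.0], random.Random(0))
```

In this sketch every component of `uncertain` is the matching component of `confident` multiplied by exp(-2), which is the attenuation effect the abstract describes: a sample flagged as ambiguous still contributes to training, but with a smaller gradient.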
Cite this article: Jia, K. Uncertainty-Aware Learning for Robust Facial Expression Recognition Method [J]. Computer Science and Application, 2026, 16(2): 381-394. https://doi.org/10.12677/csa.2026.162067
