Micro-Expression Recognition Method Based on Image Sequences
Abstract: Micro-expressions, characterized by extremely short duration, low intensity, and highly localized facial movements, are an important research topic in affective computing and pattern recognition, with broad application value in security monitoring, clinical diagnosis, and human-computer interaction. However, existing micro-expression recognition methods still face two major challenges: first, precisely localizing the key motion regions in both the temporal and spatial dimensions; second, degraded recognition performance on small, class-imbalanced datasets. To address these issues, this paper proposes AD-Net, an image-sequence-based micro-expression recognition method. AD-Net adopts DenseNet121 as the backbone network and incorporates a coordinate attention mechanism to strengthen feature extraction from key micro-expression regions along the channel and spatial dimensions. A projected gradient descent (PGD) adversarial training strategy is further integrated to improve the model's robustness against disturbances such as noise and illumination variation, mitigating overfitting in small-sample scenarios. In addition, a class weight matrix is introduced into the loss function to alleviate the impact of class imbalance on model performance. Ablation and comparative experiments on two mainstream datasets, CASME2 and SAMM, show that AD-Net significantly outperforms existing baseline methods on key performance metrics such as accuracy and recall, validating the effectiveness of the proposed modules and strategies.
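The abstract pairs PGD adversarial training with a class-weighted cross-entropy loss. As an illustration only, the sketch below implements L∞-ball PGD against a class-weighted cross-entropy for a plain linear softmax classifier; the actual AD-Net operates on DenseNet121 features, and the classifier, class weights, and hyperparameters (`eps`, `alpha`, `steps`) here are hypothetical choices, not values from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize before exponentiation
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def weighted_ce(probs, y, class_weights):
    """Class-weighted cross-entropy: rare classes get larger weights."""
    n = len(y)
    return -np.mean(class_weights[y] * np.log(probs[np.arange(n), y] + 1e-12))

def pgd_attack(x, y, W, b, class_weights, eps=0.1, alpha=0.02, steps=5):
    """PGD: iteratively perturb x to maximize the (weighted) loss, then
    project back into the L-inf ball of radius eps around the original x."""
    x_orig = x.copy()
    x_adv = x + np.random.uniform(-eps, eps, x.shape)  # random start in ball
    for _ in range(steps):
        probs = softmax(x_adv @ W + b)                 # forward pass
        n = len(y)
        onehot = np.zeros_like(probs)
        onehot[np.arange(n), y] = 1.0
        # gradient of the weighted CE w.r.t. logits, then chain rule to x
        dlogits = (probs - onehot) * class_weights[y][:, None] / n
        grad_x = dlogits @ W.T
        x_adv = x_adv + alpha * np.sign(grad_x)        # ascend the loss
        x_adv = np.clip(x_adv, x_orig - eps, x_orig + eps)  # project to ball
    return x_adv
```

In adversarial training proper, each minibatch would be perturbed with `pgd_attack` and the model then updated on the perturbed inputs, so the classifier learns to stay correct everywhere inside the eps-ball.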
Citation: Li, J.Y., Chen, S., Feng, H., Wei, L.Z. and Zhang, L.Y. (2026) Micro-Expression Recognition Method Based on Image Sequences. Artificial Intelligence and Robotics Research, 15(2), 593-602. https://doi.org/10.12677/airr.2026.152057
