Research on Video Personnel Emotion Recognition Based on Improved 3D-ResNet34 Model
Abstract: Human emotions are variable, and facial expressions change over time as emotions change. This is especially evident in video data, where a network model trained only on single-frame images is clearly not feasible for judging a person's emotional state. To address this problem, this paper proposes an emotion recognition method built on a network model that combines 3D-ResNet34, ConvLSTM, and a Transformer. First, the video dataset is preprocessed: each complete video is divided into several consecutive clips, and a 3D-ResNet34 network extracts spatial features from each clip. Second, a ConvLSTM module is added to the network to extract deeper temporal features from the spatial features. Third, a self-designed Transformer module generates an attention distribution for each video clip, which is used to weight and fuse the per-frame features into a single attention feature representing the whole clip. Finally, the temporal features and the attention features are fused into a new feature representation, which is fed to a Softmax layer for classification. Experiments show that the proposed method achieves good recognition accuracy.
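The clip splitting, attention-weighted frame fusion, and final fusion-plus-Softmax steps described in the abstract can be sketched roughly as follows. This is a minimal, framework-free illustration and not the authors' implementation: the function names (`split_into_clips`, `attention_pool`, `fuse_and_classify`) are hypothetical, the per-frame scores stand in for whatever the Transformer module produces, and fusing by concatenation followed by a linear layer is an assumption, since the abstract does not specify the fusion operator.

```python
import math


def split_into_clips(frames, clip_len):
    """Split a frame sequence into consecutive, non-overlapping clips.

    Assumption: trailing frames that do not fill a whole clip are dropped.
    """
    last_start = len(frames) - clip_len
    return [frames[i:i + clip_len] for i in range(0, last_start + 1, clip_len)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, frame_scores):
    """Fuse per-frame feature vectors into one clip-level vector.

    The scores are normalized into an attention distribution, which then
    weights each frame's feature vector in a sum.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]


def fuse_and_classify(temporal_feature, attention_feature, class_weights):
    """Concatenate both features (assumed fusion), apply a linear layer,
    and return class probabilities from a softmax."""
    fused = temporal_feature + attention_feature  # list concatenation
    logits = [sum(w * x for w, x in zip(row, fused)) for row in class_weights]
    return softmax(logits)


# Toy usage with made-up numbers
clips = split_into_clips(list(range(10)), clip_len=4)
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
probs = fuse_and_classify([1.0], pooled, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
```

In a real model, the frame features would come from the 3D-ResNet34 backbone, the temporal feature from the ConvLSTM module, and the scores and classifier weights would be learned rather than fixed by hand.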
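Cite this article: Guo Fuao, Yao Keming, Wang Zhongzhou. Research on Video Personnel Emotion Recognition Based on Improved 3D-ResNet34 Model[J]. Journal of Image and Signal Processing, 2024, 13(3): 338-347. https://doi.org/10.12677/jisp.2024.133029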
