Eye Emotion Recognition Based on Label Distribution Learning
Abstract: Eye emotion recognition identifies a user's emotional state from the emotional features of the eye region alone, recovering the true emotion that is occluded when the user wears a smart helmet. To address the low recognition accuracy and low recognition efficiency caused by the limited emotional information in the eye region and by label ambiguity, this paper proposes a neural network model for eye emotion recognition. The model consists of two modules: an emotion label distribution generation network and a lightweight eye emotion recognition network. The emotion label distribution generation network generates emotion distribution labels for eye images, which assist the parameter training of the eye emotion recognition network. The eye emotion recognition network contains an attention-based global feature enhancement module and a local feature enhancement module, and can infer the user's emotion category from eye images that carry little information. To evaluate the model, we constructed two datasets, REED (Realistic Eye Emotion Datasets) and EMUG (Eye-Multimedia Understanding Group). Experimental results show that, on REED and EMUG respectively, the average accuracy reaches 68.5% and 80.9% for four-class classification and 62.0% and 68.1% for seven-class classification. In addition, the model has far fewer parameters than the other network models compared, and its recognition efficiency is also better.
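The idea of training a classifier with generated emotion distribution labels can be illustrated with a minimal sketch. The abstract does not specify the loss function, so the KL-divergence term, the weighting factor `alpha`, and all function names below are assumptions made for illustration, not the authors' formulation.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); classes with zero target mass contribute 0."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


def cross_entropy(class_index, predicted, eps=1e-12):
    """Standard cross-entropy against a single hard label."""
    return -math.log(max(predicted[class_index], eps))


def combined_loss(logits, class_index, label_distribution, alpha=0.5):
    """Blend the hard-label loss with a label-distribution loss.

    `alpha` is a hypothetical weighting factor (assumption): 1.0 uses only
    the hard label, 0.0 uses only the soft label distribution.
    """
    predicted = softmax(logits)
    hard_term = cross_entropy(class_index, predicted)
    soft_term = kl_divergence(label_distribution, predicted)
    return alpha * hard_term + (1 - alpha) * soft_term


# Illustrative four-class example with made-up numbers (not from the paper).
logits = [2.0, 0.5, 0.1, -1.0]              # recognition-network outputs
label_distribution = [0.7, 0.2, 0.1, 0.0]   # soft label from a generator network
loss = combined_loss(logits, 0, label_distribution)
```

In this sketch the soft label spreads probability mass over several emotion classes, so an ambiguous eye image penalizes the recognition network less for predicting a plausible neighbouring class than a one-hot label would.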
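Citation: Li Xuecong, Zhan Yinwei, Yang Zhuo, Lu Yubo. Eye Emotion Recognition Based on Label Distribution Learning[J]. Computer Science and Application, 2022, 12(4): 1213-1225. https://doi.org/10.12677/CSA.2022.124123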
