基于Swish激活函数的人脸情绪识别的深度学习模型研究
A Research on Deep Learning Model for Face Emotion Recognition Based on Swish Activation Function
DOI: 10.12677/JISP.2019.83016, PDF,  被引量    国家自然科学基金支持
作者: 王灵矫, 李 乾, 郭 华:湘潭大学,信息工程学院,湖南 湘潭
关键词: 激活函数反向传播卷积神经网络深度学习计算机视觉Activation Function Back Propagation Convolutional Neural Network Deep Learning Computer Vision
摘要: 近年来,深度学习模型得到飞速发展,深度卷积神经网络作为其中一种方法在计算机视觉中得到广泛应用。影响深度学习模型性能的因素众多,其中激活函数的选取和神经网络的结构对深度学习模型的性能有着重要的影响。本文分析了传统激活函数与新型Swish激活函数的优缺点,将Swish函数引入人脸情绪深度学习模型,提出了一种改进的反向传播算法,并在卷积神经网络中使用多层小尺寸卷积模块替代大尺寸卷积模块,提取细化特征,构建了一种新型的人脸情绪识别深度学习模型Swish-FER-CNNs。实验结果表明,相比于ReLU、L-ReLU、P-ReLU等激活函数,基于Swish激活函数的深度学习模型的识别准确率更高。结合改进的网络结构,本文构建的深度学习模型Swish-FER-CNNS相对于现存模型,识别准确率提高了4.02%。
Abstract: In recent years, deep learning model has been developed rapidly. As one of the methods, deep convolution neural network has been widely used in computer vision. There are many factors af-fecting the performance of deep learning model, among which the selection of activation function and the structure of neural network have important impact on the performance of deep learning model. This paper analyses the advantages and disadvantages of the traditional activation function and the new Swish activation function, introduces Swish function into the deep learning model of facial emotion, proposes an improved back propagation algorithm, and uses multi-layer small-size convolution module instead of large-size convolution module in the convolution neural network to extract refinement features, and constructs a new deep learning model of facial emotion recognition, Swish-FER-CNNs. The experimental results show that the recognition accuracy of deep learning model based on Swish activation function is higher than that of activation functions such as ReLU, L-ReLU and P-ReLU. With the improved network structure, the recognition accuracy of the deep learning model of Swish-FER-CNNS constructed in the paper is improved by 4.02% compared with the existing model.
文章引用:王灵矫, 李乾, 郭华. 基于Swish激活函数的人脸情绪识别的深度学习模型研究[J]. 图像与信号处理, 2019, 8(3): 110-120. https://doi.org/10.12677/JISP.2019.83016

参考文献

[1] Giannopoulos, P., Perikos, I. and Hatzilygeroudis, I. (2018) Deep Learning Approaches for Facial Emotion Recognition: A Case Study on FER-2013. In: Hatzilygeroudis, I. and Palade, V., Eds., Advances in Hybridization of Intelligent Methods, Smart Innovation, Systems and Technologies, Springer, Cham, 1-16.
[Google Scholar] [CrossRef
[2] Bruce, V. and Young, A. (1986) Understanding Face Recognition. British Journal of Psychology, 77, 305-327.
[Google Scholar] [CrossRef] [PubMed]
[3] Ioffe, S. and Szegedy, C. (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Arxiv: 1502.03167.
[4] Simonyan, K. and Zisserman, A. (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition. Arxiv: 1409.1556.
[5] Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012) ImageNet Classification with Deep Convolutional Neural Networks. In: Pereira, F., Burges, C.J.C., Bottou, L. and Weinberger, K.Q., Eds., Advances in Neural Information Processing Systems, The MIT Press, ‎Cambridge, MA, 1097-1105.
[6] Zhang, C. and Woodland, P.C. (2016) DNN Speaker Adaptation Using Parameterised Sigmoid and ReLU Hidden Activation Functions. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, 20-25 March 2016, 5300-5304.
[Google Scholar] [CrossRef
[7] Taigman, Y., Yang, M., Ranzato, M.A. and Wolf, L. (2014) Deepface: Closing the Gap to Human-Level Performance in Face Verification. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, 23-28 June 2014, 1701-1708.
[Google Scholar] [CrossRef
[8] Szegedy, C., Ioffe, S., Vanhoucke, V. and Alemi, A.A. (2017) Inception-v4, Inception-Resnet and the Impact of Residual Connections on Learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, 4-9 February 2017, 4278-4284.
[9] Wang, X., Girshick, R., Gupta, A. and He, K. (2018) Non-Local Neural Networks. 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 18-23 June 2018, 7794-7803.
[Google Scholar] [CrossRef
[10] Jeon, J., Park, J.C., Jo, Y.J., et al. (2016) A Real-Time Facial Expression Rec-ognizer Using Deep Neural Network. In: Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication, ACM, New York, 94:1-94:4.
[Google Scholar] [CrossRef
[11] LeCun, Y., Boser, B.E., Denker, J.S., et al. (1990) Handwritten Digit Recognition with a Back-Propagation Network. In: Pereira, F., Burges, C.J.C., Bottou, L. and Weinberger, K.Q., Eds., Advances in Neural Information Processing Systems, The MIT Press, ‎Cambridge, MA, 396-404.
[12] Ramachandran, P., Zoph, B. and Le, Q.V. (2017) Searching for Activation Functions. Computer Science, ArXiv: 1710.05941.
[13] Saxe, A.M., McClelland, J.L. and Ganguli, S. (2013) Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks. Computer Science, ArXiv: 1312.6120.
[14] Eger, S., Youssef, P. and Gurevych, I. (2019) Is It Time to Swish? Comparing Deep Learning Activation Functions Across NLP Tasks. Computer Science, arXiv: 1901.02671.
[15] Maas, A.L., Hannun, A.Y. and Ng, A.Y. (2013) Rectifier Nonlinearities Improve Neural Network Acoustic Models. International Conference on Machine Learning.
[16] Xu, B., Wang, N., Chen, T. and Li, M. (2015) Empirical Evaluation of Rectified Activations in Convolutional Network. Computer Science, arXiv: 1505.00853.
[17] Zhang, X., Zou, Y. and Shi, W. (2017) Dilated Convolution Neural Network with LeakyReLU for Environmental Sound Classification. 2017 22nd International Conference on Digital Signal Processing, London, 23-25 August 2017, 1-5.
[Google Scholar] [CrossRef
[18] He, K., Zhang, X., Ren, S. and Sun, J. (2015) Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7-13 December 2015, 1026-1034.
[Google Scholar] [CrossRef
[19] Gulcehre, C., Moczulski, M., Denil, M. and Bengio, Y. (2016) Noisy Activation Functions. In: Proceedings of the 33rd International Conference on Machine Learning, PMLR 48, 3059-3068.
[20] Jia, Y., Shelhamer, E., Donahue, J., et al. (2014) Caffe: Convolutional Architecture for Fast Feature Embedding. In: Balcan, M.F. and Weinberger, K.Q., Eds., Proceedings of the 22nd ACM International Conference on Multimedia, ICML, New York, 675-678.