Multilayer Recurrent Neural Network for Action Recognition

doi:10.12677/CSA.2020.106132

Author: Du Wei, North China University of Technology, Beijing
Keywords: Human Action Recognition; Dilated Convolution; Long Short-Term Memory Network; Deep Learning

Abstract: Human action recognition is an active research topic in computer vision. Building on the conventional two-stream approach, this paper introduces an object detection network and proposes a human action recognition algorithm based on a multilayer recurrent neural network. The algorithm processes consecutive video frames with a three-dimensional dilated convolution pyramid and combines it with a Long Short-Term Memory (LSTM) network, yielding a pyramid convolutional LSTM network that can analyze human actions in real time. Experiments use the NTU RGB+D human action recognition dataset to recognize five actions: brushing hair, sitting down, standing up, hand waving, and falling down. The results show that dilated convolution markedly reduces the number of parameters, and that the model achieves good accuracy and real-time performance on surveillance video.
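The parameter saving attributed to dilated convolution can be illustrated with a small calculation. The sketch below is not the paper's network: the channel counts and dilation rates are hypothetical, and the helper names `conv3d_params` and `effective_extent` are made up for illustration. It only compares a 3x3x3 dilated kernel against a dense cubic kernel that covers the same span along each axis.

```python
def conv3d_params(kernel, in_ch, out_ch, bias=True):
    """Learnable parameters of a 3D convolution with a cubic kernel."""
    weights = kernel ** 3 * in_ch * out_ch
    return weights + (out_ch if bias else 0)


def effective_extent(kernel, dilation):
    """Span (in input samples along one axis) covered by a dilated kernel."""
    return dilation * (kernel - 1) + 1


# Hypothetical channel counts; the abstract does not state the layer sizes.
IN_CH, OUT_CH = 64, 64

for dilation in (1, 2, 4):
    extent = effective_extent(3, dilation)
    # Dilation spreads the same 27 taps further apart, so the count is unchanged.
    dilated = conv3d_params(3, IN_CH, OUT_CH)
    # A dense kernel needs extent**3 taps to cover the same span.
    dense = conv3d_params(extent, IN_CH, OUT_CH)
    print(f"dilation={dilation}: span={extent}, "
          f"dilated params={dilated}, dense params={dense}")
```

The dilated kernel keeps 27 weights per input/output channel pair at every dilation rate, while a dense kernel with the same span grows cubically, which is the general reason dilation is used to enlarge the receptive field cheaply.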

Citation: Du Wei. Multilayer Recurrent Neural Network for Action Recognition [J]. Computer Science and Application, 2020, 10(6): 1277-1285. https://doi.org/10.12677/CSA.2020.106132

