Application of SA-C3D Neural Network in Action Recognition

doi:10.12677/SEA.2022.116161

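Authors: Zhang Hongbo (张宏博)*, Chen Sheng (陈胜), School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai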
Keywords: C3D; 3-Dimensional Convolutional Neural Networks; Self-Attention; Non-Local; Action Recognition

Abstract: This paper uses a self-attention mechanism to improve the accuracy of the C3D network for action recognition. As one of the earlier models proposed, C3D holds an important place in video action recognition, but as research has progressed it has become dated and its recognition accuracy is comparatively low. Building on C3D, this paper integrates a Non-Local module into the network to add self-attention, and replaces the fixed learning rate decay with cosine annealing learning rate decay to improve the model's ability to escape local optima. The resulting SA-C3D network uses 3D convolution to extract local features from action videos, then uses self-attention to capture global information about human actions. Trained on the UCF-101 dataset without pre-training, SA-C3D improves recognition accuracy substantially over the original C3D network and a range of strong action recognition models, reaching up to 95%.
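The two changes named in the abstract, a Non-Local (self-attention) operation and cosine annealing of the learning rate, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, argument names, and values are hypothetical. The non-local operation assumes the Gaussian pairwise function with softmax normalization and uses identity maps in place of the learned 1x1x1 convolutions of a real Non-Local block. The schedule assumes the standard cosine annealing formula without warm restarts.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step` under cosine annealing without restarts.

    Uses lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T)).
    All argument names and values are illustrative assumptions.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    t = min(max(step, 0), total_steps)  # clamp the step to [0, total_steps]
    cosine = math.cos(math.pi * t / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


def non_local(features):
    """Simplified non-local operation over a list of feature vectors.

    Each row stands for one (time, height, width) position of a 3D feature
    map, flattened into a vector of channel values.
    Assumptions: pairwise weights are softmax(x_i . x_j) over j, and the
    learned embeddings are replaced by identity maps, so the output is
    z_i = x_i + sum_j a_ij * x_j.
    """
    out = []
    for x_i in features:
        # Similarity of this position to every position in the clip.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Attention-weighted sum over all positions (the global context).
        y_i = [
            sum(w * x_j[k] for w, x_j in zip(weights, features)) / total
            for k in range(len(x_i))
        ]
        # Residual connection keeps the local 3D-convolution features.
        out.append([a + b for a, b in zip(x_i, y_i)])
    return out
```

In this sketch the learning rate starts at `lr_max` and falls smoothly to `lr_min` at `total_steps`, while every output row of `non_local` mixes information from all positions, in contrast to the local receptive field of a 3D convolution.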

Cite this article: Zhang, H. and Chen, S. (2022) Application of SA-C3D Neural Network in Action Recognition. Software Engineering and Applications, 11(6), 1561-1569. https://doi.org/10.12677/SEA.2022.116161
