Action Recognition Based on Attention and Bi-LSTM

doi:10.12677/CSA.2021.116166


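Authors: Yuming Zhang (张玉铭), Kewei Wu (吴克伟), Yike Jin (金依珂), Longhui Zhou (周龙辉): School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui
Keywords: Action Recognition; Optical Flow; Motion Features; Long-Dependency Problems; Temporal Context Information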

Abstract: Using optical flow as the motion feature for action recognition requires computing and storing the flow in advance, which incurs substantial computational and storage costs. Moreover, because optical flow mainly characterizes motion between adjacent frames, action recognition based on it suffers from a long-dependency problem. To address these issues, this paper proposes a new way of modeling motion features to replace optical flow, together with a long-dependency temporal motion modeling module. Experimental results show that, at a very small additional computational cost, the proposed method better models the temporal context between distant frames and significantly improves action recognition accuracy.

Cite this article: Zhang, Y.M., Wu, K.W., Jin, Y.K. and Zhou, L.H. (2021) Action Recognition Based on Attention and Bi-LSTM. Computer Science and Application, 11(6), 1607-1616. https://doi.org/10.12677/CSA.2021.116166

