Action Recognition Based on Attention and Bi-LSTM
Abstract: Using optical flow as the motion feature for action recognition requires the flow to be computed and stored in advance, which incurs heavy computational and storage costs. Moreover, because optical flow mainly captures motion between adjacent frames, long-range temporal dependencies are poorly modeled. To address these problems, this paper proposes a new way of modeling motion features that replaces optical flow, together with a module for modeling long-range temporal motion dependencies. Experimental results show that, at very low additional computational cost, the proposed method better models the temporal context between distant frames and significantly improves action recognition accuracy.
Citation: Zhang Yuming, Wu Kewei, Jin Yike, Zhou Longhui. Action Recognition Based on Attention and Bi-LSTM [J]. Computer Science and Application, 2021, 11(6): 1607-1616. https://doi.org/10.12677/CSA.2021.116166
