Graph Attention Based Monocular 3D Human Pose Estimation
DOI: 10.12677/AIRR.2023.122017
Author: Zhu Zhiwei (朱志玮), College of Physical Science and Technology, Central China Normal University, Wuhan, Hubei
Keywords: Deep Learning; Graph Convolution; Temporal Convolution; 3D Human Pose Estimation
Abstract: Human pose estimation is an important research area in computer vision. Achieving accurate and robust 3D human pose estimation despite complex backgrounds, lighting changes and occlusion remains a major challenge. This paper proposes a deep learning-based 3D human pose estimation algorithm built on a graph attention temporal convolutional network, whose graph is constructed from the connectivity and symmetry relations between human skeletal joints. The network makes full use of the spatiotemporal information in monocular video to model how the human pose changes over time. Experiments on the Human3.6M dataset show that the proposed algorithm improves prediction accuracy by about 14.9% compared with traditional methods.
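The abstract only names the ingredients (a joint graph built from skeletal connectivity and symmetry, graph attention over that graph, and temporal convolution over video frames), so the following is a minimal NumPy sketch of how such pieces typically fit together, not the author's network. It assumes a 17-joint Human3.6M skeleton in the ordering used by common open-source pipelines such as VideoPose3D, a single-head GAT-style attention layer (Veličković et al.) masked by the joint graph, and an unpadded dilated 1-D temporal convolution in the style of Pavllo et al. All names (`PARENTS`, `SYMMETRIC_PAIRS`, `build_adjacency`, `graph_attention`, `temporal_conv`) and all sizes are hypothetical, and the weights are random.

```python
import numpy as np

# ASSUMPTION: 17-joint Human3.6M skeleton in the ordering used by common
# open-source pipelines (e.g. VideoPose3D); PARENTS[j] is the parent of joint j.
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]
# ASSUMPTION: left/right pairs (hips, knees, feet, shoulders, elbows, wrists).
SYMMETRIC_PAIRS = [(1, 4), (2, 5), (3, 6), (11, 14), (12, 15), (13, 16)]
NUM_JOINTS = len(PARENTS)


def build_adjacency(parents, symmetric_pairs):
    """Binary joint graph: self-loops + bone (connectivity) edges + symmetry edges."""
    adj = np.eye(len(parents))
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child, parent] = adj[parent, child] = 1.0
    for a, b in symmetric_pairs:
        adj[a, b] = adj[b, a] = 1.0
    return adj


def graph_attention(x, adj, w, a_src, a_dst, slope=0.2):
    """Single-head GAT-style layer; each joint attends only to its graph neighbours.

    x: (J, C_in), w: (C_in, C_out), a_src / a_dst: (C_out,) -> (J, C_out)
    """
    h = x @ w
    e = (h @ a_src)[:, None] + (h @ a_dst)[None, :]  # raw score e_ij
    e = np.where(e > 0, e, slope * e)                # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)                # mask non-neighbours
    e = e - e.max(axis=1, keepdims=True)             # numerically stable softmax
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)       # each row sums to 1
    return att @ h


def temporal_conv(seq, kernel, dilation=1):
    """Unpadded dilated 1-D convolution over frames, weights shared by all joints.

    seq: (T, J, C_in), kernel: (K, C_in, C_out) -> (T - (K - 1) * dilation, J, C_out)
    """
    k = kernel.shape[0]
    out_len = seq.shape[0] - (k - 1) * dilation
    out = np.zeros((out_len, seq.shape[1], kernel.shape[2]))
    for t in range(out_len):
        for i in range(k):
            out[t] += seq[t + i * dilation] @ kernel[i]
    return out


# Usage with random weights and a random stand-in for a 9-frame 2D keypoint clip.
rng = np.random.default_rng(0)
T, C_IN, C_HID = 9, 2, 8
poses_2d = rng.standard_normal((T, NUM_JOINTS, C_IN))
adj = build_adjacency(PARENTS, SYMMETRIC_PAIRS)
w = rng.standard_normal((C_IN, C_HID))
a_src, a_dst = rng.standard_normal(C_HID), rng.standard_normal(C_HID)

# Spatial step: graph attention applied to every frame independently.
spatial = np.stack([graph_attention(f, adj, w, a_src, a_dst) for f in poses_2d])
# Temporal step: kernel size 3, dilation 3 -> receptive field of 7 frames.
kernel = rng.standard_normal((3, C_HID, 3))
pose_3d = temporal_conv(spatial, kernel, dilation=3)
print(pose_3d.shape)  # (3, 17, 3)
```

The symmetry edges let, for example, the right wrist attend directly to the left wrist even though they are far apart along the kinematic chain, while the masking step keeps attention restricted to the hand-designed graph. Stacking several such spatial and temporal layers, with residual connections and normalization, would be the usual next step; how the paper does this is not specified in the abstract.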
Citation: Zhu, Z. (2023) Graph Attention Based Monocular 3D Human Pose Estimation. Artificial Intelligence and Robotics Research, 12(2), 143-153. https://doi.org/10.12677/AIRR.2023.122017
