Human Pose Estimation Based on DirtNet and Inertial Measurement Units

DOI: 10.12677/csa.2024.143061
Authors: 罗胜, 张元正*, 叶润泽, 朱锦乐, 张博文 (College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, Zhejiang)
Keywords: Human Pose Estimation; Inertial Measurement Units; SMPL; Skeleton Model; Real-Time; DirtNet

Abstract: Estimating human pose from a small number of inertial measurement units (IMUs) is non-intrusive and inexpensive, but the main challenge is to estimate pose accurately from noisy IMU signals. To address this, a method is proposed that estimates human pose accurately using only six IMUs. 1) A dual information retention attention Transformer network (DirtNet, Dual information retention transformer Network) is proposed; it preserves historical information effectively and attends to the entire sequence to obtain better results. 2) An approximate velocity is obtained by integrating the acceleration and is used as an additional input channel to improve pose estimation accuracy. 3) The synthesized acceleration is augmented by uniform filtering and simulated white noise so that it better fits real IMU data and yields better training results. Compared with previous work, the improved method effectively increases the accuracy of pose estimation.
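The abstract only names the velocity channel and the augmentation step, so the following is a minimal sketch of what those two ideas might look like, not the authors' implementation. It works on a single acceleration channel stored as a plain Python list. The function names, the rectangle-rule integration with zero initial velocity, the border handling of the filter, the window size, the noise level, and the 60 Hz sampling rate are all assumptions made for illustration.

```python
import random
from typing import List, Optional


def integrate_acceleration(acc: List[float], dt: float) -> List[float]:
    """Approximate velocity by cumulative rectangle-rule integration.

    Assumes zero initial velocity (an assumption; the abstract does not
    state the integration scheme).
    """
    velocity = []
    v = 0.0
    for a in acc:
        v += a * dt
        velocity.append(v)
    return velocity


def uniform_filter(signal: List[float], window: int) -> List[float]:
    """Moving-average (uniform) filter; border samples are repeated."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        for j in range(i - half, i + half + 1):
            # Clamp the index so the first/last sample pads the borders.
            total += signal[min(max(j, 0), n - 1)]
        out.append(total / window)
    return out


def augment_acceleration(
    acc: List[float],
    window: int = 5,            # hypothetical filter width
    noise_std: float = 0.05,    # hypothetical noise level
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Smooth synthetic acceleration, then add zero-mean Gaussian white noise."""
    rng = rng or random.Random()
    smoothed = uniform_filter(acc, window)
    return [a + rng.gauss(0.0, noise_std) for a in smoothed]


# Example: augment one synthetic channel, then derive the extra velocity input.
synthetic_acc = [0.0, 1.0, 1.0, 0.0, -1.0, -1.0]
augmented = augment_acceleration(
    synthetic_acc, window=3, noise_std=0.05, rng=random.Random(0)
)
velocity = integrate_acceleration(augmented, dt=1.0 / 60.0)  # assumed 60 Hz
```

In a real pipeline the same operations would typically be vectorized over all six IMUs and three axes. Because plain integration accumulates noise and drift, the resulting velocity is best read as the approximate quantity the abstract describes rather than a physically accurate one.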

Cite this article: 罗胜, 张元正, 叶润泽, 朱锦乐, 张博文. Human Pose Estimation Based on DirtNet and Inertial Measurement Units [J]. Computer Science and Application (计算机科学与应用), 2024, 14(3): 96-107. https://doi.org/10.12677/csa.2024.143061

