Human Action Recognition Algorithm Based on LSTM with Multi-Feature Fusion
Abstract: To address the problems that existing human action recognition methods rely on single-modal data and achieve insufficient recognition accuracy in complex scenarios, a human action recognition algorithm based on LSTM with multi-feature fusion is proposed. First, the 3D coordinates of 20 key human joints are acquired with a Kinect depth camera, and multi-dimensional features such as joint position, velocity, and acceleration are extracted. Second, key-frame localization and downsampling are applied to optimize the feature data and reduce computational complexity. Finally, the integrated multi-modal features are fed into an LSTM model, whose spatiotemporal modeling capability is used to perform action classification. To verify the algorithm's performance, experiments are conducted on three public benchmarks: Kinetics-Skeleton, NTU RGB+D (X-Sub), and NTU RGB+D (X-View). The results show that the algorithm achieves over 98% accuracy on the first 20 action classes of all three benchmarks, with some classes exceeding 99% accuracy on NTU RGB+D (X-View). Confusion matrix analysis shows that the model exhibits good classification consistency, together with strong robustness to interference and adaptability across scenes. By fully exploiting the spatiotemporal correlations of human actions through a multi-feature fusion strategy, the algorithm provides an effective solution for high-precision action recognition in complex environments.
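The feature pipeline described above (per-joint position, velocity, and acceleration, followed by downsampling to a fixed-length sequence for the LSTM) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `skeleton_features`, the finite-difference derivatives, the uniform downsampling (a simple stand-in for the paper's key-frame localization), and the sequence length of 32 are all assumptions made for the example.

```python
import numpy as np

def skeleton_features(joints, target_len=32):
    """Build a fused feature sequence from raw skeleton data.

    joints: array of shape (T, 20, 3) -- T frames of 20 joints with
    3D (x, y, z) coordinates, e.g. from a Kinect stream.
    Returns an array of shape (target_len, 20 * 9): for each frame,
    the position, velocity, and acceleration of every joint are
    concatenated into one feature vector.
    """
    joints = np.asarray(joints, dtype=np.float64)
    T = joints.shape[0]
    # Finite-difference approximations of velocity and acceleration
    # along the time axis.
    velocity = np.gradient(joints, axis=0)
    acceleration = np.gradient(velocity, axis=0)
    fused = np.concatenate([joints, velocity, acceleration], axis=2)  # (T, 20, 9)
    # Uniform downsampling to a fixed number of frames -- a simple
    # placeholder for key-frame localization.
    idx = np.linspace(0, T - 1, num=target_len).round().astype(int)
    return fused[idx].reshape(target_len, -1)  # (target_len, 180)

# Example: a 64-frame clip reduced to a 32-step sequence of
# 180-dimensional feature vectors, ready to feed into an LSTM.
seq = skeleton_features(np.random.rand(64, 20, 3))
print(seq.shape)  # (32, 180)
```

Each row of the resulting sequence is one LSTM time step; stacking clips along a batch dimension gives the usual (batch, time, features) input layout.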
Article citation: Sui, L.F. and Xue, H.H. (2026) Human Action Recognition Algorithm Based on LSTM with Multi-Feature Fusion. Computer Science and Application, 16, 171-176. https://doi.org/10.12677/csa.2026.163096
