Human Activity Recognition Model Based on the Fusion of Capsule-Transformer
DOI: 10.12677/mos.2025.145466
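Authors: Wang Xing, Li Ruixiang*, Shi Weibin: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai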
Keywords: Human Activity Recognition, Wearable Sensor, Transformer, Capsule Network
Abstract: In recent years, human activity recognition (HAR) based on wearable sensors has shown significant value in applications such as intelligent health monitoring and human-computer interaction. Traditional deep learning algorithms, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have achieved some success, but they remain limited in capturing the temporal dynamics and spatial relationships of complex human activities. To address these limitations, this paper proposes a hybrid architecture that combines a Transformer, which is strong at capturing global features, with a capsule network, which is strong at capturing local features. The model was evaluated on two public datasets, UCI-HAR and WISDM, and achieved overall accuracies of 96.0% and 96.5%, respectively. The experiments show that the fused Transformer-capsule model outperforms models based on the Transformer alone or on the capsule network alone. It also outperforms other traditional deep learning algorithms reported in recent comparable studies.
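As a rough illustration of the two building blocks named in the abstract, the sketch below implements scaled dot-product self-attention (each time step attends to every other step, the "global" part) and the capsule squash non-linearity from the dynamic-routing capsule literature (a vector's length is mapped into [0, 1) while its direction is kept). This is not the authors' architecture: the function names, the identity query/key/value projections, the toy sensor window, and the way the two pieces are chained in `capsule_lengths` are all assumptions made for illustration only.

```python
import math


def squash(vector):
    """Capsule squash: keep the direction, map the length into [0, 1)."""
    norm_sq = sum(x * x for x in vector)
    if norm_sq == 0.0:
        return [0.0 for _ in vector]
    # scale = (|v|^2 / (1 + |v|^2)) / |v|
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in vector]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with Q = K = V = sequence.

    Assumption: a real Transformer layer uses learned projections for
    queries, keys and values; they are omitted here to keep the sketch short.
    """
    dim = len(sequence[0])
    output = []
    for query in sequence:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in sequence
        ]
        weights = softmax(scores)
        # Each output step is a weighted average of all time steps.
        mixed = [
            sum(w * value[d] for w, value in zip(weights, sequence))
            for d in range(dim)
        ]
        output.append(mixed)
    return output


def capsule_lengths(sequence):
    """Hypothetical chaining: attend over the window, then squash each step."""
    attended = self_attention(sequence)
    return [
        math.sqrt(sum(x * x for x in squash(step))) for step in attended
    ]


# Toy window: 3 time steps of a 3-axis accelerometer (made-up values).
window = [[0.1, 0.0, 9.8], [0.3, -0.2, 9.6], [1.2, 0.4, 8.9]]
lengths = capsule_lengths(window)
```

In capsule networks the length of a squashed vector is commonly read as the probability that the entity it represents is present, which is why `capsule_lengths` returns values strictly below 1.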
Cite this article: Wang, X., Li, R. and Shi, W. (2025) Human Activity Recognition Model Based on the Fusion of Capsule-Transformer. Modeling and Simulation, 14(5), 1161-1175. https://doi.org/10.12677/mos.2025.145466
