Real-Time Fall Detection Based on Lightweight Human Pose Estimation and Graph Convolution Network

doi:10.12677/CSA.2021.114080

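Authors: He Weiting (何炜婷), Zeng Bi (曾碧), Chen Wenxuan (陈文轩): School of Computer Science, Guangdong University of Technology, Guangzhou, Guangdong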
Keywords: Human Pose Estimation; Graph Convolutional Network; Lightweight; Fall-Down Detection; Action Recognition

Abstract: Fall detection methods based on human pose estimation tend to be slow, because the pose estimation model must detect and process more than a dozen joint points. To achieve real-time fall detection, this paper proposes a method that combines a lightweight human pose estimation model with a graph convolutional network. The method first uses an optimized, two-stage, object-detection-based lightweight pose estimation model to detect joint points, which keeps the overall model lightweight. It then applies a spatio-temporal graph convolutional network with only 6 feature extraction modules to the sequence of human joint points to detect falls, which improves the fall detection accuracy of the overall model. Experiments on two datasets, NTU RGB+D 120 and the UR Fall Detection Dataset, show a fall detection accuracy of 96.1%, and the overall model runs at about 33 FPS on a GTX1060Ti GPU.
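
To make the second stage more concrete, the following is a minimal sketch of a single spatial graph convolution over skeleton joints, the basic operation that spatio-temporal graph convolutional networks stack into feature extraction modules. It is not the authors' implementation. The 3-joint chain, the function names, and the identity weight matrix are hypothetical and chosen only for illustration, and the temporal convolution across frames is omitted.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def normalized_adjacency(num_joints: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own features
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def spatial_graph_conv(frame: Matrix, adj: Matrix, weight: Matrix) -> Matrix:
    """Compute adj @ frame @ weight for one frame.

    frame:  V x C_in  joint features (for example x, y coordinates)
    adj:    V x V     normalized adjacency
    weight: C_in x C_out, learned in a real network
    """
    v, c_in, c_out = len(frame), len(weight), len(weight[0])
    # Step 1: each joint aggregates features from itself and its neighbours.
    agg = [
        [sum(adj[i][k] * frame[k][c] for k in range(v)) for c in range(c_in)]
        for i in range(v)
    ]
    # Step 2: mix feature channels with the weight matrix.
    return [
        [sum(agg[i][c] * weight[c][o] for c in range(c_in)) for o in range(c_out)]
        for i in range(v)
    ]


# Hypothetical 3-joint chain (0 - 1 - 2), not a real skeleton layout.
edges = [(0, 1), (1, 2)]
adj = normalized_adjacency(3, edges)
frame = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]  # (x, y) per joint
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weights
out = spatial_graph_conv(frame, adj, identity)
```

In a full model this operation would be applied to every frame of the joint sequence and followed by a convolution along the time axis, and the resulting features would feed a classifier that decides whether the sequence is a fall.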

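Cite this article: He Weiting, Zeng Bi, Chen Wenxuan. Real-Time Fall Detection Based on Lightweight Human Pose Estimation and Graph Convolution Network [J]. Computer Science and Application, 2021, 11(4): 783-794. https://doi.org/10.12677/CSA.2021.114080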

