Unified Control Method for Complex Terrain Locomotion and Fall Recovery of Quadruped Robots: A Study Based on Deep Reinforcement Learning
Abstract: As quadruped robots are increasingly deployed in unstructured environments, loss of balance and falls caused by external disturbances and complex terrain have become a major obstacle to robot autonomy and mission continuity. Traditional fall recovery methods rely on external intervention, which limits autonomy and does not meet the needs of complex real-world applications. Although deep reinforcement learning (DRL) has made notable progress in motion control, autonomous post-fall recovery and terrain-dependent recovery strategies remain under-studied. This paper proposes a unified locomotion and recovery control method based on DRL that enables a quadruped robot to walk over complex terrain and recover from falls autonomously. The method combines a fall recovery factor, a dynamic scheduling strategy, and safety-constrained optimization. Specifically, it adopts an asymmetric actor-critic architecture with a context-aided estimator to improve terrain-aware decision-making, introduces a dynamic β-VAE latent constraint strategy to stabilize training, and uses the NP3O algorithm to optimize the policy under torque and stability constraints. Experiments show that the robot recovers quickly from falls on different terrains and transitions stably back to walking, demonstrating strong adaptability and robustness. The study offers an effective approach to improving the autonomy of quadruped robots in high-risk applications.
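To make the idea of a dynamically weighted β-VAE latent constraint concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear schedule for the KL weight β and a mean-squared reconstruction error; the function names, the warm-up length, and the β bounds are all hypothetical choices made for illustration. Only the closed-form KL divergence between a diagonal Gaussian and a standard normal is standard.

```python
import math

def beta_schedule(step, warmup_steps=1000, beta_min=0.0, beta_max=1.0):
    """Hypothetical linear schedule for the KL weight, clamped to [beta_min, beta_max]."""
    frac = min(max(step / warmup_steps, 0.0), 1.0)
    return beta_min + frac * (beta_max - beta_min)

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def beta_vae_loss(recon, target, mu, logvar, step):
    """Mean squared reconstruction error plus the beta-weighted KL term."""
    mse = sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)
    return mse + beta_schedule(step) * kl_to_standard_normal(mu, logvar)
```

Under this assumed schedule the latent constraint is switched off at the start of training and reaches full strength after `warmup_steps` updates, so early training is driven by reconstruction alone. The schedule actually used in the paper may differ.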
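Cite this article: Kong, D., Fan, Y. and Zhao, R. (2025) Unified Control Method for Complex Terrain Locomotion and Fall Recovery of Quadruped Robots: A Study Based on Deep Reinforcement Learning. Artificial Intelligence and Robotics Research, 14(4), 1052-1063. https://doi.org/10.12677/airr.2025.144100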
