A Review of 3D UAV Path Planning Based on Reinforcement Learning
DOI: 10.12677/csa.2026.165168. Supported by research project funding.
Authors: Zhang Chuhan (张楚寒), Jiang Fuhong (姜福宏): 魏桥国科(滨州)科技有限公司, Binzhou, Shandong; Wang Mengbi (王梦赑)*: 魏桥国科(北京)科技有限公司, Beijing; Cheng Chao (程超): 滨州魏桥国科高等技术研究院, Binzhou, Shandong
Keywords: UAV, Path Planning, Reinforcement Learning, Policy Gradient
Abstract: Traditional unmanned aerial vehicle (UAV) path planning algorithms depend on accurate environment models, so they adapt poorly and respond too slowly in complex, dynamic three-dimensional (3D) environments. This review first describes these limitations and introduces deep reinforcement learning as an alternative technical route. It then surveys three major categories of reinforcement learning algorithms (value-based, policy-gradient, and hybrid value-policy methods), explains their core principles, and summarizes the improvements and applications reported for 3D UAV path planning in both single-UAV and multi-UAV settings. Finally, focusing on the challenges specific to real UAV flight, it identifies the bottlenecks in applying reinforcement learning and outlines future research directions, providing a systematic reference for theoretical research and engineering practice in this field.
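Cite this article: Zhang Chuhan, Jiang Fuhong, Wang Mengbi, Cheng Chao. A Review of 3D UAV Path Planning Based on Reinforcement Learning [J]. Computer Science and Application (计算机科学与应用), 2026, 16(5): 95-109. https://doi.org/10.12677/csa.2026.165168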
