CSA  >> Vol. 7 No. 7 (July 2017)

    Dynamic Planning Method Based on Time Delayed Q-Learning

  • 全文下载: PDF(490KB) HTML   XML   PP.671-677   DOI: 10.12677/CSA.2017.77078  
  • 下载量: 107  浏览量: 148   国家自然科学基金支持


庄 夏:中国民用航空飞行学院,四川 广汉

机器人动态规划时延Q学习最优策略Robot Dynamic Planning Time Delayed Q Learning Optimal Policy



Aiming at the unknown environment of the existing robot dynamic planning methods with the slow convergence, a robot planning method based on time delayed Q-Learning is proposed. Firstly, the robot planning is modeled as MDP model, and it is then transferred as the problem which can be solved by reinforcement learning method. Then, the goal function of dynamic planning is defined, and the planning algorithm based on time delayed Q-Learning is proposed. The Q value of every state action pair is initialized to Rmax, so that all the state action pairs are explored, via decreasing the number of updating for Q value, to improve the updating efficiency. The simulation experiment shows: this time delayed Q-Learning algorithm can achieve the path planning of the mobile robot; compared with the other methods, this method has the advantages of good convergence performance and quick convergence rate with big priority, thus it is an effective robot planning method.

庄夏. 基于时延Q学习的机器人动态规划方法[J]. 计算机科学与应用, 2017, 7(7): 671-677. https://doi.org/10.12677/CSA.2017.77078


[1] Schaal, S. and Atkeson, C. (2010) Learning Control in Robotics. IEEE Robotics & Automation Magazine, 17, 20-29.
[2] 宋勇, 李贻斌, 李彩虹. 移动机器人路径规划强化学习的初始化[J]. 控制理论与应用, 2012, 12(29): 1623-1628.
[3] Bu, Q., Wang, Z. and Tong, X. (2013) An Improved Genetic Algorithm for Searching for Pollution Sources. Water Science and Engineering, 6, 392-401.
[4] Deng, Z.Y. and Chen, C.K. (2006) Mobile Robot Path Planning Based on Improved Genetic Algorithm. Journal of Chinese Computer Systems, 27, 1695-1699.
[5] Liu, C.M., Li, Z.B., Zhen, H., et al. (2013) A Reactive Navigation Method of Mobile Robots Based on LSPI and Rolling Windows. Journal of Central South University (Science and Technology), 44, 970-977.
[6] Er, M.J. and Zhou, Y. (2008) A Novel Framework for Automatic Generation of Fuzzy Neural Networks. Neurocomputing, 71, 584-591.
[7] 曾明如, 徐小勇, 罗浩, 徐志敏. 多步长蚁群算法的机器人路径规划研究[J]. 小型微型计算机系统, 2016, 2(37): 366-369.
[8] 屈鸿, 黄利伟, 柯星. 动态环境下基于改进蚁群算法的机器人路径规划研究[J]. 电子科技大学学报, 2015, 2(44): 260-265.
[9] 翁理国, 纪壮壮, 夏旻, 王安. 基于改进多目标粒子群算法的机器人路径规划[J]. 系统仿真学报, 2014, 12(26): 2892-2898.
[10] 潘桂彬, 潘丰, 刘国栋. 基于改进混合蛙跳算法的移动机器人路径规划[J]. 计算机应用, 2014, 34(10): 2850-2853.
[11] 温素芳, 郭光耀. 基于改进人工势场法的移动机器人路径规划[J]. 计算机工程与设计, 2015, 10(36): 2818-2822.
[12] Watkins, C.J.C.H. and Dayan, P. (1992) Q-Learning. Machine Learning, 8, 279-292.
[13] Palanisamy, M., Modares, H., Lewis, F.L., et al. (2015) Continuous-Time Q-Learning for Infinite-Horizon Discounted Cost Linear Quadratic Regulator Problems. IEEE Transactions on Cybernetics, 45, 165-176.