Robot Dynamic Planning Method Based on Time-Delayed Q-Learning
Dynamic Planning Method Based on Time Delayed Q-Learning
DOI: 10.12677/CSA.2017.77078. Supported by the National Natural Science Foundation of China.
Author: Zhuang Xia*, Civil Aviation Flight University of China, Guanghan, Sichuan
Keywords: Robot Dynamic Planning, Time Delayed Q Learning, Optimal Policy
Abstract: Existing robot dynamic planning methods cope poorly with unknown environments and converge slowly. To address these shortcomings, a robot dynamic planning method based on time-delayed Q-learning is proposed. First, robot planning is modeled as a Markov decision process (MDP), which turns it into a problem that reinforcement learning can solve. Next, the objective function for planning is defined and the planning algorithm based on time-delayed Q-learning is described. The algorithm uses the Rmax method to initialize the Q-value of every state-action pair, so that every pair gets explored, and it delays Q-value updates to reduce how many are performed, which improves update efficiency. Simulation experiments show that the proposed time-delayed Q-learning algorithm effectively performs path planning for a mobile robot and, compared with other algorithms, converges to better results and converges faster, making it an effective method for robot dynamic planning.
Cite this article: Zhuang, X. Dynamic Planning Method Based on Time Delayed Q-Learning [J]. Computer Science and Application, 2017, 7(7): 671-677. https://doi.org/10.12677/CSA.2017.77078
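
The abstract names two ingredients: optimistic Rmax-style initialization of every Q-value, and delayed Q-value updates. The Python sketch below illustrates both on a small grid world. It is not the paper's algorithm. It assumes the update rule commonly described for delayed Q-learning, where a pair is updated only after `m` visits and only if the averaged target lies at least `2 * eps1` below the current Q-value. The 4x4 grid, the reward of 1 on entering the goal, the parameter values, the stopping rule, and the names `step`, `greedy`, `delayed_q_learning` and `greedy_path` are all hypothetical choices made for illustration.

```python
from collections import defaultdict

SIZE = 4                                        # hypothetical 4x4 grid world
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]    # right, down, left, up


def step(state, action):
    """Deterministic move; bumping into a wall leaves the robot in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    return nxt, (1.0 if nxt == GOAL else 0.0)   # assumed reward: 1 on entering the goal


def greedy(Q, state):
    """Action with the largest Q-value (ties go to the first action listed)."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def delayed_q_learning(r_max=1.0, gamma=0.9, m=3, eps1=0.01,
                       max_episodes=5000, max_steps=100):
    q0 = r_max / (1.0 - gamma)                  # optimistic Rmax-style initial value
    Q = defaultdict(lambda: q0)
    acc = defaultdict(float)                    # sum of targets since the last attempt
    cnt = defaultdict(int)                      # visits since the last attempt
    total_updates, stable = 0, 0
    for _ in range(max_episodes):
        s, updates = START, 0
        for _ in range(max_steps):
            if s == GOAL:
                break
            a = greedy(Q, s)                    # optimism alone drives exploration
            s2, r = step(s, a)
            future = 0.0 if s2 == GOAL else gamma * max(Q[(s2, b)] for b in ACTIONS)
            acc[(s, a)] += r + future
            cnt[(s, a)] += 1
            if cnt[(s, a)] == m:                # delayed update: attempt once per m visits
                mean = acc[(s, a)] / m
                if Q[(s, a)] - mean >= 2 * eps1:
                    Q[(s, a)] = mean + eps1
                    updates += 1
                acc[(s, a)], cnt[(s, a)] = 0.0, 0
            s = s2
        total_updates += updates
        # Assumed stopping rule: 2*m consecutive goal-reaching episodes with no update.
        stable = stable + 1 if (s == GOAL and updates == 0) else 0
        if stable >= 2 * m:
            break
    return Q, total_updates


def greedy_path(Q, max_steps=SIZE * SIZE):
    """Follow the greedy policy from START, stopping at GOAL or after max_steps moves."""
    path = [START]
    while path[-1] != GOAL and len(path) <= max_steps:
        nxt, _ = step(path[-1], greedy(Q, path[-1]))
        path.append(nxt)
    return path


if __name__ == "__main__":
    Q, n_updates = delayed_q_learning()
    print(greedy_path(Q), n_updates)
```

Because every Q-value starts at `r_max / (1 - gamma)`, untried actions always look at least as good as tried ones, so the purely greedy policy still visits every reachable state-action pair. Batching `m` targets per update attempt is what keeps the number of Q-value writes low in this sketch.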
