Multi-Robot Reinforcement Learning Based on LCS and LS-SVM
Abstract: This paper presents a multi-robot reinforcement learning method that combines a learning classifier system (LCS) with a least squares support vector machine (LS-SVM). The optimal learning strategy obtained by the LS-SVM serves as the initial rule set of the LCS. By interacting with the environment, the LCS can more quickly discover rules that guide multi-robot reinforcement learning and can provide real-time, dynamic feedback for the action selection of the reinforcement learning system, so that the robots autonomously learn an optimal strategy for mutual cooperation. Algorithm analysis and simulation show that the problems of a large learning space, slow convergence, and uncertain learning outcomes in multi-robot learning are greatly alleviated.
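The abstract does not give the details of how the LS-SVM strategy is turned into LCS rules, so the following is only a minimal sketch of one way the idea could look, not the authors' implementation. It assumes a one-dimensional state, one LS-SVM regressor per action fitted to sampled Q-values with an RBF kernel, and a rule format of condition, action, and predicted payoff. All names (`lssvm_fit`, `seed_rules`, `Q_SAMPLES`) and all numeric values are hypothetical.

```python
import math


def rbf(x, z, sigma=1.0):
    """RBF kernel between two scalar states."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))


def solve(a, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lssvm_fit(xs, ys, gamma=10.0):
    """Fit an LS-SVM regressor by solving its linear system:
    [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [rbf(xs[i], xs[j]) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(a, [0.0] + list(ys))
    b, alphas = sol[0], sol[1:]
    return lambda x: b + sum(al * rbf(x, xi) for al, xi in zip(alphas, xs))


def seed_rules(q_samples, states):
    """Build an initial rule set: one rule per state, using the greedy action
    under the LS-SVM Q-value approximations (assumed rule format)."""
    models = {action: lssvm_fit(xs, ys) for action, (xs, ys) in q_samples.items()}
    rules = []
    for s in states:
        best = max(models, key=lambda act: models[act](s))
        rules.append({"condition": s, "action": best, "prediction": models[best](s)})
    return rules


# Hypothetical Q-value samples for a corridor whose goal lies to the right.
Q_SAMPLES = {
    "left": ([0.0, 2.0, 4.0], [0.4, 0.6, 0.8]),
    "right": ([0.0, 2.0, 4.0], [0.6, 0.8, 1.0]),
}
INITIAL_RULES = seed_rules(Q_SAMPLES, states=[0.0, 1.0, 2.0, 3.0, 4.0])

for rule in INITIAL_RULES:
    print(rule)
```

In a full system, the LCS would then refine, replace, or extend these seeded rules through its usual reinforcement and rule-discovery steps while interacting with the environment; that part is omitted here.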