A Cooperative Multi-Agent Reinforcement Learning Algorithm
Abstract: In multi-agent environments, the learning behavior of agents is a valuable research topic. From the perspective of a system designer, it is worth studying how, in an environment where multiple agents coexist, the agents can adjust their behavior strategies in the direction that maximizes their common interest. This paper proposes a cooperative gradient algorithm, CL-WoLF-IGA, whose goal is to make agents learn toward the strategy that maximizes the common payoff. To make the algorithm applicable to Markov games, we relax its conditions and propose the CL-WoLF-PHC reinforcement learning algorithm; even in an unknown environment where only the average common payoff is observable, it enables the agents using it to reach the strategy that maximizes the common payoff. To verify the performance of the algorithm in actual game models, we test the CL-WoLF-IGA algorithm on classical game models. Simulation results show that the algorithm has good convergence.
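The cooperative gradient idea summarized in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's exact CL-WoLF-IGA: two agents run projected gradient ascent on the *common* expected payoff of a 2x2 coordination game, with a WoLF-style ("Win or Learn Fast") variable step size that learns faster when the current joint strategy does worse than the historical average. The payoff matrix, starting strategies, and step sizes are all assumptions chosen for illustration.

```python
import numpy as np

# Common payoff shared by both agents (an assumed coordination game);
# (action 0, action 0) is the jointly optimal outcome with payoff 2.
R = np.array([[2.0, 0.0],
              [0.0, 1.0]])

def common_value(p, q):
    """Expected common payoff when agents play action 0 w.p. p and q."""
    x = np.array([p, 1 - p])
    y = np.array([q, 1 - q])
    return x @ R @ y

p, q = 0.4, 0.4               # initial mixed strategies (assumed start point)
avg_p, avg_q = p, q           # running average strategies used by the WoLF rule
lr_win, lr_lose = 0.01, 0.04  # WoLF: learn faster when "losing"

for t in range(1, 5001):
    # Gradient of the common value w.r.t. each agent's own strategy.
    dV_dp = np.array([1, -1]) @ R @ np.array([q, 1 - q])
    dV_dq = np.array([p, 1 - p]) @ R @ np.array([1, -1])
    # WoLF step size: compare current value with the value of the average strategy.
    lr = lr_win if common_value(p, q) > common_value(avg_p, avg_q) else lr_lose
    # Projected ascent step: keep probabilities in [0, 1].
    p = min(1.0, max(0.0, p + lr * dV_dp))
    q = min(1.0, max(0.0, q + lr * dV_dq))
    avg_p += (p - avg_p) / t
    avg_q += (q - avg_q) / t

# Both strategies converge to the jointly optimal pure strategy (p, q) = (1, 1).
print(p, q, common_value(p, q))
```

Because both agents ascend the gradient of the shared payoff rather than their individual payoffs, the dynamics converge to the common-payoff-maximizing outcome instead of a possibly inferior Nash equilibrium.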
Article citation: Qin, Q.W. and Deng, X.C. (2022) A Cooperative Multi-Agent Reinforcement Learning Algorithm. Operations Research and Fuzziology, 12, 312-321. https://doi.org/10.12677/ORF.2022.122032
