[1] Sutton, R.S. and Barto, A.G. (1998) Reinforcement Learning: An Introduction. MIT Press.
[2] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., et al. (2015) Human-Level Control through Deep Reinforcement Learning. Nature, 518, 529-533.
[3] Shapley, L.S. (1953) Stochastic Games. Proceedings of the National Academy of Sciences, 39, 1095-1100.
[4] Lowe, R., et al. (2017) Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 4-9 December 2017, 6382-6393.
[5] Hernandez-Leal, P., et al. (2017) A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity.
[6] Tan, M. (1993) Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. Proceedings of the 10th International Conference on Machine Learning, Amherst, 27-29 June 1993, 330-337.
[7] Rashid, T., et al. (2020) Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. Journal of Machine Learning Research, 21, 1-51.
[8] Singh, S., Kearns, M.J. and Mansour, Y. (2000) Nash Convergence of Gradient Dynamics in General-Sum Games. UAI.
[9] Foerster, J.N., et al. (2017) Learning with Opponent-Learning Awareness.
[10] Letcher, A., et al. (2018) Stable Opponent Shaping in Differentiable Games.
[11] Schulman, J., et al. (2015) Trust Region Policy Optimization. ICML’15: Proceedings of the 32nd International Conference on Machine Learning, Volume 37, 1889-1897.
[12] Haarnoja, T., Zhou, A., Abbeel, P. and Levine, S. (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Proceedings of the 35th International Conference on Machine Learning, Stockholm, 10-15 July 2018, 1861-1870.
[13] Czarnecki, W.M., et al. (2020) Real World Games Look like Spinning Tops. NIPS’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, 6-12 December 2020, 17443-17454.
[14] Littman, M.L. (1994) Markov Games as a Framework for Multi-Agent Reinforcement Learning. Proceedings of the 11th International Conference on Machine Learning, Rutgers University, New Brunswick, 10-13 July 1994, 157-163.
[15] Ziebart, B.D., et al. (2008) Maximum Entropy Inverse Reinforcement Learning. AAAI, Volume 8, 1433-1438.
[16] Hochreiter, S. and Schmidhuber, J. (1997) Flat Minima. Neural Computation, 9, 1-42.
[17] Goodfellow, I.J., Vinyals, O. and Saxe, A.M. (2014) Qualitatively Characterizing Neural Network Optimization Problems.