|
[1]
|
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., et al. (2016) Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529, 484-489.
|
|
[2]
|
Brown, N. and Sandholm, T. (2017) Libratus: The Superhuman AI for No-Limit Poker. Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, 19-25 August 2017, 5226-5228.
|
|
[3]
|
Vinyals, O., Babuschkin, I., Chung, J., et al. (2019) AlphaStar: Mastering the Real-Time Strategy Game StarCraft II. DeepMind Blog, 2.
|
|
[4]
|
Kober, J., Bagnell, J.A. and Peters, J. (2013) Reinforcement Learning in Robotics: A Survey. The International Journal of Robotics Research, 32, 1238-1274.
|
|
[5]
|
Wei, E. and Luke, S. (2016) Lenient Learning in Independent-Learner Stochastic Cooperative Games. Journal of Machine Learning Research, 17, 1-42.
|
|
[6]
|
Cui, Q.W. and Du, S.S. (2022) When Are Offline Two-Player Zero-Sum Markov Games Solvable? 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, 28 November-9 December 2022, 25779-25791.
|
|
[7]
|
Yan, Y., Li, G., Chen, Y. and Fan, J. (2024) Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games. Operations Research, 72, 2430-2445.
|
|
[8]
|
Sayin, M., et al. (2021) Decentralized Q-Learning in Zero-Sum Markov Games. 35th Conference on Neural Information Processing Systems (NeurIPS 2021), 6-14 December 2021, 18320-18334.
|
|
[9]
|
Yang, Y.D. and Wang, J. (2020) An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective.
|
|
[10]
|
Puterman, M.L. (2014) Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons.
|
|
[11]
|
Kakade, S.M. (2003) On the Sample Complexity of Reinforcement Learning. University of London, University College London.
|