[1] Li, Y. (2017) Deep Reinforcement Learning: An Overview. arXiv preprint arXiv:1701.07274.
[2] LeCun, Y., Bengio, Y. and Hinton, G. (2015) Deep Learning. Nature, 521, 436-444.
[3] Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996) Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, 237-285.
[4] Barto, A.G., Sutton, R.S. and Watkins, C. (1989) Learning and Sequential Decision Making. University of Massachusetts, Amherst.
[5] Fedus, W., Ghosh, D., Martin, J.D., et al. (2020) On Catastrophic Interference in Atari 2600 Games. arXiv preprint arXiv:2002.12499.
[6] Conrad, S., Teichmann, J., Auth, P., et al. (2024) 3D-Printed Digital Pneumatic Logic for the Control of Soft Robotic Actuators. Science Robotics, 9, eadh4060.
[7] Brown, N. and Sandholm, T. (2018) Superhuman AI for Heads-up No-Limit Poker: Libratus Beats Top Professionals. Science, 359, 418-424.
[8] Brown, N. and Sandholm, T. (2019) Superhuman AI for Multiplayer Poker. Science, 365, 885-890.
[9] Da Silva, F.L. and Costa, A.H.R. (2019) A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems. Journal of Artificial Intelligence Research, 64, 645-703.
[10] Bellemare, M.G., Dabney, W. and Munos, R. (2017) A Distributional Perspective on Reinforcement Learning. Proceedings of the 34th International Conference on Machine Learning, Sydney, 6-11 August 2017, 449-458.
[11] Sun, W.F., Lee, C.K. and Lee, C.Y. (2021) DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning. Proceedings of the 38th International Conference on Machine Learning, 18-24 July 2021, 9945-9954.
[12] Hong, Y., Jin, Y. and Tang, Y. (2022) Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning. Advances in Neural Information Processing Systems, 35, 32438-32449.
[13] Zhao, J., Yang, M., Zhao, Y., et al. (2023) MCMARL: Parameterizing Value Function via Mixture of Categorical Distributions for Multi-Agent Reinforcement Learning. IEEE Transactions on Games, 1-10.
[14] Kappen, H.J. (2011) Optimal Control Theory and the Linear Bellman Equation. In: Barber, D., Cemgil, A.T. and Chiappa, S., Eds., Bayesian Time Series Models, Cambridge University Press, Cambridge, 363-387.
[15] Filar, J. and Vrieze, K. (2012) Competitive Markov Decision Processes. Springer Science & Business Media, Berlin.
[16] Guicheng, S. and Yang, W. (2022) Review on Dec-POMDP Model for MARL Algorithms. In: Jain, L.C., Kountchev, R., Hu, B. and Kountcheva, R., Eds., Smart Communications, Intelligent Algorithms and Interactive Methods, Springer, Singapore, 29-35.
[17] Zhou, Y., Liu, S., Qing, Y., et al. (2023) Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL? arXiv preprint arXiv:2305.17352.
[18] Lowe, R., Wu, Y.I., Tamar, A., et al. (2017) Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 4-9 December 2017, 6382-6393.
[19] Sunehag, P., Lever, G., Gruslys, A., et al. (2017) Value-Decomposition Networks for Cooperative Multi-Agent Learning. arXiv preprint arXiv:1706.05296.
[20] Rashid, T., Samvelyan, M., De Witt, C.S., et al. (2020) Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. The Journal of Machine Learning Research, 21, 7234-7284.
[21] Yang, Y., Hao, J., Liao, B., et al. (2020) Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning. arXiv preprint arXiv:2002.03939.
[22] Hu, J., Harding, S.A., Wu, H., et al. (2020) QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning. arXiv preprint arXiv:2009.04197.
[23] Qiu, W., Wang, X., Yu, R., et al. (2021) RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents. Advances in Neural Information Processing Systems, 34, 23049-23062.
[24] Darling, D.A. (1957) The Kolmogorov-Smirnov, Cramer-von Mises Tests. The Annals of Mathematical Statistics, 28, 823-838.