Theoretical and Applied Research on Reinforcement Learning Methods

doi:10.12677/CSA.2022.123056


Author: Lin Chen (林晨), School of Mathematics, South China University of Technology, Guangzhou, Guangdong
Keywords: Artificial Intelligence; Reinforcement Learning; Theory; Application

Abstract: Reinforcement learning is an important branch of machine learning and a major direction of development in artificial intelligence. This article discusses the basic framework of reinforcement learning based on the Markov decision process, analyzes the basic model, states the objective of reinforcement learning, and works through the theoretical derivations involved. From a theoretical perspective, it examines the actor-critic method, which underlies deep reinforcement learning, and discusses the ideas behind the deterministic policy gradient (DPG) method. It then analyzes the twin delayed deep deterministic policy gradient (TD3) method, which has performed well in recent years, and surveys current research directions and representative methods. Finally, the article turns to applications: it considers current application areas, the problems reinforcement learning can address, and the challenges it faces, assesses the current state of application, and offers predictions about future development.
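
The abstract names TD3 without spelling out what "twin" and "delayed" refer to. As a rough illustration only, the sketch below shows two of the ingredients usually associated with TD3: a critic target built from the minimum of two Q estimates, and clipped noise added to the target action. The third ingredient, updating the actor less often than the critics, is not shown. The function names, default values, and scalar action are assumptions made for this sketch and are not taken from the article.

```python
import random


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next are the two target critics' estimates for the
    next state and the (smoothed) target action. All names here are
    hypothetical and chosen for illustration.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def smoothed_target_action(action, noise_std=0.2, noise_clip=0.5,
                           max_action=1.0, rng=random):
    """Target policy smoothing for a scalar action (assumed defaults).

    Gaussian noise is clipped to [-noise_clip, noise_clip], added to the
    target policy's action, and the result is clipped to the action range.
    """
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(-max_action, min(max_action, action + noise))
```

Taking the minimum of the two estimates is intended to counter the overestimation that a single learned critic tends to produce.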

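Cite this article: Lin, C. (2022) Theoretical and Applied Research on Reinforcement Learning Methods. Computer Science and Application (计算机科学与应用), 12(3), 554-564. https://doi.org/10.12677/CSA.2022.123056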

