Traffic Signal Control Using Multi-Agent Deep Reinforcement Learning Combined with Attention Mechanism

doi:10.12677/orf.2024.142143

Author: Xu Qingqing (徐晴晴), School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai
Keywords: Multi-Agent Deep Reinforcement Learning; Intelligent Traffic Signal Control; Mean Field Theory; Machine Learning

Abstract: Intelligent traffic signal control methods are increasingly deployed in the real world and have achieved good results. Multi-agent deep reinforcement learning is among the most effective of these methods. In multi-intersection traffic signal control, however, large-scale traffic networks readily lead to a severe curse of dimensionality, and feature extraction from the road environment remains inadequate. To address these problems, a new multi-agent deep reinforcement learning algorithm is proposed. The algorithm is based on the Double Dueling Deep Q-Network (3DQN), which addresses the Q-value overestimation of conventional reinforcement learning algorithms. Mean Field (MF) theory is introduced to greatly reduce the dimensionality of the state and action spaces, and an attention mechanism is integrated so that each agent observes the road environment comprehensively and obtains more accurate environmental information. A traffic network is modeled in Simulation of Urban Mobility (SUMO) to simulate real-world traffic flow and evaluate the algorithm. Experimental results show that the proposed algorithm increases the reward by 64.17%, 36.40%, and 32.55% compared with DQN, DDPG, and MA2C, respectively, demonstrating the validity and superiority of the proposed algorithm.
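The abstract names three building blocks without showing them: the dueling decomposition, the double-Q target, and the mean-field summary of neighboring agents. The following is a minimal NumPy sketch of the textbook form of each, not the paper's implementation. The function names, the discount factor, and the toy numbers are assumptions made for illustration only; the actual network architecture and the attention module are not specified in this abstract and are not reproduced here.

```python
import numpy as np


def dueling_q(value, advantage):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Subtracting the mean advantage keeps V and A identifiable.
    """
    advantage = np.asarray(advantage, dtype=float)
    return value + advantage - advantage.mean(axis=-1, keepdims=True)


def double_dqn_target(reward, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double-Q target: the online network selects the next action,
    the target network evaluates it. Decoupling selection from
    evaluation is what reduces Q-value overestimation.
    """
    best_action = int(np.argmax(q_next_online))
    bootstrap = 0.0 if done else gamma * float(q_next_target[best_action])
    return reward + bootstrap


def mean_action(neighbor_actions, n_actions):
    """Mean-field summary: average of the neighbors' one-hot actions.

    Each agent conditions on this single vector instead of the joint
    action of all neighbors, so the input size no longer grows with
    the number of intersections.
    """
    one_hot = np.eye(n_actions)[np.asarray(neighbor_actions)]
    return one_hot.mean(axis=0)


# Toy values (hypothetical, chosen only to make the arithmetic visible).
q_values = dueling_q(1.0, [0.5, -0.5, 0.0])
target = double_dqn_target(
    reward=1.0,
    q_next_online=np.array([1.0, 3.0, 2.0]),
    q_next_target=np.array([0.5, 0.2, 0.9]),
    gamma=0.9,
)
mean_act = mean_action([0, 1, 1, 3], n_actions=4)
```

In the toy call, the online network prefers action 1, so the target bootstraps from the target network's value for action 1 (0.2) rather than from its maximum (0.9). A plain DQN target would use the maximum, which is the source of the overestimation the double-Q form is designed to reduce.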

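Cite this article: Xu, Q. (2024) Traffic Signal Control Using Multi-Agent Deep Reinforcement Learning Combined with Attention Mechanism. Operations Research and Fuzziology, 14(2), 373-387. https://doi.org/10.12677/orf.2024.142143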

