An Intersection Signal Control Method with Deep Reinforcement Learning Based on Mixed Domain Attention
Original title: 基于混合域注意力的深度强化学习交叉口信号控制方法
DOI: 10.12677/csa.2024.144088
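Authors: 李忠华 (Li Zhonghua): School of Automation, Guangdong University of Technology, Guangzhou, Guangdong; 何子登* (He Zideng): Big Data Branch, 广州羊城通有限公司 (Guangzhou Yang Cheng Tong Co., Ltd.), Guangzhou, Guangdong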
Keywords: Mixed Domain Attention; Deep Reinforcement Learning; Traffic Signal Control; 3DQN Algorithm; SUMO
Abstract: Reinforcement learning agents have a limited ability to perceive microscopic traffic states. To address this problem, this paper proposes 3DQN_MDAM, a deep reinforcement learning algorithm for intersection signal control based on mixed domain attention. First, to reduce storage overhead, a lightweight Mixed Domain Attention Module (MDAM) is designed; with only a small number of parameters, it adaptively adjusts the weights across channels and across spatial positions of the traffic state feature map. Then, MDAM is introduced into an existing model based on the Double Dueling Deep Q-Network (Double Dueling DQN, 3DQN) algorithm, so that the agent automatically focuses on the traffic state information that matters most to the current control task, which strengthens its state perception. Finally, experiments are conducted on the simulation platform SUMO (Simulation of Urban Mobility). The results show that under low, medium, and high traffic flow conditions, 3DQN_MDAM improves on 3DQN in every metric, with the average vehicle waiting time reduced by 20%, 20%, and 17.6%, respectively. Compared with other commonly used baseline algorithms, 3DQN_MDAM achieves the best control performance on all metrics.
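The abstract describes reweighting a traffic state feature map across channels and across spatial positions, but does not give the internal layout of MDAM. The sketch below is therefore not the paper's module. It is a generic, parameter-free illustration of the idea, assuming a C x H x W feature map stored as nested lists and sigmoid gates derived from simple averages. The function names, the gating rule, and the example state are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(fmap):
    """One gate per channel, computed from that channel's global average."""
    gates = []
    for channel in fmap:
        cells = [v for row in channel for v in row]
        gates.append(sigmoid(sum(cells) / len(cells)))
    return gates


def spatial_gates(fmap):
    """One gate per grid cell, computed from the mean across channels."""
    n_channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [sigmoid(sum(fmap[c][i][j] for c in range(n_channels)) / n_channels)
         for j in range(width)]
        for i in range(height)
    ]


def mixed_domain_attention(fmap):
    """Rescale a C x H x W feature map by channel gates, then spatial gates."""
    c_gates = channel_gates(fmap)
    scaled = [
        [[v * g for v in row] for row in channel]
        for channel, g in zip(fmap, c_gates)
    ]
    s_gates = spatial_gates(scaled)
    return [
        [[v * s_gates[i][j] for j, v in enumerate(row)]
         for i, row in enumerate(channel)]
        for channel in scaled
    ]


# Hypothetical 2-channel, 2 x 3 traffic state: channel 0 marks vehicle
# presence per road cell, channel 1 holds normalized vehicle speed.
state = [
    [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
    [[0.2, 0.0, 0.9], [0.1, 0.5, 0.0]],
]
attended = mixed_domain_attention(state)
```

In a learned module the gates would come from a small number of trainable parameters rather than fixed averages, and the reweighted map would feed the Q-network in place of the raw state. The output keeps the input shape, so such a block can be inserted without changing the downstream layers.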
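Citation: Li Zhonghua, He Zideng. An Intersection Signal Control Method with Deep Reinforcement Learning Based on Mixed Domain Attention [J]. Computer Science and Application, 2024, 14(4): 177-192. https://doi.org/10.12677/csa.2024.144088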
