A Reinforcement Learning Approach for Traffic Signal Control Integrating Driving Style Recognition and Collaborative Learning
DOI: 10.12677/mos.2025.149581
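Authors: Li Qi: Shanghai SEARI Intelligent System Co., Ltd., Shanghai; Zhou Lulu: Traffic Management Corps, Shanghai Municipal Public Security Bureau, Shanghai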
Keywords: Multi-Agent Deep Reinforcement Learning, Intelligent Traffic Signal Control, Driving Style, Collaborative Learning
Abstract: As urbanization accelerates, urban traffic congestion has grown increasingly severe and has become a core bottleneck for sustainable urban development. Intelligent traffic signal control has advanced rapidly in recent years, and methods based on Multi-Agent Deep Reinforcement Learning (MADRL) in particular offer new ways to alleviate congestion. However, existing research generally overlooks the heterogeneity of driver behavior, and in large-scale intersection networks it faces two challenges: high-dimensional state spaces and inefficient collaboration among agents. To address these issues, this paper proposes CDS-DQN (Collaborative Driving Style-aware Deep Q-Network), a multi-agent deep reinforcement learning algorithm that combines driving style recognition with a neighbor collaboration mechanism. The algorithm includes a lightweight driving style recognition module that quantifies vehicle behavioral characteristics (e.g., aggressive, normal, and conservative) to construct an “effective occupancy” metric, which serves as a state input and strengthens the agent's perception of microscopic traffic characteristics. In addition, a neighbor state-sharing mechanism lets each agent obtain key information from adjacent intersections, enabling local collaborative perception and mitigating the environmental non-stationarity inherent in multi-agent systems. The algorithm was validated experimentally in an urban road network built on the SUMO (Simulation of Urban Mobility) platform. The results show that CDS-DQN outperforms traditional fixed-time control, independent DQN, and the mainstream MA2C algorithm in average waiting time, queue length, and traffic throughput, demonstrating its effectiveness.
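The abstract does not give the formula for “effective occupancy” or the exact contents of the shared neighbor state, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that each vehicle's length is scaled by a weight tied to its driving style class and that an agent's state is its own per-lane occupancies concatenated with those of adjacent intersections. The names `STYLE_WEIGHT`, `Vehicle`, `effective_occupancy`, and `build_state`, and all numeric weights, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical style weights: the abstract names the three classes but not
# how they are weighted, so these numbers are illustrative assumptions only.
STYLE_WEIGHT = {"aggressive": 0.8, "normal": 1.0, "conservative": 1.2}


@dataclass
class Vehicle:
    length_m: float
    style: str  # "aggressive", "normal", or "conservative"


def effective_occupancy(vehicles, lane_length_m):
    """Style-weighted share of the lane length taken up by vehicles (assumed form)."""
    used = sum(v.length_m * STYLE_WEIGHT[v.style] for v in vehicles)
    return min(used / lane_length_m, 1.0)  # cap at a fully occupied lane


def build_state(own_lanes, neighbor_lanes):
    """Concatenate an agent's own per-lane occupancies with its neighbors'.

    own_lanes: list of (vehicles, lane_length_m) pairs for this intersection.
    neighbor_lanes: one such list per adjacent intersection.
    """
    state = [effective_occupancy(v, length) for v, length in own_lanes]
    for lanes in neighbor_lanes:
        state.extend(effective_occupancy(v, length) for v, length in lanes)
    return state


# One local lane and one lane from a single neighboring intersection.
lane_a = ([Vehicle(5.0, "aggressive"), Vehicle(5.0, "conservative")], 100.0)
lane_b = ([Vehicle(5.0, "normal")], 100.0)
state = build_state([lane_a], [[lane_b]])
```

In this reading, the resulting `state` vector would be the input to each agent's Q-network. Sharing only a compact per-lane summary from neighbors, rather than their full observations, is one way to limit state dimensionality while still exposing upstream conditions.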
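Citation: Li Qi, Zhou Lulu. A Reinforcement Learning Approach for Traffic Signal Control Integrating Driving Style Recognition and Collaborative Learning [J]. Modeling and Simulation, 2025, 14(9): 21-29. https://doi.org/10.12677/mos.2025.149581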
