Reinforcement Learning-Based Aggregated Spectrum Sharing for Multi-Channel Vehicular Networking
DOI: 10.12677/CSA.2022.1212297
Authors: Jiacheng Tang, Chengdu University of Information Technology, Chengdu, Sichuan; Xinguo Wang, China Aeronautical Radio Electronics Research Institute, Shanghai
Keywords: Vehicular Network, Multi-Agent Reinforcement Learning, Cognitive Radio, Deep Q-Network (DQN), Multi-Channel
Abstract: To address the growing demands of vehicular networks and the shortage of spectrum resources, this paper combines the spectrum aggregation capability of cognitive radio with multi-agent reinforcement learning and proposes a reinforcement learning-based aggregated spectrum sharing model for multi-channel vehicular networks. Each vehicle-to-vehicle (V2V) link acts as an agent, and all agents interact with the communication environment jointly. Each link obtains its observations independently while receiving a common reward. This setup encourages the agents to cooperate while training their Q-networks, improving the agents' actions of spectrum aggregation position selection and power allocation. Simulation results show that, with appropriate reward design and training mechanisms, multiple agents successfully learn to cooperate in a distributed manner. Without reducing the total transmission bandwidth of vehicle-to-infrastructure (V2I) links, the model substantially improves the payload delivery rate of V2V links.
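To make the setup concrete, the following is a minimal, self-contained sketch of the shared-reward multi-agent Q-learning scheme the abstract describes, with simple linear Q-functions standing in for the paper's deep Q-networks. All names, dimensions, and the toy collision-penalty reward are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 4          # V2V links, each acting as one agent
N_CHANNELS = 3        # candidate sub-channels for aggregation
N_POWER = 2           # discrete transmit power levels
N_ACTIONS = N_CHANNELS * N_POWER   # joint (channel, power) choice per agent
OBS_DIM = 5           # per-link local observation (e.g. CSI, interference)

# One linear Q-function per agent: Q(s, a) = W[a] . s
W = [rng.normal(scale=0.1, size=(N_ACTIONS, OBS_DIM)) for _ in range(N_AGENTS)]

def common_reward(actions):
    """Toy stand-in for the paper's shared reward: penalize channel collisions."""
    channels = [a % N_CHANNELS for a in actions]
    return -float(len(channels) - len(set(channels)))

eps, alpha, gamma = 0.2, 0.05, 0.9
obs = rng.normal(size=(N_AGENTS, OBS_DIM))

for step in range(2000):
    # Each agent picks an action epsilon-greedily from its own observation.
    actions = []
    for i in range(N_AGENTS):
        if rng.random() < eps:
            actions.append(int(rng.integers(N_ACTIONS)))
        else:
            actions.append(int(np.argmax(W[i] @ obs[i])))
    r = common_reward(actions)          # one scalar reward broadcast to all agents
    next_obs = rng.normal(size=(N_AGENTS, OBS_DIM))
    for i, a in enumerate(actions):
        q = W[i] @ obs[i]
        td = r + gamma * np.max(W[i] @ next_obs[i]) - q[a]
        W[i][a] += alpha * td * obs[i]  # semi-gradient Q-learning update
    obs = next_obs

Broadcasting one scalar reward to every agent is what turns independent learners into cooperators in this kind of scheme: each link's Q-update credits or blames its own (channel, power) choice for the team outcome, without any agent seeing the others' observations.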
Citation: Tang, J.C. and Wang, X.G. (2022) Reinforcement Learning-Based Aggregated Spectrum Sharing for Multi-Channel Vehicular Networking. Computer Science and Application, 12, 2925-2936. https://doi.org/10.12677/CSA.2022.1212297
