Research on Power Control Algorithm Based on Distributed Reinforcement Learning

doi:10.12677/SEA.2023.123052

Supported by research project funding.
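Authors: Si Ke (司轲), Li Ye (李烨), School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai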
Keywords: Distributed Reinforcement Learning; Power Control; Actor-Critic Algorithm; Double Deep Q-Network (DDQN); Delayed Deep Deterministic Policy Gradient

Abstract: Reinforcement learning, as a model-free control method, has been applied to the problem of co-channel interference in cellular networks. In value-based reinforcement learning algorithms, however, function approximation error causes Q-values to be overestimated. The algorithm then converges to a suboptimal policy that suppresses channel interference poorly, and convergence is slow in scenarios with many frequency bands. To address this, this paper proposes a control method suited to distributed deployment: a double deep Q-network (DDQN) learns the discrete policy, and a delayed deep deterministic policy gradient algorithm extended with a triplet critic mechanism learns the continuous policy. These choices make the algorithm's action-value estimates more accurate, which improves interference suppression in scenarios with different numbers of frequency bands. Scalability experiments that vary the number of frequency bands show that the proposed algorithm converges faster and suppresses channel interference better across these scenarios, demonstrating its effectiveness and scalability.
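The abstract's central argument is that taking a max over noisy Q-estimates biases values upward, and that DDQN (for the discrete policy) and a triplet-critic variant of delayed DDPG (for the continuous policy) reduce that bias. The sketch below is not the authors' code; it only illustrates the target computations under stated assumptions. All function names and the weight `beta` are hypothetical, and the triplet blending rule (a weighted mix of `min(q1, q2)` and a third critic `q3`) is one plausible reading of "triplet critic", not the paper's confirmed formula. Treating the discrete action as band selection and the continuous action as transmit power is likewise an assumption.

```python
import random
import statistics


def dqn_target(reward, done, gamma, q_target_next):
    """Standard DQN target: the target network both selects and evaluates a'."""
    return reward + gamma * (1.0 - done) * max(q_target_next)


def ddqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online network selects a', the target network evaluates it."""
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * (1.0 - done) * q_target_next[best]


def triplet_critic_target(reward, done, gamma, q1, q2, q3, beta=0.5):
    """Hypothetical triplet-critic target for the continuous (e.g. power) branch.

    Assumption: blend the clipped double-Q value min(q1, q2) with a third
    target critic q3 using weight beta. The paper's exact rule may differ.
    q1, q2, q3 are the three target critics evaluated at (s', mu'(s')).
    """
    blended = beta * min(q1, q2) + (1.0 - beta) * q3
    return reward + gamma * (1.0 - done) * blended


def overestimation_demo(n_actions=10, noise=1.0, trials=20000, seed=0):
    """Every true action value is 0; each estimate adds zero-mean Gaussian noise.

    Returns the mean bias of the single estimator (max over one noisy set) and
    of the double estimator (argmax on one set, value read from an independent set).
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        qa = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        qb = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        single.append(max(qa))  # select and evaluate with the same noisy set
        best = max(range(n_actions), key=lambda a: qa[a])
        double.append(qb[best])  # select with qa, evaluate with independent qb
    return statistics.mean(single), statistics.mean(double)


if __name__ == "__main__":
    single_bias, double_bias = overestimation_demo()
    print(f"single-estimator bias: {single_bias:+.3f}")
    print(f"double-estimator bias: {double_bias:+.3f}")
    print(ddqn_target(1.0, 0.0, 0.9, [0.2, 0.8, 0.5], [0.3, 0.4, 0.9]))
    print(triplet_critic_target(1.0, 0.0, 0.9, 2.0, 1.0, 1.6))
```

With these settings the single estimator's mean bias is clearly positive even though every true value is zero, while the double estimator's stays close to zero; this is the effect DDQN relies on. Taking `min(q1, q2)` pushes the continuous-branch target in the opposite (pessimistic) direction, and mixing in a third critic is one way to temper that underestimation.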

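Citation: Si, K. and Li, Y. (2023) Research on Power Control Algorithm Based on Distributed Reinforcement Learning. Software Engineering and Applications, 12(3), 530-542. https://doi.org/10.12677/SEA.2023.123052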
