A Reinforcement Learning Algorithm with Actor Network Populations Evolved by Improved Genetic Algorithms
Abstract: Deep reinforcement learning (RL) algorithms have been applied successfully to a range of challenging tasks, but they often struggle with temporal credit assignment under sparse rewards, ineffective exploration, and a lack of diverse exploration experience. Evolutionary algorithms are a class of black-box optimization techniques inspired by natural evolution. This paper proposes combining an improved chaotic genetic algorithm and a quantum genetic algorithm, respectively, with an RL algorithm. The method first creates a population of actor networks for evolutionary computation, updates network parameters by gradient descent, and evolves the networks in the population until the algorithm converges. Because the fitness measure aggregates the return of each RL episode, it alleviates, to some extent, the temporal credit assignment problem under sparse rewards; training the RL agent on the diverse experience generated by the population improves robustness. Comparison and ablation experiments in both discrete and continuous RL environments show that the proposed algorithm converges to higher reward values and converges faster.
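Citation: Zhang Shengtao, Zhao Jia, Chen Chuqi. A Reinforcement Learning Algorithm with Actor Network Populations Evolved by Improved Genetic Algorithms[J]. Computer Science and Application, 2024, 14(10): 102-109. https://doi.org/10.12677/csa.2024.1410206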
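
The population loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the corridor environment, the two-parameter linear actor, the logistic map as the chaotic sequence, the elitist selection scheme, and all constants are assumptions made for the example. The gradient-descent update of the RL agent and the quantum genetic variant are omitted.

```python
import random


def logistic_map(x, r=4.0):
    """One step of the logistic map, used here as an assumed chaotic sequence."""
    return r * x * (1.0 - x)


def episode_return(params, goal=5, max_steps=20):
    """Fitness of one actor: total reward of a single episode.

    Hypothetical sparse-reward corridor: the agent starts at position 0 and
    receives +1 only when it reaches `goal`; every other step gives 0.
    The actor is a linear policy that moves right when w * pos + b > 0.
    """
    w, b = params
    pos, total = 0, 0.0
    for _ in range(max_steps):
        pos += 1 if w * pos + b > 0 else -1
        pos = max(pos, -goal)  # left wall of the corridor
        if pos == goal:
            total += 1.0
            break
    return total


def evolve(pop_size=10, generations=15, seed=0):
    """Evolve a population of actors, ranking them by episode return."""
    rng = random.Random(seed)
    chaos = 0.37  # assumed initial state of the chaotic sequence, in (0, 1)
    population = [[rng.uniform(-1, 1), rng.uniform(-1, 1)]
                  for _ in range(pop_size)]
    history = []  # best fitness of each generation
    for _ in range(generations):
        ranked = sorted(population, key=episode_return, reverse=True)
        history.append(episode_return(ranked[0]))
        elites = ranked[: pop_size // 2]
        children = [list(ranked[0])]  # elitism: keep the best actor unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(elites, 2)
            # Uniform crossover: each parameter comes from one of the parents.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Chaotic mutation: perturb each parameter within [-0.5, 0.5].
            for i in range(len(child)):
                chaos = logistic_map(chaos)
                child[i] += 0.5 * (2.0 * chaos - 1.0)
            children.append(child)
        population = children
    best = max(population, key=episode_return)
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best actor parameters:", best)
    print("best return per generation:", history)
```

Because the best actor is copied unchanged into the next generation and the toy environment is deterministic, the best return recorded per generation cannot decrease. In a full method along the lines of the abstract, the episodes collected while evaluating the population would also serve as training experience for a gradient-trained RL agent.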
