A Multi-Robot Area Coverage Strategy Based on QMIX Reinforcement Learning
Abstract: Multi-robot area coverage in an unknown environment requires a team of robots to traverse every obstacle-free region of that environment. As an important component of multi-robot systems research, area coverage is widely applied in fields such as post-disaster rescue, field surveying, and forest fire prevention, and is therefore of considerable research significance. Traditional multi-robot coverage methods must address region decomposition, task allocation, and related subproblems, and coverage methods without a cooperative strategy amount to a simple superposition of single-robot approaches. With reinforcement learning, by contrast, robots can obtain feasible solutions through autonomous learning. This paper reformulates the multi-robot area coverage problem as maximizing the team reward in multi-agent reinforcement learning and builds a multi-agent reinforcement learning network based on the Actor-Critic structure. To handle the non-stationarity that each robot's individual behavior introduces into the environment, a QMIX network that incorporates global state information is adopted as the evaluation network for the robots' joint behavior. Finally, an end-to-end data interaction interface between the reinforcement learning algorithm and the simulation environment is designed, simplifying the exchange of training data. Training results show that the proposed algorithm achieves a high coverage rate, verifying its effectiveness and feasibility for the area coverage task.
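The abstract names QMIX as the critic that evaluates joint robot behavior while conditioning on global information. For context, the published QMIX formulation (Rashid et al., 2018) factorizes the joint action-value Q_tot monotonically over the per-agent values Q_a, conditioned on the global state s:

```latex
% QMIX monotonic value factorisation (Rashid et al., 2018):
% Q_tot is a state-conditioned, monotonic mixture of per-agent utilities.
\[
  Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u})
    = f\bigl(Q_1(\tau_1, u_1), \dots, Q_n(\tau_n, u_n); s\bigr),
  \qquad
  \frac{\partial Q_{\mathrm{tot}}}{\partial Q_a} \ge 0, \;\; \forall a .
\]
% Monotonicity makes the joint greedy action decomposable per agent:
\[
  \arg\max_{\mathbf{u}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u})
    = \Bigl( \arg\max_{u_1} Q_1(\tau_1, u_1), \dots,
             \arg\max_{u_n} Q_n(\tau_n, u_n) \Bigr).
\]
```

The monotonicity constraint guarantees that each robot acting greedily on its own Q_a also maximizes Q_tot, which is what permits decentralized execution after centralized training. Below is a minimal PyTorch sketch of such a mixing network. It mirrors the published architecture, but the class name `QMixer`, the dimensions, and the usage example are illustrative assumptions, not the paper's own implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QMixer(nn.Module):
    """Monotonic mixing network in the style of QMIX (Rashid et al., 2018).

    Per-agent Q-values are mixed into a joint Q_tot; the mixing weights are
    produced by hypernetworks conditioned on the global state, and taking
    their absolute value enforces dQ_tot / dQ_a >= 0.
    """

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: global state -> weights/biases of the mixing layers.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, 1),
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        bs = agent_qs.size(0)
        qs = agent_qs.view(bs, 1, self.n_agents)
        # Non-negative first-layer weights keep the mixture monotonic in Q_a.
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(qs, w1) + b1)        # (batch, 1, embed_dim)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2            # (batch, 1, 1)
        return q_tot.view(bs, 1)


# Usage sketch: 4 robots, a 64-dimensional global state, batch of 8.
mixer = QMixer(n_agents=4, state_dim=64)
q_tot = mixer(torch.randn(8, 4), torch.randn(8, 64))  # -> shape (8, 1)
```

The design point is that the hypernetworks let Q_tot depend on the global state in an arbitrary, non-monotonic way, while the absolute-value transform restricts only the dependence on each agent's Q_a, which is how the critic injects global information without breaking per-agent greedy action selection.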