RCAT: A Spatial-Temporal Modeling Method for Robot Central Attention in Crowd Navigation
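DOI: 10.12677/airr.2025.146119. Supported by the National Natural Science Foundation of China.
Authors: Longfeng Lyu (吕龙凤), Weigang Pan (潘为刚): School of Rail Transportation, Shandong Jiaotong University, Jinan, Shandong; Bingxin Xue (薛秉鑫): School of Information Science and Electrical Engineering, Shandong Jiaotong University, Jinan, Shandong
Keywords: Robot Navigation, Crowd Interaction Modeling, Deep Reinforcement Learning, Cross-Attention Mechanism, Spatiotemporal Transformer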
Abstract: Safe and efficient autonomous navigation through crowds is a core challenge for intelligent robots. Existing methods based on reinforcement learning and self-attention have made progress in spatiotemporal modeling, but they often neglect the robot-centric perspective, so the robot struggles to focus on the key individuals in complex scenes. This paper therefore proposes RCAT, a robot-centric cross-attention serial spatiotemporal Transformer. The method first applies a spatial module and then a temporal module, in series, to model the global interactions among pedestrians and their temporal dynamics. It then introduces a cross-attention mechanism that uses the robot state as the query vector to select, from the spatiotemporal features, the individual information most relevant to the task, yielding more task-oriented crowd modeling. Experiments in a two-dimensional simulation environment show that, compared with SARL, RCAT achieves a higher success rate, a lower collision rate, and a shorter average time to reach the goal at every crowd density tested, together with a significant advantage in cumulative reward. These results support the safety, efficiency, and robustness of RCAT in complex crowd navigation tasks.
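The pipeline described in the abstract (spatial attention, then temporal attention, then cross-attention with the robot state as the query) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head attention, random untrained weights, no residual connections or normalization, and hypothetical names and dimensions (`init_params`, `rcat_features`, `d_r`, `d_h`, `d`) chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention; returns (output, attention weights).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    w = softmax(scores, axis=-1)
    return w @ v, w

def init_params(rng, d_r, d_h, d):
    # Hypothetical parameter set: input embeddings plus Q/K/V projections
    # for the spatial (s), temporal (t), and cross-attention (c) stages.
    p = {"W_r": rng.normal(size=(d_r, d)) / np.sqrt(d_r),
         "W_h": rng.normal(size=(d_h, d)) / np.sqrt(d_h)}
    for stage in "stc":
        for name in "qkv":
            p[f"W{name}_{stage}"] = rng.normal(size=(d, d)) / np.sqrt(d)
    return p

def rcat_features(robot_state, human_states, p):
    # robot_state: (d_r,); human_states: (T, N, d_h) for T steps, N pedestrians.
    h = human_states @ p["W_h"]                                   # (T, N, d)
    # 1) Spatial module: self-attention across pedestrians at each time step.
    s, _ = attention(h @ p["Wq_s"], h @ p["Wk_s"], h @ p["Wv_s"])  # (T, N, d)
    # 2) Temporal module, applied in series: self-attention over each
    #    pedestrian's own history.
    s = np.swapaxes(s, 0, 1)                                      # (N, T, d)
    t, _ = attention(s @ p["Wq_t"], s @ p["Wk_t"], s @ p["Wv_t"])  # (N, T, d)
    feats = t[:, -1, :]             # assumption: keep the latest step, (N, d)
    # 3) Robot-centric cross-attention: the robot state is the only query.
    q = (robot_state @ p["W_r"])[None, :]                         # (1, d)
    ctx, w = attention(q @ p["Wq_c"], feats @ p["Wk_c"], feats @ p["Wv_c"])
    return ctx[0], w[0]             # crowd feature (d,), per-pedestrian weights (N,)

rng = np.random.default_rng(0)
params = init_params(rng, d_r=9, d_h=5, d=16)
crowd_feature, weights = rcat_features(
    rng.normal(size=9), rng.normal(size=(4, 6, 5)), params)
print(crowd_feature.shape, weights.shape)
```

In this sketch the returned `weights` vector holds one non-negative weight per pedestrian and sums to one, which is the sense in which a robot-state query can "select" the most task-relevant individuals. How the resulting crowd feature feeds the value or policy network is not specified in the abstract and is left out here.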
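Cite this article: Lyu, L., Pan, W. and Xue, B. RCAT: A Spatial-Temporal Modeling Method for Robot Central Attention in Crowd Navigation [J]. Artificial Intelligence and Robotics Research, 2025, 14(6): 1268-1275. https://doi.org/10.12677/airr.2025.146119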
