Vehicle Trajectory Prediction Based on Vector Features
Abstract: Trajectory prediction in complex traffic scenarios is a crucial problem for intelligent vehicles, and it is hard because road structure, inter-vehicle interactions, agent motion states, and environmental information are difficult to represent. This paper proposes a multilayer graph neural network that first uses vectors to represent the spatial locality of individual traffic elements, such as lane lines and target vehicles, and then models the higher-order interactions among all elements. Most current methods render the trajectories of dynamic target vehicles and the road-structure context as a top-down image and encode it with a convolutional neural network. In contrast, this paper represents high-definition (HD) maps and agent trajectories as vectors, which avoids the computationally intensive convolutional encoding step. To further strengthen the ability of the vectorized representation to learn context features, a new auxiliary task is proposed that recovers randomly masked agent features from their context. The proposed algorithm is evaluated on a behavior prediction benchmark and on the Argoverse forecasting dataset. It achieves good performance on both while using 70% fewer model parameters, and it outperforms other methods on the Argoverse dataset.
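The vectorized representation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each lane line or agent trajectory is given as an ordered list of 2D points and that every vector joins two consecutive points. The function name `polyline_to_vectors`, the `type_id` and `polyline_id` fields, and the feature layout are hypothetical choices made for illustration; they are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed layout: (x_start, y_start, x_end, y_end, type_id, polyline_id)
Vector = Tuple[float, float, float, float, int, int]


def polyline_to_vectors(points: List[Point], type_id: int, polyline_id: int) -> List[Vector]:
    """Turn an ordered list of 2D points into vectors joining consecutive points.

    type_id distinguishes element kinds (for example lane line vs. agent) and
    polyline_id groups the vectors that belong to the same element. Both are
    illustrative assumptions, not the paper's exact feature set.
    """
    vectors: List[Vector] = []
    # Pair every point with its successor; n points yield n - 1 vectors.
    for (xs, ys), (xe, ye) in zip(points, points[1:]):
        vectors.append((xs, ys, xe, ye, type_id, polyline_id))
    return vectors


# A short lane-line segment sampled at three points.
lane = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]
vectors = polyline_to_vectors(lane, type_id=0, polyline_id=7)
for v in vectors:
    print(v)
```

Each resulting vector could serve as one node of a per-element subgraph, with a second graph over the elements capturing their interactions; that grouping is a reading of the abstract, not a confirmed implementation detail.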
Cite this article: Xu Xin, Wang Xiaolan. Vehicle Trajectory Prediction Based on Vector Features [J]. Modeling and Simulation, 2023, 12(3): 2712-2720. https://doi.org/10.12677/MOS.2023.123248
