Research on Pedestrian Intention Recognition Based on a Dual-Stream Convolutional Multi-Attention Model
Abstract: Recognizing the behavioral intention of vulnerable road users such as pedestrians is a prerequisite for autonomous vehicles to make effective decisions and take control actions that protect both pedestrians and drivers. This paper designs a pedestrian crossing intention recognition model, the Dual-stream Convolutional Multi-Attention Model (DCMAM), which fuses spatio-temporal features in a dual-stream structure. The spatial-stream convolution module is built on MobileNet with added spatial attention. The temporal-stream convolution module is built on the Inflated 3D ConvNet (I3D) with added spatio-temporal convolution and atrous convolution. A bidirectional GRU network based on the Gated Recurrent Unit (GRU) captures spatio-temporal interaction information, and an attention mechanism is introduced to design the dual-stream fusion module. Experiments on the JAAD and PIE datasets demonstrate the effectiveness of the model, with intention recognition accuracy improved by 7% over existing methods. The intention recognition model is then integrated with a hardware platform to build a pedestrian intention recognition system, whose stability and accuracy are verified in real-vehicle experiments.
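To make the idea of attention-based dual-stream fusion concrete, the following is a minimal sketch in plain Python. It is not the DCMAM fusion module, whose details the abstract does not give. It assumes a generic dot-product soft attention in which a query vector scores the spatial-stream and temporal-stream feature vectors, and the fused feature is their softmax-weighted sum. The names `attention_fuse`, `spatial`, `temporal`, and `query` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(
    spatial: Sequence[float],
    temporal: Sequence[float],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse two stream feature vectors with dot-product soft attention.

    Assumption: one attention weight per stream, obtained by scoring each
    stream against a (normally learned) query vector.
    Returns the fused vector and the two attention weights.
    """
    if not (len(spatial) == len(temporal) == len(query)):
        raise ValueError("all vectors must have the same length")
    streams = [spatial, temporal]
    weights = softmax([dot(query, s) for s in streams])
    fused = [
        sum(w * s[i] for w, s in zip(weights, streams))
        for i in range(len(spatial))
    ]
    return fused, weights


if __name__ == "__main__":
    # Toy features; in a real model these would come from the two streams.
    fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], [2.0, 0.0])
    print("weights:", weights)
    print("fused:", fused)
```

In a trained network the query would be a learned parameter and the features would come from the spatial and temporal streams. Here they are fixed toy values so that the weighting behavior can be inspected directly.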
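Cite this article: Zhang Xiaofei, Wang Xiaolan. Research on Pedestrian Intention Recognition Based on a Dual-Stream Convolutional Multi-Attention Model [J]. Modeling and Simulation, 2023, 12(4): 3770-3780. https://doi.org/10.12677/MOS.2023.124345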
