Pedestrian Crossing Intention Prediction with Multi-Kernel Graph Convolution and Attention Pooling
Abstract: Accurate prediction of pedestrian crossing intention is important for both driving safety and interaction efficiency in autonomous driving. Because pedestrian motion is complex and strongly coupled with the surrounding environment, this paper proposes the Multi-kernel Attention Graph Network for Pedestrian Crossing Intention (MAGNet-PCI). The model uses a parallel multi-branch spatio-temporal encoder in which Multi-kernel Spatio-Temporal Graph Convolution (MKGC-ST) modules extract dynamic features from pedestrian skeleton sequences from multiple perspectives. To reduce the information lost when node features are flattened, an Attention Pooling Transformer (APT) is introduced: it selects key joints through graph convolution and aggregates them with multi-head attention, producing a structure-aware graph-level representation for intention classification. On the public JAAD and PIE datasets, the method improves accuracy over PedGNN by 3% and 10%, respectively. Ablation studies further confirm the contribution of the parallel multi-branch structure, the multi-kernel convolution mechanism, and the attention pooling module.
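The abstract names two building blocks, multi-kernel graph convolution over skeleton sequences and attention-based pooling to a graph-level vector, without giving their equations. The NumPy sketch below is only an illustration of the general idea under stated assumptions, not the MKGC-ST or APT modules themselves. It assumes that each "kernel" propagates features over a different number of hops, that frames are averaged instead of temporally convolved, and that pooling is top-k joint selection followed by single-head (not multi-head) attention. All function names, shapes, and the toy five-joint skeleton are hypothetical.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2} of a skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_kernel_graph_conv(x, a_hat, weights):
    """Sum of graph convolutions with different receptive fields.

    x: (T, V, C_in) skeleton sequence; weights: list of (C_in, C_out) arrays.
    Assumption: kernel k uses the (k + 1)-hop operator a_hat ** (k + 1).
    """
    out = 0.0
    a_pow = np.eye(a_hat.shape[0])
    for w in weights:
        a_pow = a_pow @ a_hat
        out = out + np.einsum("uv,tvc,cd->tud", a_pow, x, w)
    return np.maximum(out, 0.0)  # ReLU


def attention_pool(h, a_hat, score_w, query, keep):
    """Top-k joint selection followed by a single-head attention readout.

    h: (V, C) joint features; score_w: (C, 1); query: (C,).
    Returns the pooled (C,) vector and the indices of the kept joints.
    """
    scores = (a_hat @ h @ score_w).ravel()         # one graph-conv score per joint
    idx = np.argsort(scores)[-keep:]               # indices of the top-k joints
    kept = h[idx] * np.tanh(scores[idx])[:, None]  # gate kept joints by their score
    logits = kept @ query / np.sqrt(h.shape[1])    # scaled dot-product attention
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()
    return attn @ kept, idx


# Toy example: 4 frames, 5 joints, 2 input channels (e.g. x/y coordinates).
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
a_hat = normalized_adjacency(edges, num_joints=5)
x = rng.normal(size=(4, 5, 2))
weights = [rng.normal(size=(2, 8)) for _ in range(3)]  # three kernels

h = multi_kernel_graph_conv(x, a_hat, weights).mean(axis=0)  # average over time
graph_vec, kept_idx = attention_pool(
    h, a_hat, rng.normal(size=(8, 1)), rng.normal(size=8), keep=3
)
print(graph_vec.shape, kept_idx)
```

In a full model, a vector like `graph_vec` would feed a binary crossing / not-crossing classifier. The pooling step returns one fixed-size vector regardless of how many joints are kept, which is what lets it stand in for flattening all joint features.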
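Cite this article: Zhou, Xingpeng. Pedestrian Crossing Intention Prediction with Multi-Kernel Graph Convolution and Attention Pooling [J]. Computer Science and Application, 2025, 15(11): 220-233. https://doi.org/10.12677/csa.2025.1511299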
