Improved PointPillars 3D Object Detection Based on Attention Mechanism
DOI: 10.12677/AIRR.2023.124035
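Authors: Wenyue Si, Jiao Gao, Guodong Li, Jiwei Zhang: School of Rail Transportation, Shandong Jiaotong University, Jinan, Shandong; Shandong Provincial Key Laboratory of Rail Transit Safety Technology and Equipment for the Transportation Industry, Jinan, Shandong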
Keywords: 3D Point Cloud; Object Detection; Attention Mechanism; PointPillars
Abstract: To address the low detection accuracy of traditional 3D point cloud object detection algorithms on small objects, an improved PointPillars method based on a spatial attention mechanism is proposed. First, additional point cloud feature representations are added to the pillar feature network to enrich the feature encoding and improve the representational capacity of each point. Second, a spatial attention mechanism recomputes the feature weights of the encoded spatial points on the pseudo-image, which strengthens feature extraction and improves detection performance. Finally, the improved algorithm is validated on the public KITTI dataset. The experimental results show that the method accurately detects small pedestrian and cyclist objects while maintaining stable performance on large car objects. In addition, at the moderate difficulty level, the mean average precision (mAP) over the three categories reaches 62.07%, 68.85%, and 70.02% in the 3D, bird's-eye view (BEV), and average orientation similarity (AOS) modes, respectively, each a substantial improvement over the original algorithm.
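The reweighting step on the pseudo-image can be illustrated with a minimal sketch. It assumes a CBAM-style spatial attention block (channel-wise average and max pooling, one convolution, a sigmoid gate). The abstract does not specify the exact module, so the function name `spatial_attention`, the kernel size, and the random weights below are hypothetical and serve only to show the data flow.

```python
import math
import random


def spatial_attention(feat, kernel, bias=0.0):
    """Reweight a C x H x W pseudo-image (nested lists) by a spatial mask.

    kernel has shape 2 x k x k: one k x k filter for the average-pooled map
    and one for the max-pooled map (an assumed, CBAM-style layout).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    k = len(kernel[0])
    pad = k // 2
    # Pool across channels to get two H x W descriptor maps.
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    pooled = (avg, mx)
    attn = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = bias
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # zero padding
                            s += kernel[m][di][dj] * pooled[m][y][x]
            attn[i][j] = 1.0 / (1.0 + math.exp(-s))  # sigmoid gate in (0, 1)
    # Apply the same spatial weight to every channel at each location.
    out = [[[feat[c][i][j] * attn[i][j] for j in range(W)]
            for i in range(H)] for c in range(C)]
    return out, attn


# Toy pseudo-image and random (untrained) filter weights, for illustration.
random.seed(0)
C, H, W, K = 4, 6, 6, 3
pseudo_image = [[[random.random() for _ in range(W)] for _ in range(H)]
                for _ in range(C)]
kernel = [[[random.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(K)]
          for _ in range(2)]
weighted, attn = spatial_attention(pseudo_image, kernel)
```

In a trained network the filter weights would be learned, so that locations occupied by small objects such as pedestrians and cyclists can receive larger weights than empty background cells. The output keeps the shape of the input pseudo-image, so a block of this kind can sit in front of an unchanged 2D backbone.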
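Citation: Si, W.Y., Gao, J., Li, G.D. and Zhang, J.W. (2023) Improved PointPillars 3D Object Detection Based on Attention Mechanism. Artificial Intelligence and Robotics Research, 12(4), 319-327. https://doi.org/10.12677/AIRR.2023.124035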
