MS3D-Net: An End-to-End Multi-Sensor Fusion 3D Detection Network
DOI: 10.12677/ORF.2023.133257
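Authors: Cheng Jiazhuo (程家镯), Wu Xuncheng* (吴训成), Xiang Wenbin (相文彬), Wu Yukun (吴玉坤): School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai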
Keywords: Sensor Fusion, High-Dimensional Representation, 3D-T, Gated Recursion
Abstract: As autonomous driving technology develops, the demands on 3D perception of the vehicle's environment keep rising, and multi-sensor fusion is well suited to meeting them. To address three problems in current fusion techniques (unsystematic network design, excessive information loss, and coarse fusion strategies), this paper designs an end-to-end multi-sensor fusion 3D detection network, MS3D-Net. To find the optimal multi-modal fusion level in a systematic way, a new scheme for dividing fusion levels is first proposed, and the most suitable feature fusion level is then identified by the control-variable method in a detection model based on the Faster R-CNN architecture. To reduce information loss during cross-modal data fusion, a new high-dimensional representation is designed, together with a corresponding fusion method, 3D-T. To make the fusion strategy finer-grained and improve fusion detection accuracy, a mid-to-late gated recursive fusion unit is designed as an extension inspired by the long short-term memory (LSTM) mechanism. In addition, CP convolution is proposed to improve the efficiency of image feature extraction. Finally, the network is trained and validated on the KITTI dataset; the proposed method improves detection accuracy while maintaining detection speed.
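The abstract does not give the equations of the gated recursive fusion unit, so the following is only a minimal sketch of the general idea behind LSTM-style gating applied to two feature vectors, not the authors' design. The function name `gated_fuse`, the per-element weights, and the convex-blend form are all assumptions made for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(
    lidar_feat: List[float],
    image_feat: List[float],
    w_lidar: List[float],
    w_image: List[float],
    bias: List[float],
) -> List[float]:
    """Blend two equal-length feature vectors with a learned-style gate.

    Hypothetical illustration only: for each element, a sigmoid gate is
    computed from both inputs, then used to mix the LiDAR feature and the
    image feature. A gate near 1 keeps the LiDAR value; near 0 keeps the
    image value.
    """
    n = len(lidar_feat)
    if not all(len(v) == n for v in (image_feat, w_lidar, w_image, bias)):
        raise ValueError("all inputs must have the same length")

    fused = []
    for xl, xc, wl, wc, b in zip(lidar_feat, image_feat, w_lidar, w_image, bias):
        gate = sigmoid(wl * xl + wc * xc + b)        # gate value in (0, 1)
        fused.append(gate * xl + (1.0 - gate) * xc)  # convex combination
    return fused
```

With all weights and biases set to zero the gate is exactly 0.5, so the output is the element-wise mean of the two inputs; a large positive bias pushes the output toward the LiDAR features. In a real network the weights would be learned and the features would be tensors rather than Python lists.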
Citation: Cheng, J.Z., Wu, X.C., Xiang, W.B. and Wu, Y.K. (2023) MS3D-Net: An End-to-End Multi-Sensor Fusion 3D Detection Network. Operations Research and Fuzziology, 13(3), 2565-2583. https://doi.org/10.12677/ORF.2023.133257
