Research Progress of Data Augmentation Methods for 3D Object Detection
DOI: 10.12677/airr.2024.132023
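Authors: Mengting Wei, Jun Miao*: School of Computer Science, Beijing Information Science and Technology University, Beijing; Yongqiang Deng, Hao Liang, Juanjuan Li: Beijing Wanji Technology Co., Ltd., Beijing; Honggang Qi: School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing; Guoqin Cui: Beijing Vimicro Electronics Co., Ltd. (北京中星微电子有限公司), Beijing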
Keywords: 3D Point Cloud, Data Augmentation, Object Detection
Abstract: Deep learning-based 3D point cloud object detection has played a key supporting role in the rapid development of fields such as autonomous driving and smart industry. However, because 3D point clouds cover large spatial extents and are sparse, the raw point cloud data usually need to be augmented to achieve higher detection accuracy. Data augmentation for 2D images has been studied extensively, whereas augmentation for 3D point cloud data is still at an early stage. This paper therefore reviews progress in data augmentation methods for 3D object detection. It first introduces the basic techniques and pipeline of 3D object detection, then presents and analyzes augmentation methods for this task in three categories: 3D point cloud augmentation methods derived from 2D image augmentation, augmentation methods designed specifically for 3D point clouds, and hybrid and innovative augmentation methods. Finally, it discusses the open challenges and future directions in this area, as a reference for researchers in the field.
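As a rough illustration of the first category (augmentations carried over from 2D images), the sketch below applies a random flip, a rotation about the vertical axis, and a uniform scaling to a point cloud. The function names, parameter ranges, and the choice of these three operations are assumptions made for illustration and are not taken from the paper. A real detection pipeline would also need to apply the same transform to the ground-truth bounding boxes, which this sketch omits.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in sensor coordinates (assumed convention)


def flip_y(points: List[Point]) -> List[Point]:
    """Mirror the cloud across the x-z plane, analogous to a horizontal image flip."""
    return [(x, -y, z) for x, y, z in points]


def rotate_z(points: List[Point], angle: float) -> List[Point]:
    """Rotate the cloud by `angle` radians about the vertical (z) axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def scale(points: List[Point], factor: float) -> List[Point]:
    """Scale every coordinate uniformly about the origin."""
    return [(x * factor, y * factor, z * factor) for x, y, z in points]


def augment(
    points: List[Point],
    rng: random.Random,
    max_angle: float = math.pi / 4,                   # hypothetical default
    scale_range: Tuple[float, float] = (0.95, 1.05),  # hypothetical default
) -> List[Point]:
    """Apply a random flip, rotation, and scaling to the whole scene."""
    if rng.random() < 0.5:
        points = flip_y(points)
    points = rotate_z(points, rng.uniform(-max_angle, max_angle))
    return scale(points, rng.uniform(*scale_range))


if __name__ == "__main__":
    cloud = [(1.0, 2.0, 3.0), (-4.0, 0.5, 0.0)]
    print(augment(cloud, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing training runs.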
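Cite this article: Mengting Wei, Jun Miao, Yongqiang Deng, Hao Liang, Juanjuan Li, Honggang Qi, Guoqin Cui. Research Progress of Data Augmentation Methods for 3D Object Detection [J]. Artificial Intelligence and Robotics Research, 2024, 13(2): 213-226. https://doi.org/10.12677/airr.2024.132023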
