Research on 3D Detection and Tracking Method for Small Objects Integrating DFC Attention and Feature Pruning
Abstract: As autonomous driving technology matures, efficient and accurate perception of small objects in the environment, including 3D localization and continuous tracking, has become key to its practical deployment. When objects are small or partially occluded, existing vision-based methods suffer from sparse pixel information and weak feature representations, so 3D detection accuracy and downstream tracking stability drop markedly, making it difficult to meet real-time requirements in complex road scenes. To address these problems, this paper proposes a 3D small-object detection and tracking framework based on feature pruning and a Decoupled Fully Connected (DFC) attention mechanism, which improves 3D detection and tracking of small objects while maintaining real-time performance. First, an image feature pruning strategy is designed for the output features of the backbone network, mining candidate small-object regions in depth to strengthen their representations. Second, a hardware-friendly DFC attention mechanism is introduced into the fusion of left- and right-view features to capture long-range pixel dependencies efficiently and to reinforce stereo geometric constraints. Finally, a lightweight 3D multi-object tracking module is built on the 3D regression and classification outputs of the detection network to associate and update small-object trajectories accurately. Extensive experiments on the public KITTI benchmark, with comparisons against multiple models, show that the proposed method achieves better small-object 3D detection accuracy, real-time performance, and 3D tracking stability.
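To make the idea of decoupled attention concrete, the following is a minimal single-channel sketch in plain Python. It is not the implementation used in this paper: the function name `dfc_attention` and the weight matrices `w_vert` and `w_horz` are hypothetical, and dense per-axis weights are assumed where a practical network would typically use learned, shared convolution kernels. The sketch only illustrates the general principle that mixing along columns and then along rows lets every pixel receive information from the whole map at a cost of roughly O(H²W + HW²) rather than the O(H²W²) of full pairwise attention.

```python
import math


def dfc_attention(z, w_vert, w_horz):
    """Gate a single-channel H x W feature map with decoupled axis-wise mixing.

    z:      H x W feature map, given as a list of rows.
    w_vert: H x H weights mixing values along each column
            (hypothetical; learned in a real network).
    w_horz: W x W weights mixing values along each row
            (hypothetical; learned in a real network).
    """
    H, W = len(z), len(z[0])

    # Vertical pass: each output pixel aggregates its entire column.
    a_v = [[sum(w_vert[h][k] * z[k][w] for k in range(H)) for w in range(W)]
           for h in range(H)]

    # Horizontal pass: each output pixel aggregates its entire row of a_v,
    # so after both passes every pixel has seen the whole map.
    a = [[sum(w_horz[w][k] * a_v[h][k] for k in range(W)) for w in range(W)]
         for h in range(H)]

    # Sigmoid turns the aggregated response into a (0, 1) gate on the input.
    return [[z[h][w] / (1.0 + math.exp(-a[h][w])) for w in range(W)]
            for h in range(H)]
```

With identity matrices for both weights the gate reduces to a per-pixel sigmoid of the input, and with all-zero weights every gate equals 0.5, which gives two simple sanity checks for the sketch.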
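Cite this article: Yang Pengcheng. Research on 3D Detection and Tracking Method for Small Objects Integrating DFC Attention and Feature Pruning [J]. Artificial Intelligence and Robotics Research, 2026, 15(1): 267-276. https://doi.org/10.12677/airr.2026.151026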
