Research on Monocular SLAM Pose Optimization Algorithm Integrating Surface Normal Constraints in Weak-Texture Scenes
DOI: 10.12677/mos.2025.1410627
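Authors: Lei Zhang, Xiaofu Jian: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai; Guoliang Wei*: Business School, University of Shanghai for Science and Technology, Shanghai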
Keywords: Monocular SLAM, Weak Texture Scenes, Deep Learning, Surface Normal
Abstract: In weak-texture scenes, feature-point-based monocular SLAM systems find too few effective feature points, so the local pose optimization stage relies too heavily on the point reprojection error constraint and loses accuracy. To address this problem, this study proposes a robust monocular SLAM framework that incorporates a deep learning model. In the preprocessing stage, a deep learning model extracts dense surface normals from each input frame, and a clustering algorithm obtains the representative directions within the frame. Two new geometric constraints are then derived from the matches between the representative directions of adjacent frames and added to the subsequent local pose optimization as additional graph optimization factors, compensating for the reliance on the point reprojection error constraint alone. Test results on the TUM and ICL-NUIM datasets show that the proposed system improves both pose estimation accuracy and running robustness in weak-texture scenes.
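The preprocessing and matching steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm or the form of the two geometric constraints, so the greedy angular clustering, the function names (`representative_directions`, `direction_residual`), and the 10-degree threshold below are all assumptions chosen for illustration. The sketch relies only on the fact that a surface normal is a direction and is therefore affected by the rotation between two camera frames, not by the translation.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def representative_directions(normals, angle_deg=10.0, min_support=2):
    """Greedy angular clustering (an assumed stand-in for the paper's method).

    Each normal joins the first cluster whose mean direction lies within
    angle_deg of it; otherwise it starts a new cluster. Clusters with fewer
    than min_support members are discarded as noise.
    """
    cos_thr = math.cos(math.radians(angle_deg))
    sums, counts = [], []  # per-cluster vector sum and member count
    for n in map(normalize, normals):
        for i, s in enumerate(sums):
            if dot(normalize(s), n) >= cos_thr:
                sums[i] = tuple(a + b for a, b in zip(s, n))
                counts[i] += 1
                break
        else:
            sums.append(n)
            counts.append(1)
    return [normalize(s) for s, c in zip(sums, counts) if c >= min_support]

def rotate(R, v):
    """Apply a 3x3 rotation matrix (given as a list of rows) to vector v."""
    return tuple(dot(row, v) for row in R)

def direction_residual(R_ba, d_a, d_b):
    """Angle in radians between d_b and d_a rotated into frame b.

    Hypothetical residual: it is zero when the relative rotation R_ba fully
    explains the match between the two representative directions.
    """
    c = dot(normalize(rotate(R_ba, d_a)), normalize(d_b))
    return math.acos(max(-1.0, min(1.0, c)))

# Toy per-pixel normals: two dominant planes plus one outlier.
frame_a = [(0, 0, 1), (0, 0.05, 1), (1, 0, 0), (1, 0.02, 0), (0, 1, 0)]
dirs_a = representative_directions(frame_a)

# A 90-degree rotation about the z axis maps the x axis onto the y axis.
R_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
residual = direction_residual(R_z90, (1, 0, 0), (0, 1, 0))
```

In a graph optimization back end, a residual of this kind would be attached as an extra factor between the poses of two adjacent frames, alongside the usual point reprojection factors. How the paper weights or formulates its own factors is not stated in the abstract.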
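Citation: Zhang, L., Wei, G. and Jian, X. (2025) Research on Monocular SLAM Pose Optimization Algorithm Integrating Surface Normal Constraints in Weak-Texture Scenes. Modeling and Simulation, 14(10), 329-337. https://doi.org/10.12677/mos.2025.1410627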
