CDNet: Using Spatial Vectors for Two-View Correspondence Learning Research

doi:10.12677/csa.2025.154074

Author: Haoran Li, College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, Zhejiang
Keywords: Vector Field; Context; Transformer

Abstract: Feature matching is a fundamental task in computer vision that aims to find the correct correspondences (inliers) between a given pair of images. Strictly speaking, it involves four steps: feature extraction, feature description, construction of an initial correspondence set, and removal of false correspondences (outlier removal). Existing methods, however, consider only the relationships among correspondences and neglect the visual information available from the scene images. In this paper, we propose a novel pruning framework, Context Depth Net (CDNet), to accurately identify inliers and recover camera poses. We extract directional information from the correspondences as a cue to guide the pruning operation, use a vector field to better mine the deep spatial information among correspondences, and design a set of fusion modules to integrate this spatial information more effectively. Experiments show that CDNet outperforms previously proposed methods on both indoor and outdoor datasets.
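
To make the idea of directional information and a vector field over correspondences concrete, the following is a minimal sketch and not the CDNet architecture. It assumes each putative match is a tuple (x1, y1, x2, y2) of pixel coordinates, turns every match into a displacement vector, and scores each one by its cosine similarity to the mean direction of the field. The function names `motion_vectors` and `direction_consistency` and the scoring heuristic are illustrative assumptions that do not come from the paper.

```python
import math


def motion_vectors(correspondences):
    """Convert putative matches (x1, y1, x2, y2) into displacement vectors.

    Returns a list of (dx, dy, length, angle) tuples; the angle is in radians.
    """
    field = []
    for x1, y1, x2, y2 in correspondences:
        dx, dy = x2 - x1, y2 - y1
        field.append((dx, dy, math.hypot(dx, dy), math.atan2(dy, dx)))
    return field


def direction_consistency(field):
    """Score each vector by cosine similarity to the field's mean direction.

    This is a hand-written heuristic for illustration only (an assumption),
    not the learned pruning used by CDNet.
    """
    # Normalise every displacement to unit length; zero-length ones stay (0, 0).
    units = [
        (dx / length, dy / length) if length > 0 else (0.0, 0.0)
        for dx, dy, length, _ in field
    ]
    sum_x = sum(u[0] for u in units)
    sum_y = sum(u[1] for u in units)
    norm = math.hypot(sum_x, sum_y)
    if norm == 0:
        # No dominant direction exists, so every match gets a neutral score.
        return [0.0 for _ in units]
    mean_x, mean_y = sum_x / norm, sum_y / norm
    return [ux * mean_x + uy * mean_y for ux, uy in units]


# Hypothetical matches: three move right, the last one moves left.
matches = [(0, 0, 1, 0), (1, 1, 2, 1), (2, 0, 3, 0.1), (5, 5, 4, 5)]
scores = direction_consistency(motion_vectors(matches))
for match, score in zip(matches, scores):
    print(match, round(score, 3))
```

In this toy setting, a match whose direction opposes the dominant motion receives a negative score and would be a natural candidate for pruning. A learned method would replace this global average with context-dependent features.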

Cite this article: Li, H. CDNet: Using Spatial Vectors for Two-View Correspondence Learning Research [J]. Computer Science and Application, 2025, 15(4): 22-32. https://doi.org/10.12677/csa.2025.154074

