UAV Camera Relocation Method Based on Image Retrieval
Abstract: Visual relocalization is a core capability for autonomous UAV navigation in GPS-denied environments and is widely used in scenarios such as urban inspection and disaster rescue. In complex, dynamic environments, however, traditional methods suffer from highly ambiguous feature matching and large accumulated errors in pose estimation. To address this, this paper proposes a UAV camera relocalization method based on image retrieval. The method adopts an improved MobileNetV2 backbone that combines depthwise separable convolution with the channel expansion-compression strategy of inverted residual modules, reducing the parameter count while retaining key geometric information. The global branch introduces a differentiable NetVLAD layer that dynamically aggregates local features into a compact 4096-dimensional descriptor. The local feature extraction branch uses a dual-decoder architecture, in which sub-pixel convolution and bicubic interpolation provide high-precision keypoint detection and continuous descriptor generation. A graph attention network then dynamically selects matching pairs with strong geometric consistency, the Sinkhorn algorithm iteratively optimizes the soft matching matrix, and an adaptive threshold rejects low-confidence noise. Experiments show that the algorithm significantly reduces pose error under complex conditions such as weak texture and dynamic occlusion.
Cite this article: Ren, Z. (2025) UAV Camera Relocation Method Based on Image Retrieval. Software Engineering and Applications, 14(3), 703-713. https://doi.org/10.12677/sea.2025.143062
