Survey on Hash Image Retrieval Algorithms Based on CNN and Vision Transformer
DOI: 10.12677/csa.2025.1511280
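Authors: Ren Huan, Zhao Hongyang: School of Artificial Intelligence, Xinjiang Polytechnic Vocational University (新疆理工职业大学), Tumxuk, Xinjiang; Liu Xiaohua*: School of Artificial Intelligence, Xinjiang Polytechnic Vocational University (新疆理工职业大学), Tumxuk, Xinjiang; School of Artificial Intelligence, Shenzhen Polytechnic University, Shenzhen, Guangdong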
Keywords: Image Retrieval, Deep Learning, Hash
Abstract: The core objective of image retrieval is to locate and return, from a given image database, all images that belong to the same category as a query image. Traditional hashing algorithms typically construct their hash functions from simple linear transformations and rely on manual parameter tuning, which leaves considerable room for improvement in retrieval performance. In recent years, combining deep learning with hashing has offered a new direction for the field, delivering high retrieval efficiency together with high retrieval accuracy. This paper surveys deep hashing methods, introduces the principles and characteristics of each category of methods, and analyzes their advantages and disadvantages. Experimental results show that deep-learning-based hashing methods achieve high retrieval accuracy. Finally, the paper discusses the potential of deep learning with respect to optimization algorithms and computing power, and anticipates that it will play an increasingly important role in image retrieval and provide more precise technical support for practical applications.
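Citation: Ren Huan, Zhao Hongyang, Liu Xiaohua. Survey on Hash Image Retrieval Algorithms Based on CNN and Vision Transformer[J]. Computer Science and Application, 2025, 15(11): 33-41. https://doi.org/10.12677/csa.2025.1511280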
