A Dual Attention Deep Neural Network for Remote Sensing Image Retrieval
DOI: 10.12677/CSA.2021.113052. Supported by the National Natural Science Foundation of China.
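Authors: Guangming Chen, Zhuowei Wang, Liyi Chen, Junlin He: School of Computer Science, Guangdong University of Technology, Guangzhou, Guangdong; Junhao Qiu: School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou, Guangdong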
Keywords: Remote Sensing Image Retrieval; Attention Mechanism; CNN; Deep Learning
Abstract: Remote sensing images have complex backgrounds, so extracting discriminative features is a core problem in remote sensing image retrieval. To strengthen feature representation, this paper introduces a dual self-attention module that encodes long-range contextual information along the spatial and channel dimensions into local features. Experiments on three typical datasets, UC Merced Land Use, Satellite Remote Sensing Image Database, and NWPU-RESISC45, yield mean retrieval precisions of 0.92, 0.90, and 0.89, respectively. The results indicate that the dual self-attention deep network substantially improves remote sensing image retrieval performance.
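The dual attention idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a common formulation in which a spatial branch lets every position attend to all other positions, a channel branch lets every channel attend to all other channels, and the two outputs are summed. The projection matrices `w_q`, `w_k`, `w_v` (standing in for 1x1 convolutions), the residual scales `gamma` and `beta`, and the fusion by summation are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(feat, w_q, w_k, w_v, gamma=1.0):
    """Spatial self-attention over a (C, H, W) feature map.

    w_q, w_k: (C', C) projections; w_v: (C, C) projection.
    These are hypothetical stand-ins for learned 1x1 convolutions.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N), N = H * W positions
    q = w_q @ x                           # (C', N) queries
    k = w_k @ x                           # (C', N) keys
    v = w_v @ x                           # (C, N) values
    attn = softmax(q.T @ k, axis=-1)      # (N, N), row i: weights of position i
    out = v @ attn.T                      # (C, N), weighted sum over all positions
    return (gamma * out + x).reshape(c, h, w)  # residual connection


def channel_attention(feat, beta=1.0):
    """Channel self-attention over a (C, H, W) feature map."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N)
    attn = softmax(x @ x.T, axis=-1)      # (C, C), row i: weights of channel i
    out = attn @ x                        # (C, N), weighted sum over all channels
    return (beta * out + x).reshape(c, h, w)   # residual connection


def dual_attention(feat, w_q, w_k, w_v, gamma=1.0, beta=1.0):
    """Fuse both branches by element-wise summation (an assumed fusion rule)."""
    return spatial_attention(feat, w_q, w_k, w_v, gamma) + channel_attention(feat, beta)
```

In a trained network the projections and the two scales would be learned jointly with the CNN backbone. With `gamma` and `beta` set to zero, each branch reduces to the identity, so the module can start from the backbone features and add context gradually.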
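Cite this article: Chen, G.M., Wang, Z.W., Chen, L.Y., Qiu, J.H. and He, J.L. (2021) A Dual Attention Deep Neural Network for Remote Sensing Image Retrieval. Computer Science and Application, 11(3), 515-524. https://doi.org/10.12677/CSA.2021.113052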
