Deep Learning Based Semantic Scene Image Retrieval
DOI: 10.12677/CSA.2019.98175. Supported by research project funding.
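Authors: Haijiao Xu* (徐海蛟), Zhanhong Zhang (张展鸿), Jialei He (何佳蕾), Yumin Fang (方钰敏): Department of Computer Science, Guangdong University of Education, Guangzhou, Guangdong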
Keywords: Semantic Scene Image Retrieval; Convolutional Neural Network; Deep Learning
Abstract: With the explosive growth of multimedia content such as Web images on the Internet, semantic scene retrieval of online images has attracted increasing research interest. Conventional studies focus on single-concept image retrieval and cannot effectively retrieve images of complex semantic scenes, which involve multiple concepts that together describe a characteristic scene. To address semantic scene Web image retrieval, we propose a multi-modal deep learning approach called Semantic Scene Image Retrieval (SSIR). First, we train a multi-modal Convolutional Neural Network (CNN) as a concept classifier for images and texts. Second, we compute the semantic interdependencies among the concepts in an image and use them to refine the predicted semantic scores, which strengthens the classifier's holistic scene recognition. Finally, to improve retrieval of rare scene concepts, we apply a gradient descent algorithm to compensate for the differing concept frequencies found in imbalanced real-world image collections. Experiments on the MIR Flickr 2011 benchmark dataset show that the proposed approach outperforms the traditional methods it was compared against.
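The second and third steps of the abstract can be illustrated with a minimal sketch. This page does not give the paper's formulas, so everything below is an assumption made for illustration: the conditional co-occurrence weights, the blending factor `alpha`, the class-balanced logistic loss, and all function names are hypothetical and are not taken from SSIR itself.

```python
import math
from typing import Dict, Iterable, Sequence, Set

Weights = Dict[str, Dict[str, float]]


def cooccurrence_weights(label_sets: Iterable[Set[str]],
                         concepts: Sequence[str]) -> Weights:
    """Estimate P(b present | a present) from training label sets.

    Assumed dependency measure; every label must appear in `concepts`.
    """
    count = {c: 0 for c in concepts}
    pair = {a: {b: 0 for b in concepts} for a in concepts}
    for labels in label_sets:
        for a in labels:
            count[a] += 1
            for b in labels:
                if a != b:
                    pair[a][b] += 1
    return {a: {b: (pair[a][b] / count[a] if count[a] else 0.0)
                for b in concepts}
            for a in concepts}


def refine_scores(scores: Dict[str, float], weights: Weights,
                  alpha: float = 0.3) -> Dict[str, float]:
    """Blend each concept score with support from related concepts.

    The result is a convex combination, so scores in [0, 1] stay in [0, 1].
    """
    refined = {}
    for b in scores:
        others = [a for a in scores if a != b]
        support = sum(weights[a][b] * scores[a] for a in others)
        norm = sum(weights[a][b] for a in others)
        # With no related concepts, fall back to the original score.
        context = support / norm if norm > 0 else scores[b]
        refined[b] = (1 - alpha) * scores[b] + alpha * context
    return refined


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_bias(logits: Sequence[float], labels: Sequence[int],
             steps: int = 200, lr: float = 0.5) -> float:
    """Learn one additive logit bias for a concept by gradient descent.

    Minimizes a class-balanced logistic loss in which positives and
    negatives each carry half of the total weight, so a rare concept is
    not dominated by its many negative examples.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0  # nothing to balance
    w_pos, w_neg = 0.5 / n_pos, 0.5 / n_neg
    bias = 0.0
    for _ in range(steps):
        # Gradient of the weighted log-loss with respect to the bias.
        grad = sum((w_pos if y else w_neg) * (sigmoid(z + bias) - y)
                   for z, y in zip(logits, labels))
        bias -= lr * grad
    return bias
```

In this sketch, `refine_scores` raises the score of a concept such as "sea" when strongly co-occurring concepts such as "beach" and "sky" score high, and `fit_bias` shifts the logits of a rare concept upward so that it is not systematically ranked below frequent ones.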
Cite this article: Xu, H., Zhang, Z., He, J. and Fang, Y. (2019) Deep Learning Based Semantic Scene Image Retrieval. Computer Science and Application, 9(8), 1561-1568. https://doi.org/10.12677/CSA.2019.98175
