Research of Improved Fine-Grained Image Classification Based on Saliency
DOI: 10.12677/CSA.2019.912247. Supported by the National Natural Science Foundation of China.
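Authors: Huishi Wu*, Lianglun Cheng: School of Computers, Guangdong University of Technology, Guangzhou, Guangdong; Fangxiong Chen: Shenzhen H&T Home Online Network Technology Co., Ltd., Shenzhen, Guangdong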
Keywords: Fine-Grained Image Classification; Convolutional Neural Network (CNN); Saliency Detection Algorithm; Saliency Map; Feature Fusion
Abstract: Fine-grained images show large intra-class variation and small inter-class variation, and existing methods often depend on data annotation. To address these problems, this paper proposes an algorithm that improves fine-grained image classification through saliency fusion. The algorithm is built on a two-input deep neural network with two parts: a saliency feature fusion structure and a feature extraction network. First, a Fusion-layer structure fuses the original RGB image with a saliency map computed by the SALICON saliency detection algorithm. Second, to make full use of the modulation potential of higher-resolution saliency features, a max-pooling operation reduces the spatial dimensionality of the data. Finally, following the idea of transfer learning, the Inception_V3.0 deep neural network pretrained on the ImageNet dataset serves as the base feature extraction model and extracts high-level semantic features. In comparison experiments on the public datasets CUB200-2011 and Stanford Dogs, the algorithm reaches classification accuracies of 84.36% and 84.94%, respectively, outperforming several mainstream fine-grained classification algorithms such as Part R-CNN and LRBP.
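The first two steps of the abstract (fusing the RGB input with a saliency map, then max pooling) can be illustrated with a minimal sketch. This is not the paper's Fusion layer: the abstract does not state the fusion rule, so element-wise multiplication is assumed here, and the names `fuse`, `max_pool2d`, and `fuse_and_pool` as well as the toy 4 x 4 inputs are hypothetical. The saliency map would come from SALICON and the pooled output would feed the pretrained Inception_V3.0 backbone; neither is shown.

```python
from typing import List

Matrix = List[List[float]]  # one 2-D map, indexed [row][col]


def fuse(channels: List[Matrix], saliency: Matrix) -> List[Matrix]:
    """Weight every channel by the saliency map, element by element.

    ASSUMPTION: multiplicative modulation. The abstract does not name
    the fusion rule used by the paper's Fusion layer.
    """
    for ch in channels:
        if len(ch) != len(saliency) or any(
            len(row) != len(sal_row) for row, sal_row in zip(ch, saliency)
        ):
            raise ValueError("channel and saliency map must have the same size")
    return [
        [[v * s for v, s in zip(row, sal_row)] for row, sal_row in zip(ch, saliency)]
        for ch in channels
    ]


def max_pool2d(x: Matrix, k: int = 2) -> Matrix:
    """Non-overlapping k x k max pooling.

    Rows and columns that do not fill a complete window are dropped.
    """
    out_h, out_w = len(x) // k, len(x[0]) // k
    return [
        [
            max(x[i * k + di][j * k + dj] for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def fuse_and_pool(channels: List[Matrix], saliency: Matrix, k: int = 2) -> List[Matrix]:
    """Fuse first, then reduce the spatial size of each fused channel."""
    return [max_pool2d(ch, k) for ch in fuse(channels, saliency)]


# Toy input: three constant 4 x 4 channels standing in for R, G, B.
rgb = [[[c] * 4 for _ in range(4)] for c in (1.0, 2.0, 4.0)]
# Toy saliency map in [0, 1]; in the paper this is produced by SALICON.
saliency = [
    [0.0, 0.0, 0.5, 1.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
]
pooled = fuse_and_pool(rgb, saliency, k=2)
for name, ch in zip("RGB", pooled):
    print(name, ch)
```

Under this assumed rule, regions with zero saliency are suppressed before pooling, so each pooled cell keeps only the strongest salient response in its window.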
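Cite this article: Wu, H.S., Cheng, L.L. and Chen, F.X. (2019) Research of Improved Fine-Grained Image Classification Based on Saliency. Computer Science and Application, 9(12), 2218-2230. https://doi.org/10.12677/CSA.2019.912247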

References

[1] Wah, C., Branson, S., Welinder, P., et al. (2011) The Caltech-UCSD Birds-200-2011 Dataset.
[2] Khosla, A., Jayadevaprakash, N., Yao, B. and Li, F.-F. (2011) Novel Dataset for Fine-Grained Image Categorization: Stanford Dogs. Proceedings of CVPR Workshop on Fine-Grained Visual Categorization, 1-2.
[3] Maji, S., Rahtu, E., Kannala, J., Blaschko, M. and Vedaldi, A. (2013) Fine-Grained Visual Classification of Aircraft. arXiv preprint arXiv:1306.5151.
[4] Nilsback, M.E. and Zisserman, A. (2008) Automated Flower Classification over a Large Number of Classes. 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, Bhubaneswar, India, 16-19 December 2008, 722-729.
[5] Krause, J., Stark, M., Deng, J. and Li, F.-F. (2013) 3D Object Representations for Fine-Grained Categorization. 2013 IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2-8 December 2013, 554-561.
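[6] Luo, J.H. and Wu, J.X. (2017) A Survey on Fine-Grained Image Categorization Using Deep Convolutional Features. Acta Automatica Sinica, 43, 1306-1318. (In Chinese)
[7] Zhang, L.B., Wang, C.H., Xiao, B.H., et al. (2012) Image Representation Using Bag-of-Phrases. Acta Automatica Sinica, 38, 46-54. (In Chinese)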
[8] Berg, T. and Belhumeur, P.N. (2013) POOF: Part-Based One-vs-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation. 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, 23-28 June 2013, 955-962.
[9] Perronnin, F., Sánchez, J. and Mensink, T. (2010) Improving the Fisher Kernel for Large-Scale Image Classification. In: Daniilidis, K., Maragos, P. and Paragios, N., Eds., Proceedings of the 11th European Conference on Computer Vision (ECCV), Crete, Greece, 5-11 September 2010, 143-156.
[10] Wang, P., et al. (2013) Supervised Kernel Descriptors for Visual Recognition. 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, 23-28 June 2013, 1828-1830.
[11] Zhang, N., Donahue, J., Girshick, R. and Darrell, T. (2014) Part-Based R-CNNs for Fine-Grained Category Detection. In: Fleet, D., Pajdla, T., Schiele, B. and Tuytelaars, T., Eds., Computer Vision-ECCV 2014. Lecture Notes in Computer Science, Volume 8689, Springer, Cham, 834-849.
[12] Branson, S., Belongie, S., Van Horn, G. and Perona, P. (2014) Bird Species Categorization Using Pose Normalized Deep Convolutional Nets. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, 594-605.
[13] Wei, X.-S., Xie, C.-W., Wu, J.X. and Shen, C. (2018) Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Bird Species Categorization. Pattern Recognition, 76, 704-714.
[14] Lam, M., Todorovic, S. and Mahasseni, B. (2017) Fine-Grained Recognition as HSnet Search for Informative Image Parts. 2017 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 21-26 July 2017, 6497-6506.
[15] Xiao, T.J., et al. (2015) The Application of Two-Level Attention Models in Deep Convolutional Neural Network for Fine-Grained Image Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 842-850.
[16] Simon, M. and Rodner, E. (2015) Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks. Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7-13 December 2015, 1143-1151.
[17] Lin, T.Y., Roychowdhury, A. and Maji, S. (2015) Bilinear CNN Models for Fine-Grained Visual Recognition. Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7-13 December 2015, 1449-1457.
[18] Fu, J.L., Zheng, H.L. and Mei, T. (2017) Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition. 2017 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 21-26 July 2017, 4438-4446.
[19] Zhang, X.P., Xiong, H., Zhou, W., Lin, W. and Tian, Q. (2016) Picking Deep Filter Responses for Fine-Grained Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 27-30 June 2016, 1134-1142.
[20] Zhao, B., Wu, X., Feng, J.S., Peng, Q. and Yan, S. (2017) Diversified Visual Attention Networks for Fine-Grained Object Classification. IEEE Transactions on Multimedia, 19, 1245-1256.
[21] Liu, X., Xia, T., Wang, J., et al. (2016) Fully Convolutional Attention Localization Networks: Efficient Attention Localization for Fine-Grained Recognition. https://arxiv.org/pdf/1603.06765.pdf
[22] Kong, S. and Fowlkes, C. (2017) Low-Rank Bilinear Pooling for Fine-Grained Classification. 2017 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 21-26 July 2017, 365-374.