Fine-Grained Image Classification Algorithm Based on Grad-CAM and B-CNN
Abstract: Fine-grained images exhibit small inter-class differences and large intra-class differences. Because the distinguishing cues lie mainly in subtle local regions, localizing those regions and extracting representative features from them is one of the main research problems in fine-grained image classification. This paper studies a fine-grained image classification method based on Grad-CAM and the bilinear convolutional neural network (B-CNN). The method uses Grad-CAM to locate the salient region in the original image, crops that region, and feeds the cropped image to the bilinear CNN, fusing global and local features to perform classification. Experiments on three datasets, CUB-200-2011, Stanford Dogs and Stanford Cars, show that, compared with traditional models, the method locates the discriminative regions of an image more accurately and achieves better classification results.
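The pipeline summarized in the abstract (Grad-CAM localization, cropping, bilinear pooling, feature fusion) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `grad_cam`, `salient_box` and `bilinear_pool` are hypothetical, the 0.5 threshold is an assumed value, the toy arrays stand in for real CNN activations and gradients, and fusing the global and local descriptors by concatenation is an assumption, since the abstract does not state how the fusion is done.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from (K, H, W) activations and class-score gradients."""
    # Channel weights: gradients averaged over the spatial dimensions.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the activation maps over channels -> (H, W).
    cam = np.tensordot(weights, activations, axes=1)
    cam = np.maximum(cam, 0.0)  # ReLU keeps only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def salient_box(cam, threshold=0.5):
    """Bounding box (y0, y1, x0, x1) around cells at or above the threshold."""
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:  # nothing salient: fall back to the full map
        return 0, cam.shape[0], 0, cam.shape[1]
    return int(ys.min()), int(ys.max()) + 1, int(xs.min()), int(xs.max()) + 1


def bilinear_pool(feat_a, feat_b):
    """Bilinear pooling of two (K, H, W) feature maps into a unit vector."""
    a = feat_a.reshape(feat_a.shape[0], -1)
    b = feat_b.reshape(feat_b.shape[0], -1)
    # Outer product of the two feature vectors, averaged over locations.
    x = (a @ b.T / a.shape[1]).ravel()
    x = np.sign(x) * np.sqrt(np.abs(x))  # signed square root
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x   # L2 normalization


# Toy stand-ins for the last convolutional layer: 4 channels on a 6x6 grid,
# with one "hot" rectangle playing the role of the discriminative part.
acts = np.zeros((4, 6, 6))
acts[:, 2:4, 1:5] = 1.0
grads = np.ones((4, 6, 6))

cam = grad_cam(acts, grads)
box = salient_box(cam, threshold=0.5)  # assumed threshold
y0, y1, x0, x1 = box

# For brevity the crop is taken on the feature maps; in a real system the
# heatmap would be upsampled and the crop taken from the input image.
local = acts[:, y0:y1, x0:x1]

global_vec = bilinear_pool(acts, acts)
local_vec = bilinear_pool(local, local)
fused = np.concatenate([global_vec, local_vec])  # assumed fusion scheme
print(box, fused.shape)
```

In a real system the two inputs to `bilinear_pool` would come from the two CNN streams of the B-CNN, and `fused` would be passed to a classifier layer.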
Cite this article: Deng, S.W. and Zhang, B.Q. (2020) Fine-Grained Image Classification Algorithm Based on Grad-CAM and B-CNN. Computer Science and Application, 10(5), 841-850. https://doi.org/10.12677/CSA.2020.105087
