# Refinement-Based Approach of Saliency Detection

PP. 107-113   DOI: 10.12677/CSA.2018.81014

The aim of saliency detection is to find the significant regions of an image. Traditional salient object detection methods often formulate contrast from various kinds of prior knowledge and hand-crafted features, which adapts poorly across scenes. Recently, deep learning has become increasingly popular in saliency detection: given a comprehensive training set, its results are much better than those of traditional methods, especially for complex scenes. In this paper, a new deep learning model is proposed with a coarse-extracting process and a fine-refining process. The coarse-extracting process contains two subnetworks: the feature map produced by the first subnetwork, carrying the local context of superpixels, is concatenated with the high-level feature map extracted by VGG in the second subnetwork, generating a coarse saliency prediction map. The fine-refining process, composed of a series of recurrent convolutional layers, refines the coarse prediction map from coarse scales to fine scales, finally generating a fine saliency map.

1. Introduction

Liu et al. [11] proposed using two subnetworks to produce the prediction map: VGG16 [12] extracts a coarse global prediction, and a second network consisting of a series of recurrent convolutional layers refines it by combining the corresponding features from the first network. This method achieves good saliency detection results, but despite the stack of recurrent convolutional layers, repeated downsampling still blurs object boundaries, and performance leaves room for improvement. The new model proposed in this paper takes the original image as input, which helps capture its overall spatial information (global context), and outputs the saliency map end to end. First, two subnetworks, VGG16 [12] and a Region-CNN, produce a coarse result that roughly locates the shape and position of the salient object; then a series of RCLs (Recurrent CNN Layers) [13] sharpen the contours of the coarse saliency map, finally yielding a more accurate result.

2. Model Description

Figure 1. Model structure

2.1. The Region-CNN Subnetwork

The Region-CNN has two stages. The first is region segmentation, for which we adopt the Mean Shift algorithm [14]; it is robust to noise and adheres well to object edges, although the superpixels it generates are highly irregular. The second is the CNN stage, which operates on the segmented image; the details are shown in Figure 2. The input passes through an eight-layer network: layers 1 and 2 are a convolution and a max-pooling respectively, layers 3 and 4 are both convolutions, layer 5 is a convolution followed by a max-pooling, and layers 6 and 7 are fully connected layers producing a 392-dimensional vector, which is the 392-dimensional vector used for concatenation in Figure 1.
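The layer sequence above determines how the spatial resolution shrinks before the fully connected layers. A minimal sketch of that shape arithmetic, assuming 3×3 convolutions with padding 1, 2×2 max-pooling with stride 2, a 32×32 input patch, and 8 feature maps before the FC layers (none of these hyperparameters are specified in the paper; only the 392-dimensional output is):

```python
# Shape arithmetic for the Region-CNN layer stack described above.
# Kernel sizes, strides, input size, and channel count are assumptions.

def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial size after a convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling."""
    return (size - kernel) // stride + 1

size = 32                            # assumed input patch size
size = pool_out(conv_out(size))      # layers 1-2: conv + max-pool
size = conv_out(conv_out(size))      # layers 3-4: two convolutions
size = pool_out(conv_out(size))      # layer 5: conv + max-pool
flat = size * size * 8               # assumed 8 feature maps before FC layers
# layers 6-7: fully connected, mapping `flat` features down to 392 dimensions
print(size, flat)
```

With these assumed settings the two pooling layers reduce 32×32 to 8×8, and the flattened feature count is what the first fully connected layer consumes.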

2.2. RCL (Recurrent CNN Layer) and Contour Refinement

$x_{ijk}(t) = g\big(f(z_{ijk}(t))\big)$ (1)

$g(f_{ijk}(t)) = \dfrac{f_{ijk}(t)}{\left(1 + \dfrac{\alpha}{N}\sum_{k'=\max(0,\,k-N/2)}^{\min(K,\,k+N/2)} (f_{ijk'})^2\right)^{\beta}}$ (2)

Here f(z(t)) is abbreviated as f, K is the number of feature maps, N is the size of the adjacent feature-map neighborhood used for normalization, and α and β are two constants.

Figure 2. Convolution process of the Region-CNN

$z_{ijk}(t) = (w_k^f)^{\mathrm{T}} u^{(i,j)} + (w_k^r)^{\mathrm{T}} x^{(i,j)}(t-1) + b_k$ (3)
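Equations (1)-(3) can be sketched in numpy as one recurrent iteration: the state z combines a feed-forward term on the input u with a recurrent term on the previous state x(t-1) (Eq. (3)), passes through an activation f, and is normalized across feature maps by g (Eq. (2)). The sketch below simplifies the filters w^f and w^r to 1×1 (cross-channel) weights, takes f as ReLU, and uses assumed sizes and constants; none of this is the paper's exact configuration.

```python
import numpy as np

def lrn(f, alpha=1e-4, beta=0.75, n=5):
    """Local response normalization across feature maps, as in Eq. (2)."""
    K, H, W = f.shape
    out = np.empty_like(f)
    for k in range(K):
        lo, hi = max(0, k - n // 2), min(K, k + n // 2 + 1)
        denom = (1 + alpha / n * np.sum(f[lo:hi] ** 2, axis=0)) ** beta
        out[k] = f[k] / denom
    return out

def rcl_step(u, x_prev, wf, wr, b):
    """One recurrent iteration: z (Eq. (3)), ReLU, then normalization (Eq. (1))."""
    # z_k = (w_k^f)^T u + (w_k^r)^T x(t-1) + b_k, with 1x1 cross-channel weights
    z = np.einsum('kc,chw->khw', wf, u) + np.einsum('kc,chw->khw', wr, x_prev)
    z += b[:, None, None]
    f = np.maximum(z, 0)   # f taken as ReLU (an assumption)
    return lrn(f)          # x(t) = g(f(z(t)))

rng = np.random.default_rng(0)
u = rng.standard_normal((4, 8, 8))   # feed-forward input: 4 maps, 8x8
x = np.zeros_like(u)                 # x(0) = 0
wf = rng.standard_normal((4, 4)) * 0.1
wr = rng.standard_normal((4, 4)) * 0.1
b = np.zeros(4)
for _ in range(3):                   # unroll three recurrent iterations
    x = rcl_step(u, x, wf, wr, b)
print(x.shape)
```

Note that u stays fixed across iterations while x is repeatedly updated, which is what lets the recurrent layer refine its response around the same input.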

3. Training Method

$C = -\frac{1}{n}\sum_{x}\left[y\ln a + (1-y)\ln(1-a)\right]$
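This is the standard binary cross-entropy loss, averaged over the n pixels, with y the ground-truth label and a the predicted saliency value. A minimal numpy version (the clipping is an addition for numerical safety, not part of the formula):

```python
import numpy as np

def binary_cross_entropy(y, a, eps=1e-7):
    """C = -(1/n) * sum[y*ln(a) + (1-y)*ln(1-a)] over all pixels."""
    a = np.clip(a, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

y = np.array([1.0, 0.0, 1.0, 0.0])   # ground-truth saliency labels
a = np.array([0.9, 0.1, 0.8, 0.2])   # predicted saliency values
loss = binary_cross_entropy(y, a)
print(round(loss, 4))
```

The loss shrinks toward zero as the predictions a approach the labels y, so minimizing C drives the predicted saliency map toward the ground truth.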

4. Experiments

5. Conclusion

Figure 4. Experimental results

Figure 5. Precision and recall of our model compared with other algorithms on four datasets

[1] Cheng, M., Mitra, N.J., Huang, X., Torr, P.H. and Hu, S. (2015) Global Contrast Based Salient Region Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 569-582. https://doi.org/10.1109/TPAMI.2014.2345401
[2] Guo, C. and Zhang, L. (2010) A Novel Multiresolution Spatiotemporal Saliency Detection Model and Its Applications in Image and Video Compression. IEEE Transactions on Image Processing, 19, 185-198.
[3] Sun, J., Xie, J., Liu, J. and Sikora, T. (2013) Image Adaptation and Dynamic Browsing Based on Two-Layer Saliency Combination. IEEE Transactions on Broadcasting, 59, 602-613. https://doi.org/10.1109/TBC.2013.2272172
[4] Margolin, R., Zelnik-Manor, L. and Tal, A. (2013) Saliency for Image Manipulation. The Visual Computer, 29, 1-12. https://doi.org/10.1007/s00371-012-0740-x
[5] Itti, L., Koch, C. and Niebur, E. (1998) A Model of Saliency-Based Visual Attention for Rapid Scene Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254-1259. https://doi.org/10.1109/34.730558
[6] Harel, J., Koch, C. and Perona, P. (2006) Graph-Based Visual Saliency. NIPS, 545-552.
[7] Achanta, R., Hemami, S., Estrada, F. and Susstrunk, S. (2009) Frequency-Tuned Salient Region Detection. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, 20-25 June 2009, 1597-1604. https://doi.org/10.1109/CVPR.2009.5206596
[8] LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998) Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86, 2278-2324. https://doi.org/10.1109/5.726791
[9] He, S., Lau, R., Liu, W., Huang, Z. and Yang, Q. (2015) SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection. International Journal of Computer Vision, 115, 330-344. https://doi.org/10.1007/s11263-015-0822-0
[10] Wang, L., Lu, H., Ruan, X. and Yang, M.-H. (2015) Deep Networks for Saliency Detection via Local Estimation and Global Search. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, 7-12 June 2015. https://doi.org/10.1109/CVPR.2015.7298938
[11] Liu, N. and Han, J. (2016) DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 678-686. https://doi.org/10.1109/CVPR.2016.80
[12] Simonyan, K. and Zisserman, A. (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556
[13] Liang, M. and Hu, X. (2015) Recurrent Convolutional Neural Network for Object Recognition. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, 7-12 June 2015. https://doi.org/10.1109/CVPR.2015.7298958
[14] Comaniciu, D. and Meer, P. (2002) Mean Shift: A Robust Approach toward Feature Space Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 603-619. https://doi.org/10.1109/34.1000236
[15] Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012) ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 1097-1105.
[16] Deng, L.Y. (2006) The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. Technometrics, 48. https://doi.org/10.1198/tech.2006.s353