# Dual Attention Based Feature Enhanced Networks for Semantic Segmentation

DOI: 10.12677/CSA.2020.1011205. Supported by the National Natural Science Foundation of China.

Abstract: Semantic segmentation is a research hotspot in computer vision and has been widely applied in fields such as geographic information systems, medical image analysis and robotics. However, current semantic segmentation methods generally face two challenges: intra-class inconsistency and inter-class indistinction. To address them, we propose Dual Attention based Feature Enhanced Networks. The method uses a position attention module and a channel attention module to obtain rich spatial and context information, and adds a pyramid pooling module at the end of the network to aggregate the context information of different regions, which improves the network's ability to capture global information. Experimental results on a standard dataset demonstrate the effectiveness of the proposed method.

1. Introduction

Figure 1. Dual attention based feature enhanced networks

2. Position Attention Module

$f\left({x}_{i},{x}_{j}\right)={\text{e}}^{\theta {\left({x}_{i}\right)}^{\text{T}}\phi \left({x}_{j}\right)}$ (1)

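$f\left({x}_{i},{x}_{j}\right)$ denotes the dependency between the features at positions $i$ and $j$, $\theta$ and $\phi$ are convolution operations, and $\left\{i,j\right\}\in {R}^{C\times W\times H}$.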

${y}_{ij}=\frac{f\left({x}_{i},{x}_{j}\right)}{C\left(x\right)}$ (2)

${y}_{ij}$ denotes the influence of feature ${x}_{j}$ on ${x}_{i}$, and $C\left(x\right)={\sum }_{\forall j}f\left({x}_{i},{x}_{j}\right)$. By the definition of the softmax function, Equation (2) can be rewritten as Equation (3).

${y}_{ij}=\text{softmax}\left(\theta {\left({x}_{i}\right)}^{\text{T}}\phi \left({x}_{j}\right)\right)$ (3)
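The attention weights of Equation (3) can be illustrated with a minimal pure-Python sketch. It assumes that $\theta$ and $\phi$ act as 1×1 convolutions, that is, as one linear map applied independently at every position; the names `position_attention`, `w_theta` and `w_phi`, the random weights and the tiny sizes are all hypothetical and are not taken from the paper.

```python
import math
import random


def matvec(weight, vec):
    """Apply a linear map; a 1x1 convolution does this at each position."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def position_attention(features, w_theta, w_phi):
    """Return the N x N matrix y with y[i][j] = softmax_j(theta(x_i)^T phi(x_j))."""
    queries = [matvec(w_theta, x) for x in features]  # theta(x_i)
    keys = [matvec(w_phi, x) for x in features]       # phi(x_j)
    attention = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        attention.append(softmax(scores))
    return attention


random.seed(0)
# Illustrative sizes only: C channels, C_RED projected channels, N = W * H positions.
C, C_RED, N = 4, 2, 6
features = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
w_theta = [[random.gauss(0, 1) for _ in range(C)] for _ in range(C_RED)]
w_phi = [[random.gauss(0, 1) for _ in range(C)] for _ in range(C_RED)]

y = position_attention(features, w_theta, w_phi)
```

Each row of `y` sums to 1, so row `i` can be read as a distribution over how strongly every position `j` contributes to position `i`.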

$Z=\text{CBR}\left({A}_{z}L\right)+H$ (4)

Figure 2. Components of the position attention block

3. Dual Attention Based Feature Enhanced Networks

Figure 3. Components of the channel attention block

Figure 4. Components of the refinement residual block

$Loss=-{\sum }_{i=1}^{m}{y}_{i}\mathrm{log}\left({p}_{i}\right)$ (6)
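As a small worked example of Equation (6), the sketch below evaluates the cross-entropy for a single pixel with $m=3$ classes. The one-hot label and the two probability vectors are made-up values, and the `eps` clamp is an added safeguard against $\log(0)$ that is not part of the equation.

```python
import math


def cross_entropy(one_hot, probs, eps=1e-12):
    """Loss = -sum_i y_i * log(p_i) for one pixel with m classes."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))


label = [0.0, 1.0, 0.0]          # the true class is class 1
confident = [0.05, 0.90, 0.05]   # prediction that favours the true class
unsure = [0.40, 0.30, 0.30]      # prediction that favours a wrong class

loss_confident = cross_entropy(label, confident)
loss_unsure = cross_entropy(label, unsure)
```

Because the label is one-hot, only the term of the true class survives, so the loss reduces to $-\log(p_{\text{true}})$ and grows as the predicted probability of the true class shrinks.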

$SNLOS{S}_{i}=CrossEntropyLoss\left({y}_{si};w\right)$ (7)

$BNLOS{S}_{i}=FocalLoss\left({y}_{bi};w\right)$ (8)

$L={\sum }_{i=0}^{3}SNLOS{S}_{i}+\sigma {\sum }_{i=0}^{3}BNLOS{S}_{i}$ (9)
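The way Equations (7) to (9) combine could be sketched as follows. This is only an illustration under assumptions: the focal loss uses the binary form $-\alpha_t(1-p_t)^{\gamma}\log(p_t)$ of Lin et al. [14] with $\gamma=2$ and $\alpha=0.25$, the per-stage probabilities are invented, and the value chosen for $\sigma$ is hypothetical, since none of these settings is given in the text above.

```python
import math


def cross_entropy(true_class, probs, eps=1e-12):
    """Cross-entropy for one pixel: -log of the true-class probability."""
    return -math.log(max(probs[true_class], eps))


def focal_loss(is_positive, prob_positive, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = prob_positive if is_positive else 1.0 - prob_positive
    alpha_t = alpha if is_positive else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def total_loss(sn_losses, bn_losses, sigma):
    """L = sum_i SNLOSS_i + sigma * sum_i BNLOSS_i, with i = 0..3."""
    assert len(sn_losses) == 4 and len(bn_losses) == 4
    return sum(sn_losses) + sigma * sum(bn_losses)


# Hypothetical predictions for one pixel at each of the four stages i = 0..3.
stage_class_probs = [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]]
stage_boundary_probs = [0.9, 0.8, 0.7, 0.6]

sn = [cross_entropy(1, p) for p in stage_class_probs]
bn = [focal_loss(True, p) for p in stage_boundary_probs]
loss = total_loss(sn, bn, sigma=0.1)  # sigma = 0.1 is an assumed weight
```

The factor $(1-p_t)^{\gamma}$ down-weights examples that are already classified well, which is the property that makes the focal loss attractive when one class is rare.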

4. Performance Evaluation

4.1. Datasets and Parameter Settings

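PASCAL VOC 2012: As a standard benchmark for semantic segmentation, PASCAL VOC 2012 covers 20 object classes plus one background class and contains 1464 training images and 1449 validation images. After augmentation with the Semantic Boundaries Dataset [15], the extended PASCAL VOC 2012 dataset contains 10582 training images.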

4.2. Experimental Results

(a) Input image (b) Ground-truth segmentation label (c) Output of the baseline network (d) Output of the proposed network

Figure 5. Comparison of segmentation results between Dual Attention based Feature Enhanced Networks and benchmark network

Table 1. Comparison of mean IoU between different algorithms
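Mean IoU, the metric named in the caption of Table 1, is the intersection over union averaged across classes. The sketch below computes it for one flattened toy label map; the helper name `mean_iou` and the pixel values are hypothetical, and benchmark evaluation normally accumulates the counts over the whole validation set rather than over a single image.

```python
def mean_iou(pred, target, num_classes):
    """Average IoU over the classes that occur in the prediction or the label."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoU values are 1/3, 2/3 and 1/2, so `score` is 0.5.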

5. Conclusion

References

[1] Garcia-Garcia, A., et al. (2017) A Review on Deep Learning Techniques Applied to Semantic Segmentation. International Conference on Computational Linguistics, Spain, 22 April 2017, 2132-2144.
[2] Chen, Y.M., Peng, Y.B. and Gao, J.F. (2019) Semantic Segmentation of Newly Added Buildings in Remote Sensing Images Based on Deep Learning. Computer & Digital Engineering, 47(12), 3182-3186. (In Chinese)
[3] Long, J., Shelhamer, E. and Darrell, T. (2014) Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence, 39, 640-651. https://doi.org/10.1109/TPAMI.2016.2572683
[4] Yu, F. and Koltun, V. (2016) Multi-Scale Context Aggregation by Dilated Convolutions.
[5] Wang, X., Girshick, R., Gupta, A., et al. (2018) Non-Local Neural Networks. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-22 June 2018, 7794-7803. https://doi.org/10.1109/CVPR.2018.00813
[6] Huang, Z., Wang, X., Huang, L., et al. (2019) CCNet: Criss-Cross Attention for Semantic Segmentation. IEEE International Conference on Computer Vision, Seoul, 27-28 October 2019, 603-612. https://doi.org/10.1109/ICCV.2019.00069
[7] Chao, P., et al. (2017) Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network. IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 21-26 July 2017, 1743-1751. https://doi.org/10.1109/CVPR.2017.189
[8] Chen, L., et al. (2020) ANU-Net: Attention-Based Nested U-Net to Exploit Full Resolution Features for Medical Image Segmentation. Computers & Graphics, 90, 11-20. https://doi.org/10.1016/j.cag.2020.05.003
[9] Ronneberger, O., Fischer, P. and Brox, T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, 5-9 October 2015, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
[10] Yu, C., et al. (2018) BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation. European Conference on Computer Vision, Munich, 8-14 September 2018, 334-349. https://doi.org/10.1007/978-3-030-01261-8_20
[11] Yu, C., Wang, J., Peng, C., et al. (2018) Learning a Discriminative Feature Network for Semantic Segmentation. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-22 June 2018, 1857-1866. https://doi.org/10.1109/CVPR.2018.00199
[12] Zhao, H., Shi, J., Qi, X., et al. (2016) Pyramid Scene Parsing Network. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 6230-6239. https://doi.org/10.1109/CVPR.2017.660
[13] Zhai, P.B., Yang, H. and Song, T.T. (2020) Dual-Path Semantic Segmentation Combined with Attention Mechanism. Journal of Image and Graphics, 25(8), 1627-1636. (In Chinese)
[14] Lin, T.Y., Goyal, P., Girshick, R., et al. (2017) Focal Loss for Dense Object Detection. IEEE Transactions on Pattern Analysis & Machine Intelligence, 99, 2999-3007.
[15] Hariharan, B., Arbelaez, P., Bourdev, L.D., et al. (2011) Semantic Contours from Inverse Detectors. IEEE International Conference on Computer Vision, Barcelona, 6-13 November 2011, 991-998. https://doi.org/10.1109/ICCV.2011.6126343
[16] Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012) Imagenet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, Vol. 2, 1097-1105.
[17] Xiong, W., Tong, L. and Jin, J.Y. (2020) Research on Semantic Segmentation Algorithms Based on Convolutional Neural Networks. Application Research of Computers, 38(3), 1-5. (In Chinese, online)
[18] Russakovsky, O., Deng, J., Su, H., et al. (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115, 211-252. https://doi.org/10.1007/s11263-015-0816-y
[19] Liu, W., Rabinovich, A. and Berg, A.C. (2015) ParseNet: Looking Wider to See Better. arXiv preprint arXiv:1506.04579
[20] Chen, L.C., Papandreou, G., Schroff, F., et al. (2017) Rethinking Atrous Convolution for Semantic Image Segmentation.
[21] Zhang, H., Dana, K., Shi, J., et al. (2018) Context Encoding for Semantic Segmentation. IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, 7151-7160. https://doi.org/10.1109/CVPR.2018.00747
[22] Zhang, Z., Zhang, X., Peng, C., et al. (2018) ExFuse: Enhancing Feature Fusion for Semantic Segmentation. European Conference on Computer Vision, Munich, 8-14 September 2018, 269-284. https://doi.org/10.1007/978-3-030-01249-6_17
[23] Jun, F., et al. (2019) Stacked Deconvolutional Network for Semantic Segmentation. International Conference on Image Processing, Taipei, 22-25 September 2019, 3085-3089. https://doi.org/10.1109/TIP.2019.2895460
[24] Luo, P., Wang, G., Lin, L., et al. (2017) Deep Dual Learning for Semantic Image Segmentation. IEEE International Conference on Computer Vision, Venice, 22-29 October 2017, 2737-2745. https://doi.org/10.1109/ICCV.2017.296
[25] Xia, L., Zhong, Z.S., Wu, J.L., et al. (2019) Expectation-Maximization Attention Networks for Semantic Segmentation. IEEE International Conference on Computer Vision, Seoul, 27-28 October 2019, 9166-9175.