# Dual Attention Based Feature Enhanced Networks for Semantic Segmentation

Abstract: As one of the research hotspots in computer vision, semantic segmentation has been widely applied in fields such as geographic information systems, medical image analysis and robotics. However, current semantic segmentation methods generally face two challenges: intra-class inconsistency and inter-class indistinction. To address them, we propose Dual Attention based Feature Enhanced Networks. The method uses a position attention module and a channel attention module to capture rich spatial and contextual information, and adds a pyramid pooling module at the end of the network to aggregate context from different regions, which improves the network's ability to capture global information. Experimental results on the standard PASCAL VOC 2012 dataset demonstrate the effectiveness of the proposed method.

1. Introduction

Figure 1. Dual attention based feature enhanced networks

2. Position Attention Module

$f\left({x}_{i},{x}_{j}\right)={\text{e}}^{\theta {\left({x}_{i}\right)}^{\text{T}}\phi \left({x}_{j}\right)}$ (1)

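Here $f\left(x_i,x_j\right)$ measures the dependency between the features at positions $i$ and $j$, and $\theta$ and $\phi$ are convolution operations. The input feature map is $x\in\mathbb{R}^{C\times W\times H}$, and $i$, $j$ index its $W\times H$ spatial positions.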

$y_{ij}=\frac{f\left(x_i,x_j\right)}{C\left(x\right)}$ (2)

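Here $y_{ij}$ denotes the influence of feature $x_j$ on feature $x_i$, and $C\left(x\right)=\sum_{\forall j}f\left(x_i,x_j\right)$ is a normalization factor. By the definition of the softmax function (taken over $j$), Eq. (2) can be rewritten as Eq. (3).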

${y}_{ij}=\text{softmax}\left(\theta {\left({x}_{i}\right)}^{\text{T}}\phi \left({x}_{j}\right)\right)$ (3)
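To make Eqs. (1) to (3) concrete, the following is a minimal Python sketch (standard library only) that computes the attention weights $y_{ij}$ for a feature map flattened to $N=W\times H$ position vectors. It assumes that the convolutions $\theta$ and $\phi$ act as per-position linear maps, as a 1×1 convolution would; the helper names and the matrices `w_theta` and `w_phi` are hypothetical, not taken from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, x: Vector) -> Vector:
    """Apply a linear map (out_channels x in_channels) to one position's feature vector.
    This plays the role of theta or phi at a single spatial position (assumption)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores: Vector) -> Vector:
    """exp(s_j) / sum_k exp(s_k); subtracting the max first leaves the result
    unchanged but avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def position_attention_weights(features: List[Vector], w_theta: Matrix, w_phi: Matrix) -> Matrix:
    """Return y[i][j] = softmax over j of theta(x_i)^T phi(x_j), i.e. Eq. (3).

    features holds N = W * H position vectors of length C (the C x W x H map, flattened).
    """
    theta = [project(w_theta, x) for x in features]
    phi = [project(w_phi, x) for x in features]
    weights = []
    for t_i in theta:
        # Exponent of Eq. (1) for every j: theta(x_i)^T phi(x_j)
        scores = [sum(a * b for a, b in zip(t_i, p_j)) for p_j in phi]
        # Dividing exp(score) by C(x) = sum_j f(x_i, x_j) is exactly a softmax over j
        weights.append(softmax(scores))
    return weights
```

For example, calling `position_attention_weights([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], eye, eye)` with `eye` the 2×2 identity returns rows that each sum to 1, and the first position attends equally to itself and to the third position, since both dot products equal 1.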

$Z=\text{CBR}\left({A}_{z}L\right)+H$ (4)

Figure 2. Components of the position attention block

3. Dual Attention Based Feature Enhanced Networks

Figure 3. Components of the channel attention block

Figure 4. Components of the refinement residual block

$Loss=-{\sum }_{i=1}^{m}{y}_{i}\mathrm{log}\left({p}_{i}\right)$ (6)
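As a small worked instance of Eq. (6), using illustrative numbers rather than values from the paper: with $m=3$ classes, a one-hot label $y=(0,1,0)$ and predicted probabilities $p=(0.2,0.7,0.1)$, only the true-class term survives (natural logarithm assumed):

```latex
\begin{aligned}
\mathrm{Loss} &= -\left(0\cdot\log 0.2 + 1\cdot\log 0.7 + 0\cdot\log 0.1\right) \\
              &= -\ln 0.7 \approx 0.357
\end{aligned}
```

The loss therefore reduces to the negative log-probability of the correct class and approaches zero as that probability approaches 1.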

$\mathrm{SNLOSS}_i=\mathrm{CrossEntropyLoss}\left(y_{si};w\right)$ (7)

$\mathrm{BNLOSS}_i=\mathrm{FocalLoss}\left(y_{bi};w\right)$ (8)

$L=\sum_{i=0}^{3}\mathrm{SNLOSS}_i+\sigma\sum_{i=0}^{3}\mathrm{BNLOSS}_i$ (9)
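Eqs. (6) to (9) can be sketched in plain Python as follows. This is a minimal illustration under assumptions not stated in the surviving text: each of the four terms $i=0,\dots,3$ is computed on a separate output head and averaged over pixels; `FocalLoss` is taken to be the focal loss of Lin et al. with focusing parameter $\gamma=2$ and without the $\alpha$ balancing term; the parameters $w$ are left implicit, so the functions operate directly on predicted probabilities; and $\sigma$ is a free weighting hyperparameter. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One output head: (per-pixel label vectors, per-pixel predicted probability vectors).
Head = Tuple[List[List[float]], List[List[float]]]


def cross_entropy(label: Sequence[float], probs: Sequence[float]) -> float:
    """Eq. (6) for one pixel: -sum_i y_i * log(p_i). Terms with y_i = 0 are skipped,
    since they contribute nothing (this also avoids log(0))."""
    return -sum(y * math.log(p) for y, p in zip(label, probs) if y > 0)


def focal_loss(label: Sequence[float], probs: Sequence[float], gamma: float = 2.0) -> float:
    """Assumed multi-class focal loss for one pixel: -sum_i y_i * (1 - p_i)^gamma * log(p_i).
    gamma = 2.0 is a common default, not a value given in the paper."""
    return -sum(y * (1.0 - p) ** gamma * math.log(p) for y, p in zip(label, probs) if y > 0)


def pixel_mean(loss_fn, labels: List[List[float]], probs: List[List[float]]) -> float:
    """Average a per-pixel loss over all pixels of one output head."""
    return sum(loss_fn(y, p) for y, p in zip(labels, probs)) / len(labels)


def total_loss(sn_heads: List[Head], bn_heads: List[Head], sigma: float = 1.0) -> float:
    """Eq. (9): sum_i SNLOSS_i + sigma * sum_i BNLOSS_i (i = 0..3 in the paper, i.e. four heads)."""
    sn = sum(pixel_mean(cross_entropy, y, p) for y, p in sn_heads)  # Eq. (7) for each head
    bn = sum(pixel_mean(focal_loss, y, p) for y, p in bn_heads)      # Eq. (8) for each head
    return sn + sigma * bn
```

For a pixel whose true class is predicted with probability 0.7, the focal term equals the cross-entropy term scaled by $(1-0.7)^2=0.09$, which is how the focal loss de-emphasizes pixels that are already classified confidently.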

4. Performance Evaluation

4.1. Dataset and Parameter Settings

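PASCAL VOC 2012: A standard benchmark for semantic segmentation, PASCAL VOC 2012 covers 20 object categories plus one background class and contains 1464 training images and 1449 validation images. Augmenting it with the Semantic Boundaries Dataset [15] yields 10582 training images.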

4.2. Experimental Results

(a) Input image (b) Ground-truth segmentation label (c) Output of the benchmark network (d) Output of the proposed network

Figure 5. Comparison of segmentation results between Dual Attention based Feature Enhanced Networks and benchmark network

Table 1. Comparison of mean IoU between different algorithms
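Table 1 reports mean intersection-over-union (mIoU). As a reminder of how this metric is usually computed, here is a minimal Python sketch assuming the common convention: pixel counts are accumulated over the whole validation set, $\mathrm{IoU}_c=TP_c/(TP_c+FP_c+FN_c)$, classes absent from both prediction and ground truth are skipped, and pixels labelled 255 (the "void" label in PASCAL VOC annotations) are ignored. The function name and signature are hypothetical; the paper's exact evaluation script is not described.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over flattened predicted and ground-truth label maps.

    Pass every pixel of the validation set at once so that counts are accumulated
    over the dataset rather than averaged per image.
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:      # void / boundary pixels do not count
            continue
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1             # predicted class p where it is absent
            fn[t] += 1             # missed the true class t
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:              # skip classes absent from both prediction and ground truth
            ious.append(tp[c] / denom)
    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)
```

With 21 classes (20 object categories plus background), `num_classes=21` would match the PASCAL VOC 2012 setting described in Section 4.1.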

5. Conclusion

