Research on Semantic Segmentation Method for Remote Sensing Image Land Cover Based on Improved HRNet
Abstract: Deep learning methods have driven substantial progress in land cover semantic segmentation of remote sensing images. However, remote sensing semantic segmentation datasets suffer from an imbalanced pixel distribution. To address this, we propose an attention-based HRNet (AbHRNet). First, to handle the imbalance in pixel counts among target categories, we introduce the Convolutional Block Attention Module (CBAM) into the feature extraction network, so that the network pays more attention to the target features of interest, especially those of categories with few pixels, and suffers less interference from complex background information. Second, to handle the imbalance in pixel counts between targets and between targets and background, we add binary cross-entropy loss and Dice loss to the cross-entropy loss of the baseline network, which provides effective supervision of background samples and eases the optimization difficulty caused by the pixel imbalance. Experimental results on the LoveDA dataset show that AbHRNet reaches a mean intersection over union (mIoU) of 51.14%, an improvement of 1.97% over the baseline HRNet model; in particular, the accuracy of the poorly segmented barren category is doubled.
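The combination of cross-entropy, binary cross-entropy, and Dice loss described in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not give the exact formulation, so several details here are assumptions made for illustration only: the binary cross-entropy is applied per class channel against one-hot targets, the Dice term is a soft Dice averaged over classes, and the weights `w_ce`, `w_bce`, and `w_dice` (all 1.0) and the function names are hypothetical. It is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cross_entropy(logits, labels):
    """Mean multi-class cross-entropy over all pixels."""
    losses = [-math.log(softmax(l)[y]) for l, y in zip(logits, labels)]
    return sum(losses) / len(losses)


def binary_cross_entropy(logits, labels, eps=1e-7):
    """Mean per-channel BCE against one-hot targets (assumed formulation)."""
    total, count = 0.0, 0
    for l, y in zip(logits, labels):
        for c, v in enumerate(l):
            p = min(max(sigmoid(v), eps), 1.0 - eps)  # clamp to avoid log(0)
            t = 1.0 if c == y else 0.0
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def dice_loss(logits, labels, smooth=1.0):
    """One minus the soft Dice coefficient, averaged over classes."""
    num_classes = len(logits[0])
    probs = [softmax(l) for l in logits]
    dice_sum = 0.0
    for c in range(num_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        denom = sum(p[c] for p in probs) + sum(1 for y in labels if y == c)
        dice_sum += (2.0 * inter + smooth) / (denom + smooth)
    return 1.0 - dice_sum / num_classes


def combined_loss(logits, labels, w_ce=1.0, w_bce=1.0, w_dice=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_ce * cross_entropy(logits, labels)
            + w_bce * binary_cross_entropy(logits, labels)
            + w_dice * dice_loss(logits, labels))


# Toy example: 4 pixels, 3 classes, with class 2 under-represented.
logits = [[2.0, 0.1, -1.0], [1.5, 0.2, -0.5], [0.3, 1.8, -0.7], [0.5, 0.1, 0.4]]
labels = [0, 0, 1, 2]
print(combined_loss(logits, labels))
```

Because the Dice term is computed per class and then averaged, a class with few pixels contributes as much to it as a class with many, which is one common reason for pairing Dice loss with cross-entropy on imbalanced data.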
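Cite this article: Zhang, Q.Z., Wang, Z.Y., He, X.H. and Chen, H.G. (2022) Research on Semantic Segmentation Method for Remote Sensing Image Land Cover Based on Improved HRNet. Computer Science and Application, 12(12), 2657-2666. https://doi.org/10.12677/CSA.2022.1212269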
