Building Extraction from High-Resolution Remote Sensing Images Based on Improved Coordinate Attention and U-Net Network
Abstract: Accurately extracting buildings from remote sensing images is crucial in fields such as urban planning, statistical surveys, and disaster emergency assessment. However, because building forms are diverse and ground environments are complex in high-resolution remote sensing images, complete and high-precision building extraction remains a challenge. To address this, this paper proposes a new network for extracting buildings from high-resolution remote sensing images. The network retains the encoder-decoder structure of U-Net and integrates a Coordinate Self-Attention Module (CSAM), which adjusts how much attention the network pays to different regions of the input image, so that the network can selectively capture and emphasize important semantic information and thereby strengthen feature extraction. Experiments on the WHU building dataset, which has a spatial resolution of 0.3 m, show that the proposed network produces more accurate building extraction results than U-Net, PSPNet, and DeepLabV3+, reaching a pixel accuracy of 98.21%, a precision of 95.28%, a recall of 94.57%, and an intersection over union (IoU) of 90.34%.
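The four figures reported in the abstract are standard binary segmentation metrics. The sketch below shows how they are commonly computed from a predicted mask and a ground-truth mask. It is a minimal illustration under stated assumptions, not the evaluation code used in the paper: the function name `binary_segmentation_metrics`, the convention that 1 marks a building pixel, and the toy arrays are all hypothetical.

```python
import numpy as np


def binary_segmentation_metrics(pred, target):
    """Pixel accuracy, precision, recall and IoU for binary masks.

    Assumption: nonzero values mark building pixels, zero marks background.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    tp = int(np.sum(pred & target))     # building predicted as building
    fp = int(np.sum(pred & ~target))    # background predicted as building
    fn = int(np.sum(~pred & target))    # building predicted as background
    tn = int(np.sum(~pred & ~target))   # background predicted as background

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "pixel_accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "iou": safe_div(tp, tp + fp + fn),
    }


# Toy 2x4 masks (hypothetical data, for illustration only).
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0]])
target = np.array([[1, 0, 0, 0],
                   [1, 1, 0, 0]])
metrics = binary_segmentation_metrics(pred, target)
print(metrics)
```

When precision P, recall R, and IoU are all computed from the same pooled pixel counts, they satisfy IoU = PR / (P + R - PR). Plugging in the reported P = 95.28% and R = 94.57% gives roughly 90.34%, which agrees with the reported IoU.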
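Cite this article: Chen, K. (2024) Building Extraction from High-Resolution Remote Sensing Images Based on Improved Coordinate Attention and U-Net Network. Advances in Applied Mathematics, 13(3), 891-899. https://doi.org/10.12677/AAM.2024.133084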
