Algorithm Research of Deep Learning in High-Resolution Remote Sensing Image Semantic Segmentation
DOI: 10.12677/AIRR.2022.114048. Supported by the National Natural Science Foundation of China.
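Authors: Lulu Deng (邓露露), Changlun Zhang (张长伦), Si Xing (邢思): School of Science, Beijing University of Civil Engineering and Architecture, Beijing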
Keywords: Deep Learning; High-Resolution Remote Sensing Image; Semantic Segmentation
Abstract: Semantic segmentation of remote sensing images is the computer vision task of assigning a semantic label to every pixel of a remote sensing image. With advances in sensor technology and deep learning, deep learning algorithms have far surpassed traditional algorithms in both accuracy and speed, and deep-learning-based semantic segmentation of high-resolution remote sensing images has become one of the main research directions for many scholars. This paper reviews the relevant algorithms and network architectures. It first introduces CNN-based semantic segmentation networks, and then discusses segmentation algorithms for high-resolution remote sensing images from three perspectives: (1) combining multi-scale, multi-stage, and context aggregation strategies; (2) applying post-processing techniques after semantic segmentation; and (3) incorporating attention mechanisms. It then introduces the classical datasets, and closes with a summary and outlook on the future development of deep learning algorithms for semantic segmentation of high-resolution remote sensing images.
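Cite this article: Deng, L., Zhang, C. and Xing, S. (2022) Algorithm Research of Deep Learning in High-Resolution Remote Sensing Image Semantic Segmentation. Artificial Intelligence and Robotics Research, 11(4), 468-479. https://doi.org/10.12677/AIRR.2022.114048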
