Research on RGBD Semantic Segmentation Algorithm Based on Feature Extraction
DOI: 10.12677/CSA.2023.1312237
Author: Xu Weirong (徐薇蓉), College of Electronic and Information Engineering, Tongji University, Shanghai
Keywords: Semantic Segmentation; Feature Extraction; Multimodality
Abstract: RGBD semantic segmentation has received considerable attention in recent years. Its central challenge is to make effective use of the different kinds of information carried by RGB and depth images: RGB images capture global color variation, whereas depth images encode the local position of objects. Depth images are therefore regarded as a more representative source of local semantic information, which benefits subsequent encoding and segmentation. However, current methods typically process RGB and depth images with the same convolution operator, ignoring the inherent differences between the two modalities. To address this problem, this paper modifies the conventional overlapping patch embedding so that feature extraction from the depth image makes better use of depth information, yielding more accurate semantic segmentation. In experiments, the proposed method achieves high accuracy and robustness on RGBD semantic segmentation tasks.
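The abstract does not describe how the overlapping patch embedding is modified, so the block below is only a minimal sketch of the unmodified baseline, not the author's method. It assumes the SegFormer-style setting (kernel K = 7, stride S = 4, padding P = 3), where the kernel is larger than the stride so neighbouring patches overlap and the token grid has size H_out = ⌊(H + 2P − K)/S⌋ + 1. To illustrate the point about not sharing one convolution operator across modalities, RGB and depth each get their own (hypothetical) weights. The function name `overlap_patch_embed`, the channel width, and the random weights are illustrative assumptions.

```python
import numpy as np


def overlap_patch_embed(x, weight, bias, stride=4, padding=3):
    """Hypothetical sketch of overlapping patch embedding: a strided convolution
    whose kernel is larger than its stride, followed by LayerNorm.

    x:      (C_in, H, W) input image
    weight: (C_out, C_in, K, K) convolution kernel
    bias:   (C_out,)
    Returns tokens of shape (H_out * W_out, C_out) and the grid (H_out, W_out).
    """
    c_out, c_in, k, _ = weight.shape
    assert x.shape[0] == c_in, "channel mismatch between input and kernel"
    # Zero-pad the spatial dimensions only
    xp = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    h_out = (x.shape[1] + 2 * padding - k) // stride + 1
    w_out = (x.shape[2] + 2 * padding - k) // stride + 1
    # im2col: flatten every K x K window (across all input channels) into one row
    cols = np.empty((h_out * w_out, c_in * k * k), dtype=np.float64)
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i * stride:i * stride + k, j * stride:j * stride + k]
            cols[i * w_out + j] = patch.reshape(-1)
    # Convolution as a matrix product: one token per output position
    tokens = cols @ weight.reshape(c_out, -1).T + bias
    # LayerNorm over channels (learnable scale/shift omitted for brevity)
    mean = tokens.mean(axis=1, keepdims=True)
    var = tokens.var(axis=1, keepdims=True)
    tokens = (tokens - mean) / np.sqrt(var + 1e-6)
    return tokens, (h_out, w_out)


rng = np.random.default_rng(0)
H, W, C = 64, 64, 32
rgb = rng.random((3, H, W))    # assumed 3-channel RGB input
depth = rng.random((1, H, W))  # assumed single-channel depth map

# Modality-specific (unshared) parameters: the two inputs do not pass
# through the same convolution operator
w_rgb, b_rgb = rng.standard_normal((C, 3, 7, 7)) * 0.02, np.zeros(C)
w_depth, b_depth = rng.standard_normal((C, 1, 7, 7)) * 0.02, np.zeros(C)

rgb_tokens, grid = overlap_patch_embed(rgb, w_rgb, b_rgb)
depth_tokens, _ = overlap_patch_embed(depth, w_depth, b_depth)
print(grid, rgb_tokens.shape, depth_tokens.shape)  # prints (16, 16) (256, 32) (256, 32)
```

Both modalities end up on the same 16 × 16 token grid, so their features can be fused position by position in later stages, while each branch keeps its own kernel and can learn filters suited to its input.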
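Citation: Xu Weirong. Research on RGBD Semantic Segmentation Algorithm Based on Feature Extraction [J]. Computer Science and Application (计算机科学与应用), 2023, 13(12): 2372-2378. https://doi.org/10.12677/CSA.2023.1312237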
