Trilateral Network with Residual U-Blocks and Contextual Transformer Block for Real-Time Semantic Segmentation
Abstract: The limited receptive field of convolutions hinders the modeling of global relationships. To address this, this paper proposes a three-branch real-time semantic segmentation network based on residual U-blocks and a contextual transformer. The network runs three parallel branches for spatial, contextual, and boundary information. The contextual branch is built from residual U-blocks of varying depths to obtain more robust multi-scale features, and a contextual transformer module is added to strengthen global relationship modeling. Experiments on the Cityscapes dataset demonstrate the method's effectiveness: without pretraining, the network reaches 78.6% mIoU at 76.5 FPS on a single V100 GPU with full-resolution (1024 × 2048) images.
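Cite this article: Ran Zhaobin, Wang Chao. Trilateral Network with Residual U-Blocks and Contextual Transformer Block for Real-Time Semantic Segmentation [J]. Computer Science and Application, 2024, 14(4): 141-150. https://doi.org/10.12677/csa.2024.144085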
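The accuracy figure in the abstract is a mean intersection-over-union (mIoU) score. As a rough illustration of what that metric measures, the sketch below computes mIoU from a pixel-level confusion matrix in plain Python. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` and the toy labels are hypothetical, and it assumes any void or ignored pixels have already been filtered out.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                                 # class appears in labels or predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened "pixels", three classes (values are made up).
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(labels, predictions, num_classes=3)
print(f"mIoU = {mean_iou(cm):.4f}")  # prints "mIoU = 0.5000"
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5. On a real benchmark the confusion matrix would be accumulated over every image in the evaluation set before the per-class ratios are taken.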
