Image Segmentation Network for Traffic Scenes
Abstract: Semantic segmentation is one of the key foundational technologies for autonomous driving. In traffic scenes, it classifies every pixel of an image and color-labels it, so that the vehicle can accurately detect the traffic participants on the road and the drivable road area. Current semantic segmentation algorithms typically fuse feature maps that the backbone network produces at different stages to improve segmentation performance. Simple fusion methods, however, cannot fully exploit this feature information. As a result, similar objects are confused with one another and the boundaries of small objects are segmented coarsely. The vehicle then cannot accurately perceive its surroundings, which degrades higher-level decision-making and can even create serious safety hazards for other traffic participants. To address this problem, this paper designs a Dense Feature Fusion and Boundary Refinement Network (DFBNet) with two parts. The Feature Fusion Network (FFN) extracts information from the feature maps using multiple branches and convolution kernels with different receptive-field sizes, and uses an attention mechanism to assign weights to the extracted features. The Boundary Refinement Network (BRN) uses spatial attention to assign a weight to each pixel position, so that the boundary regions of target objects are segmented more finely. We conducted experiments on two datasets, Cityscapes and CamVid, and obtained good segmentation results: a mean intersection over union (mIoU) of 79.47% on the Cityscapes validation set and 75.13% on the CamVid test set.
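The FFN idea (parallel branches with different receptive fields, then attention-based weighting of the fused features) can be illustrated with a minimal NumPy sketch. The abstract does not give the branch layout or the attention type, so the following are our assumptions, not the authors' design: three branches with (kernel size, dilation) of (1, 1), (3, 1) and (3, 2), i.e. receptive fields of 1, 3 and 5 pixels; a ReLU after each branch; concatenation along the channel axis; and squeeze-and-excitation-style channel attention with reduction ratio 4. The names `conv2d`, `channel_attention` and `feature_fusion` are hypothetical.

```python
import numpy as np


def conv2d(x, w, dilation=1):
    """Zero-padded 'same' 2D convolution (cross-correlation), stride 1.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel with odd k.
    """
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    pad = dilation * (k // 2)
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Input window seen by kernel tap (i, j), spaced by the dilation rate
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            # Sum over input channels for every output channel
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def channel_attention(x, w1, w2):
    """SE-style channel attention (assumed): one weight in (0, 1) per channel."""
    s = x.mean(axis=(1, 2))              # squeeze: global average pooling -> (C,)
    z = np.maximum(w1 @ s, 0.0)          # bottleneck FC + ReLU -> (C // r,)
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))  # FC + sigmoid -> (C,)
    return x * a[:, None, None]


def feature_fusion(x, branches, w1, w2):
    """Hypothetical FFN sketch: parallel branches, concatenation, channel attention.

    branches: list of (kernel, dilation) pairs with different receptive fields.
    """
    feats = [np.maximum(conv2d(x, w, d), 0.0) for w, d in branches]
    fused = np.concatenate(feats, axis=0)
    return channel_attention(fused, w1, w2)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))         # toy backbone feature map
specs = [(1, 1), (3, 1), (3, 2)]             # (kernel size, dilation) per branch
branches = [(0.1 * rng.standard_normal((4, 8, k, k)), d) for k, d in specs]
c = 4 * len(specs)                           # 12 channels after concatenation
w1 = 0.1 * rng.standard_normal((c // 4, c))  # reduction ratio r = 4 (assumed)
w2 = 0.1 * rng.standard_normal((c, c // 4))
y = feature_fusion(x, branches, w1, w2)
print(y.shape)  # (12, 16, 16)
```

Dilation enlarges a branch's receptive field without adding kernel weights, which is one common way to obtain the "different receptive-field sizes" the abstract mentions. Varying the kernel size alone would work equally well in this sketch.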
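The per-pixel weighting performed by the BRN can be sketched the same way. We assume a CBAM-style spatial attention (channel-wise average and max pooling, a 7×7 convolution, then a sigmoid). This is a common design, but the abstract does not confirm it; `spatial_attention` and the kernel size are hypothetical. Each pixel receives a single weight in (0, 1) that scales all channels at that location, which is the mechanism the abstract credits with sharpening object boundaries.

```python
import numpy as np


def conv2d(x, w, dilation=1):
    """Zero-padded 'same' 2D convolution, stride 1. x: (C_in, H, W), w: (C_out, C_in, k, k), odd k."""
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    pad = dilation * (k // 2)
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def spatial_attention(x, w):
    """CBAM-style spatial attention (assumed design).

    x: (C, H, W) features; w: (1, 2, k, k) kernel. Returns the re-weighted
    features and the (H, W) attention map.
    """
    avg = x.mean(axis=0, keepdims=True)          # (1, H, W) mean over channels
    mx = x.max(axis=0, keepdims=True)            # (1, H, W) max over channels
    desc = np.concatenate([avg, mx], axis=0)     # (2, H, W) spatial descriptor
    a = 1.0 / (1.0 + np.exp(-conv2d(desc, w)))  # (1, H, W), one weight per pixel
    return x * a, a[0]


rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 32, 32))
kernel = 0.1 * rng.standard_normal((1, 2, 7, 7))
refined, amap = spatial_attention(feat, kernel)
print(refined.shape, amap.shape)  # (16, 32, 32) (32, 32)
```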
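The reported metric, mIoU, is the per-class intersection over union, TP / (TP + FP + FN), averaged over classes. Below is a minimal sketch. It assumes the common convention of accumulating one confusion matrix over all evaluated pixels and skipping pixels labelled 255 (the ignore label typically used with Cityscapes train IDs). The function names are hypothetical, and classes that appear in neither the prediction nor the ground truth are left out of the average.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Rows index ground-truth classes, columns index predicted classes."""
    mask = gt != ignore_index  # drop unlabeled pixels
    idx = num_classes * gt[mask].astype(np.int64) + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    tp = np.diag(conf).astype(np.float64)
    # union = predicted pixels + ground-truth pixels - intersection
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0  # skip classes absent from both prediction and label
    return float(np.mean(tp[valid] / union[valid]))


gt = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
conf = confusion_matrix(pred, gt, num_classes=3)
print(conf.tolist())             # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
print(round(mean_iou(conf), 4))  # 0.7222
```

To evaluate a whole validation or test set, sum the per-image confusion matrices first and call `mean_iou` once, rather than averaging per-image scores.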
Citation: Gao Chengyang, Yu Yong, Qin Jianglong. Image Segmentation Network for Traffic Scenes[J]. Computer Science and Application, 2024, 14(4): 13-23. https://doi.org/10.12677/csa.2024.144072
