A Multi-Task Segmentation Network Based on an Enhancement-Calibration Strategy for Autonomous Driving
DOI: 10.12677/csa.2026.162054. Supported by research project funding.
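Authors: 游玙瑞, 宋春林: Department of Information and Communication Engineering, Tongji University, Shanghai; 徐旭辉: State Key Laboratory of Marine Geology, Tongji University, Shanghai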
Keywords: Semantic Segmentation, Multi-Task Learning, Driving Perception, Drivable Area Detection, Lane Line Detection
Abstract: In autonomous driving, drivable area detection and lane line detection are two closely related visual perception tasks. Most existing methods nevertheless handle them independently and do not exploit the correlation between them. In addition, large networks are too computationally expensive to run in real time on embedded in-vehicle systems. This paper proposes GMSANet (Grouped Multi-Scale Attention Network), a lightweight multi-task semantic segmentation network based on an enhancement-calibration strategy that performs drivable area detection and lane line detection simultaneously. The model uses a GSConv-ESP (Grouped Shuffle Convolution-Efficient Spatial Pyramid) structure as its encoder, and the enhancement-calibration design reduces model complexity while maintaining high accuracy. The network introduces a Grouped Multi-Scale Attention (GMSA) module, which uses grouped strip convolutions to strengthen the response to directional structures and key regions. It also introduces a Multi-scale Dynamic Rectangular Self-Calibration Module (MD-RCM), which adaptively calibrates target regions through multi-scale receptive fields. On the BDD100K dataset, GMSANet achieves 92.8% drivable area mIoU, 85.1% lane line accuracy, and 34.0% lane line IoU with only 2.9 M parameters and 6.45 GFLOPs, outperforming lightweight methods such as YOLOP and A-YOLOM. Inference runs at up to 55 FPS, indicating good real-time performance and potential for embedded deployment.
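The results above are reported as IoU, mIoU, and accuracy. As a rough illustration of how such segmentation metrics are typically computed, the following sketch evaluates them on small label grids. The function names (`class_iou`, `mean_iou`, `pixel_accuracy`), the toy masks, and the convention for empty classes are assumptions made for this example; the paper's exact evaluation protocol, in particular its definition of lane line accuracy, is not given in this abstract and may differ.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 2-D grid of integer class labels


def class_iou(pred: Mask, target: Mask, cls: int) -> float:
    """Intersection over union for a single class label."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            inter += in_pred and in_target
            union += in_pred or in_target
    # Assumed convention: a class absent from both masks scores 1.0.
    return inter / union if union else 1.0


def mean_iou(pred: Mask, target: Mask, classes: Sequence[int]) -> float:
    """Unweighted mean of the per-class IoU values."""
    return sum(class_iou(pred, target, c) for c in classes) / len(classes)


def pixel_accuracy(pred: Mask, target: Mask) -> float:
    """Fraction of pixels whose predicted label matches the target."""
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            correct += p == t
            total += 1
    return correct / total


# Hypothetical 2x3 masks: 0 = background, 1 = drivable area.
pred = [[0, 1, 1],
        [0, 1, 1]]
target = [[0, 0, 1],
          [0, 1, 1]]

iou_drivable = class_iou(pred, target, 1)   # 3 shared pixels / 4 in the union
miou = mean_iou(pred, target, [0, 1])       # mean of background and drivable IoU
accuracy = pixel_accuracy(pred, target)     # 5 of 6 pixels agree
```

IoU penalizes both false positives and false negatives, which is one reason thin structures such as lane lines tend to score far lower on IoU than large regions such as the drivable area, even when plain accuracy looks high.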
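Cite this article: 游玙瑞, 宋春林, 徐旭辉. A Multi-Task Segmentation Network Based on an Enhancement-Calibration Strategy for Autonomous Driving [J]. Computer Science and Application, 2026, 16(2): 223-239. https://doi.org/10.12677/csa.2026.162054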
