Lane Detection Method Based on ResNet-ViT and Attention Mechanism
DOI: 10.12677/SEA.2023.123038. Supported by the National Natural Science Foundation of China.
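Authors: He Fei, Tang Chunhui: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai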
Keywords: Lane Detection; ResNet-ViT; Attention Mechanism; Row Anchor Classification
Abstract: Lane detection is an important perception task in autonomous driving. Current lane detection methods based on convolutional neural networks (CNNs) suffer from slow inference and model the long, thin structure of lane lines poorly. To address these problems, this paper proposes a lane detection method based on ResNet-ViT and an attention mechanism. First, a ResNet backbone is built for feature extraction, and the encoder structure of the Vision Transformer (ViT) is introduced into the backbone to improve the network's ability to model the slender structure of lane lines. Second, an auxiliary segmentation network is designed with an embedded channel attention module, which strengthens the network's learning of important channels; the auxiliary segmentation network shares part of its parameters with the backbone, which improves the efficiency and generalization ability of the model. Finally, the feature decoding stage adopts row anchor classification: lane line positions are predicted along the row direction of the feature map, and the output is an image annotated with lane marker points. In experiments on the TuSimple dataset, the proposed method reaches an accuracy of 96.04% and an inference speed of 98 frames per second, which verifies its effectiveness.
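The row anchor classification step in the decoding stage can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each row anchor carries one score per horizontal grid cell plus one extra "no lane" class, and that the predicted x-coordinate is the centre of the winning cell. The function name, the grid layout, and the sample scores are all hypothetical.

```python
from typing import List, Optional


def decode_row_anchors(
    logits: List[List[float]],
    image_width: float,
) -> List[Optional[float]]:
    """Turn per-row classification scores into x-coordinates for one lane.

    logits[r] holds num_cells + 1 scores for row anchor r; the last score is
    an assumed "no lane in this row" class (not confirmed by the abstract).
    """
    xs: List[Optional[float]] = []
    for row_scores in logits:
        num_cells = len(row_scores) - 1
        # Index of the highest score in this row (argmax).
        best = max(range(len(row_scores)), key=row_scores.__getitem__)
        if best == num_cells:
            xs.append(None)  # background class wins: no lane point on this row
        else:
            cell_width = image_width / num_cells
            xs.append((best + 0.5) * cell_width)  # centre of the winning cell
    return xs


# Three row anchors, four grid cells plus one "no lane" class, 800 px wide image.
scores = [
    [0.1, 2.0, 0.3, 0.0, 0.2],
    [0.0, 0.4, 3.1, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.3, 5.0],
]
points = decode_row_anchors(scores, image_width=800.0)
```

Here `points` holds one x-coordinate per row anchor, with `None` where the "no lane" class wins. Treating localisation as a per-row classification means each lane needs only one decision per row anchor rather than one per pixel, which fits the inference speed the abstract emphasises.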
Cite this article: He, F. and Tang, C.H. (2023) Lane Detection Method Based on ResNet-ViT and Attention Mechanism. Software Engineering and Applications, 12(3), 381-392. https://doi.org/10.12677/SEA.2023.123038
