Research on Road Target Detection Method Based on YOLOv8n
DOI: 10.12677/csa.2025.156173
Authors: Zhang Jiachong, Yu Xia: School of Information and Engineering, Shenyang University of Technology, Shenyang, Liaoning
Keywords: Automatic Driving, Object Detection, YOLOv8n
Abstract: The perception system in autonomous driving relies mainly on object detection algorithms to obtain the distribution of objects on the road for identification and analysis. Although object detection algorithms are developing rapidly, balancing real-time detection against high detection accuracy remains challenging in practical application scenarios. To address this, the paper takes YOLOv8n as the baseline model and proposes an object detection network named YOLOv8n-CSS. First, the CBAM hybrid attention mechanism is introduced to strengthen the extraction of key features and suppress redundant ones, improving the network's ability to distinguish objects from the background. Second, the SPPF module in the backbone of the baseline model is replaced with the SPPCSPC module, which fuses feature information from different levels and scales more effectively, captures the features of objects of various sizes, and improves recognition accuracy. Finally, the SPD-Conv module replaces the original strided convolution layers for downsampling; it retains more feature information and therefore improves detection of objects at different scales. Experimental results on the KITTI and BDD100K datasets show that the average accuracy of the improved network reaches 96.1% and 48.0% respectively, which is 3.9% and 7.9% higher than the baseline model. The model achieves real-time image processing in general scenarios while maintaining high detection accuracy.
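The claim that SPD-Conv downsamples while retaining more feature information can be illustrated with the space-to-depth rearrangement that the module is built on. The following is a minimal sketch in plain Python, not the authors' implementation: the function name `space_to_depth`, the nested-list layout (channels x height x width), and the ordering of the output channels are assumptions made for illustration. In SPD-Conv this rearrangement is followed by a non-strided convolution, which is omitted here.

```python
def space_to_depth(feature_map, scale=2):
    """Rearrange a C x H x W nested list into (C*scale^2) x (H/scale) x (W/scale).

    Hypothetical helper for illustration; the output channel order is an
    arbitrary choice and not taken from the paper.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    for dx in range(scale):          # column offset inside each scale x scale block
        for dy in range(scale):      # row offset inside each scale x scale block
            for channel in feature_map:
                # Take every scale-th row and column, starting at (dy, dx).
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out


# One 4 x 4 channel holding the values 0..15.
feature_map = [[[r * 4 + c for c in range(4)] for r in range(4)]]

# Space-to-depth: 1 x 4 x 4 -> 4 x 2 x 2, every input value is kept.
rearranged = space_to_depth(feature_map)

# Stride-2 subsampling, shown for contrast: 1 x 4 x 4 -> 1 x 2 x 2,
# so three of every four positions are never sampled.
strided = [[row[0::2] for row in channel[0::2]] for channel in feature_map]

kept_by_spd = sorted(v for ch in rearranged for row in ch for v in row)
kept_by_stride = sorted(v for ch in strided for row in ch for v in row)
```

Both paths halve the spatial resolution, but the rearrangement moves the skipped positions into extra channels instead of discarding them, so the following convolution can still use them.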
Cite this article: Zhang, J. and Yu, X. (2025) Research on Road Target Detection Method Based on YOLOv8n. Computer Science and Application, 15(6), 231-243. https://doi.org/10.12677/csa.2025.156173
