Steel Surface Defect Detection Algorithm Based on DCD-YOLO
Abstract: To address insufficient adaptability to multi-scale features, inadequate fusion of contextual information, and limited ability to capture deformed defects in steel surface defect detection, this paper proposes DCD-YOLO, an improved model based on YOLOv11n. First, a Dynamic Convolutional Mixed Block (DCMB) is designed to replace the Bottleneck part of the C3k2 module in the backbone network; its dynamic kernel-weighting mechanism adaptively adjusts the convolution kernel weights, strengthening feature extraction for multi-scale defects. Second, a Context-Guided Spatial Feature Reconstruction Pyramid Network (CGRFPN) is designed; its Rectangular Self-Calibration Module (RCM) and Pyramid Context Extraction module (PCE) strengthen the model's ability to model defect foreground and background, making defects easier to distinguish from complex backgrounds. Finally, a deformable attention mechanism (DAttention) replaces the fixed attention mechanism in the PSA module, so that attention is sampled dynamically and the model adapts better to deformed defects. Experimental results show that the improved model achieves an mAP@0.5 of 66.9% on the GC10-DET dataset, 3.3% higher than the original YOLOv11n, while detection precision and recall increase by 1.7% and 2.8%, respectively. These gains effectively address detection challenges such as multi-scale defects and background suppression and meet the accuracy and recall requirements of industrial scenarios.
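The abstract does not give the internals of DCMB, so the following is only a minimal, generic sketch of the dynamic kernel-weighting idea it mentions: several candidate kernels are mixed with input-dependent softmax weights before a single convolution is applied. All function names, the gating form (a scalar global average followed by a per-kernel affine map), and the parameters `gate_scale` and `gate_bias` are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def global_average(feature_map):
    """Mean of all values in a 2-D feature map (list of rows)."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def mix_kernels(kernels, weights):
    """Weighted sum of same-sized 2-D kernels."""
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(cols)]
        for i in range(rows)
    ]


def conv2d_valid(feature_map, kernel):
    """Plain 2-D cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [
        [
            sum(
                feature_map[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def dynamic_conv(feature_map, kernels, gate_scale, gate_bias):
    """Assumed gating: one logit per kernel, computed from the global average.

    Returns the convolved output and the kernel weights that were used.
    """
    context = global_average(feature_map)
    logits = [s * context + b for s, b in zip(gate_scale, gate_bias)]
    weights = softmax(logits)
    mixed = mix_kernels(kernels, weights)
    return conv2d_valid(feature_map, mixed), weights
```

Because the weights depend on the input, two images with different global statistics are filtered by different effective kernels, which is the property the abstract attributes to adaptive kernel weighting. A real detector would learn the gate and operate on multi-channel tensors.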
Cite this article: Tang, S., Xu, Z. and Hou, H. (2026) Steel Surface Defect Detection Algorithm Based on DCD-YOLO. Computer Science and Application, 16(2), 15-28. https://doi.org/10.12677/csa.2026.162035
