Lightweight Road Defect Detection Based on RT-DETR Fusing Hierarchical Large Kernel Convolution and Efficient Attention
Abstract: Current road defect detection algorithms struggle to meet the requirements of high accuracy and real-time performance at the same time. In addition, traditional convolutional neural networks (CNNs) are constrained by their local receptive fields, which limits their ability to capture the features of slender cracks. To address these issues, this paper proposes a lightweight road defect detection algorithm based on an improved RT-DETR. Taking the end-to-end RT-DETR as the baseline, the method eliminates post-processing latency while strengthening global feature modeling. The specific improvements are: 1) the original ResNet backbone is replaced with the lightweight StarNet, which combines large-kernel depthwise separable convolutions with a linear gating mechanism, preserving multi-scale feature extraction while substantially reducing the parameter count and computational cost; 2) standard downsampling layers are replaced with Hierarchical Large-Kernel Convolution (HLKConv) to mitigate the blurring and loss of fine crack features caused by downsampling; and 3) the Efficient Multi-Scale Attention (EMA) module is introduced into the high-level layers of the backbone, where cross-channel and cross-spatial interaction modeling improves the model's perception of, and focus on, multi-scale and irregular defects. Experiments on the RDD2022 dataset show that the improved model reaches an mAP@0.5 of 85.79%, 0.67% higher than the original RT-DETR-R18, while reducing the parameter count and computational cost (FLOPs) by 55.9% and 56.5%, respectively. Compared with other mainstream algorithms, the method maintains high detection accuracy with markedly lower hardware resource requirements, making it well suited to real-time road inspection on resource-constrained edge devices.
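The parameter savings attributed to depthwise separable convolution in the StarNet backbone come from factorizing a dense k × k convolution into one k × k filter per input channel followed by a 1 × 1 pointwise channel mix. The sketch below counts the weights of both forms; the 64 → 128 channel, 7 × 7 layer in the example is hypothetical and is not taken from the paper's configuration.

```python
def standard_conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weight count of a dense k x k convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 128 channels, 7 x 7 kernel, no bias.
    dense = standard_conv_params(64, 128, 7)
    separable = depthwise_separable_params(64, 128, 7)
    print(dense, separable, f"{1 - separable / dense:.1%} fewer weights")
    # prints: 401408 11328 97.2% fewer weights
```

The per-layer saving here is far larger than the reported whole-model reduction of 55.9%, because the detector also contains the hybrid encoder, the decoder, and other layers that this count ignores.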
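Assuming the "linear gating mechanism" refers to StarNet's star operation, an element-wise product of two linear (1 × 1 convolution) branches, the sketch below shows why that product is more expressive than either branch alone: expanding it yields every pairwise term x_i · x_j at no extra parameter cost. The function names are illustrative; StarNet's actual block also contains depthwise convolutions and an activation on one branch, which are omitted here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per output channel, one column per input channel


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """A 1 x 1 convolution applied at a single pixel: y = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def star(w1: Matrix, b1: Vector, w2: Matrix, b2: Vector, x: Vector) -> Vector:
    """Star operation: element-wise product of two linear branches."""
    return [p * q for p, q in zip(linear(w1, b1, x), linear(w2, b2, x))]


def star_expanded(w1: Matrix, b1: Vector, w2: Matrix, b2: Vector, x: Vector) -> Vector:
    """The same output written out term by term.

    (r1.x + c1)(r2.x + c2) = sum_ij r1[i] r2[j] x_i x_j + c2 r1.x + c1 r2.x + c1 c2,
    so each output channel mixes all second-order interactions x_i * x_j.
    """
    n = len(x)
    out = []
    for r1, c1, r2, c2 in zip(w1, b1, w2, b2):
        pairwise = sum(r1[i] * r2[j] * x[i] * x[j] for i in range(n) for j in range(n))
        first_order = sum((c2 * r1[i] + c1 * r2[i]) * x[i] for i in range(n))
        out.append(pairwise + first_order + c1 * c2)
    return out


if __name__ == "__main__":
    w1, b1 = [[1.0, 2.0], [0.5, -1.0]], [0.0, 1.0]
    w2, b2 = [[3.0, 0.0], [1.0, 1.0]], [1.0, 0.0]
    x = [2.0, -1.0]
    print(star(w1, b1, w2, b2, x))           # [0.0, 3.0]
    print(star_expanded(w1, b1, w2, b2, x))  # [0.0, 3.0]
```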
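The abstract does not specify HLKConv's internal layout. Assuming, for illustration only, that it builds a large receptive field hierarchically by cascading a small depthwise kernel with a dilated one before the strided step (a decomposition used by large-kernel attention designs), the helper below computes the resulting receptive field and depthwise weight count. The kernel sizes, dilations, and strides in the example are hypothetical.

```python
from typing import List, Tuple

# (kernel_size, dilation, stride) of each convolution in a cascade.
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field, in input pixels per axis, of a stack of convolutions.

    Standard recurrence: r_l = r_{l-1} + (k_l - 1) * d_l * j_{l-1}, where j is the
    product of the strides of all preceding layers.
    """
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf


def depthwise_weights(layers: List[Layer], channels: int) -> int:
    """Weight count if every layer is a depthwise conv (one k x k filter per channel, no bias)."""
    return sum(k * k * channels for k, _, _ in layers)


if __name__ == "__main__":
    plain = [(3, 1, 2)]                            # ordinary 3 x 3 stride-2 downsampling conv
    context = [(5, 1, 1), (7, 3, 1)]               # hypothetical hierarchical large-kernel stage
    print(receptive_field(plain))                  # 3
    print(receptive_field(context))                # 23
    print(receptive_field(context + plain))        # 25: large-kernel context, then downsample
    print(depthwise_weights(context, 1), 23 * 23)  # 74 529 (weights per channel)
```

Under these assumptions, the cascade sees a 23-pixel window with 74 weights per channel, whereas a single 23 × 23 depthwise kernel would need 529 and a plain 3 × 3 stride-2 convolution sees only 3 pixels per axis before discarding resolution.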
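The "cross-spatial interaction modeling" of EMA can be illustrated with its final aggregation step for one channel group, as we understand the module's published description: the global-pooled channel descriptor of each branch, passed through softmax, weights the features of the other branch, and the two resulting spatial maps are summed and passed through a sigmoid to gate the group. This is a simplified, dependency-free sketch; channel grouping, the 1 × 1 directional-pooling branch, the 3 × 3 branch, and GroupNorm are not implemented, and their outputs are simply passed in as `x1` and `x2`.

```python
import math
from typing import List

Feature = List[List[List[float]]]  # [channel][row][col] for one channel group


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def global_avg_pool(f: Feature) -> List[float]:
    """One scalar per channel: the mean over all H x W positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in f]


def channel_weighted_map(weights: List[float], f: Feature) -> List[List[float]]:
    """sum_c weights[c] * f[c]: the (1 x C) @ (C x HW) product, kept as an H x W map."""
    h, w = len(f[0]), len(f[0][0])
    return [[sum(weights[c] * f[c][i][j] for c in range(len(f))) for j in range(w)]
            for i in range(h)]


def cross_spatial_reweight(group_x: Feature, x1: Feature, x2: Feature) -> Feature:
    """EMA-style cross-spatial learning for one channel group (simplified sketch).

    x1 and x2 stand in for the outputs of the 1 x 1 (directional pooling) branch and the
    3 x 3 branch; computing them is outside the scope of this sketch.
    """
    # Each branch's pooled channel descriptor weights the other branch's features.
    a = channel_weighted_map(softmax(global_avg_pool(x1)), x2)
    b = channel_weighted_map(softmax(global_avg_pool(x2)), x1)
    h, w = len(a), len(a[0])
    gate = [[1.0 / (1.0 + math.exp(-(a[i][j] + b[i][j]))) for j in range(w)]
            for i in range(h)]
    # One spatial gate in (0, 1) rescales every channel of the group.
    return [[[ch[i][j] * gate[i][j] for j in range(w)] for i in range(h)] for ch in group_x]


if __name__ == "__main__":
    gx = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    zeros = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    # With all-zero branch outputs the gate is sigmoid(0) = 0.5 everywhere.
    print(cross_spatial_reweight(gx, zeros, zeros))
    # [[[0.5, 1.0], [1.5, 2.0]], [[2.5, 3.0], [3.5, 4.0]]]
```

Because the gate is shared across the channels of a group, the output keeps each group's channel relationships while emphasizing spatial positions that both branches agree on.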
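The reported mAP@0.5 averages, over defect classes, the area under each class's precision-recall curve, where a detection counts as correct only if its IoU with a not-yet-matched ground-truth box is at least 0.5. The sketch below computes single-class AP with all-point interpolation (the PASCAL VOC 2010+ convention); the paper does not state whether its evaluation used this or COCO-style 101-point interpolation, so that choice is an assumption. mAP@0.5 is then the mean of the per-class values.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[str, float, Box]],
                      ground_truth: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """Single-class AP at one IoU threshold, all-point interpolation.

    detections: (image_id, confidence, box); ground_truth: image_id -> boxes.
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    # Greedy matching in descending confidence; each GT box can be matched once.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        is_tp = best_j >= 0 and best >= iou_thr and not used[img][best_j]
        if is_tp:
            used[img][best_j] = True
        tp_flags.append(is_tp)
    # Cumulative precision and recall along the ranked list.
    recall, precision, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        recall.append(tp / n_gt)
        precision.append(tp / rank)
    # Precision envelope (monotonically non-increasing), then the area under it.
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


if __name__ == "__main__":
    gts = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
    dets = [("img1", 0.9, (0, 0, 10, 10)),    # TP
            ("img1", 0.8, (1, 1, 11, 11)),    # overlaps an already-matched box -> FP
            ("img1", 0.7, (20, 20, 30, 30))]  # TP
    print(round(average_precision(dets, gts), 4))  # 0.8333
```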
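Citation: Yang Zongcai, Li Xingxing. Lightweight Road Defect Detection Based on RT-DETR Fusing Hierarchical Large Kernel Convolution and Efficient Attention[J]. Artificial Intelligence and Robotics Research, 2026, 15(2): 516-526. https://doi.org/10.12677/airr.2026.152050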
