MSFA: A Detection Framework Enhanced by Multi-Scale Fusion Features
Abstract: Detecting traffic cones in complex driving scenes is difficult because the targets are small, the backgrounds are cluttered, and occlusion is often severe. Traditional detection methods carry high computational redundancy and adapt poorly to changing environments, so they struggle to meet the real-time and accuracy requirements of autonomous driving systems. This study proposes SBA-YOLOv11, a lightweight detection algorithm with three key innovations: 1) a pyramid Multi-Scale Feature Aggregation (MSFA) module for efficient cross-scale feature fusion; 2) a Spatial Bidirectional Attention (CSBA) module that uses self-attention and cross-attention to strengthen feature representation; and 3) a Gated Full-Dimensional Convolution (GatedFDConv) module that combines a channel-splitting strategy with a gating mechanism to optimize feature interaction. A high-quality traffic cone dataset of 8000 images covering diverse scene conditions was constructed. Experiments show that, compared with the baseline YOLOv11n, SBA-YOLOv11 improves precision, recall, and mAP@50 by 3.7%, 3.6%, and 7.4%, reaching 89.9%, 88.8%, and 94.9%, respectively. Compared with state-of-the-art methods, it improves overall average precision by 2.3% to 27.5%. The proposed method balances detection accuracy and computational efficiency, addresses the challenges of traffic cone detection in complex driving scenes, and meets the requirements of autonomous driving applications.
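The abstract gives the final scores and the gains over YOLOv11n, but not the baseline scores themselves. As a rough check, the sketch below back-computes the implied baseline values, assuming the reported gains are absolute percentage-point differences; the abstract does not say whether they are absolute or relative, so the results are illustrative only. The function name `implied_baseline` and the dictionary layout are hypothetical, not taken from the paper.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# ASSUMPTION: each gain is an absolute difference in percentage points
# (final score minus baseline score). The abstract does not confirm this.

def implied_baseline(final_score: float, gain: float) -> float:
    """Return the baseline score implied by a final score and an absolute gain."""
    return round(final_score - gain, 1)

# (final score in %, gain in percentage points), as quoted in the abstract
reported = {
    "precision": (89.9, 3.7),
    "recall": (88.8, 3.6),
    "mAP@50": (94.9, 7.4),
}

# Implied YOLOv11n baseline per metric under the assumption above
baseline = {name: implied_baseline(final, gain)
            for name, (final, gain) in reported.items()}

for name, value in baseline.items():
    print(f"{name}: implied YOLOv11n baseline = {value}%")
```

If the gains were instead relative improvements, the implied baselines would differ, so the baseline figures should be taken from the full paper's results tables rather than from this estimate.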
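Cite this article: Zhou Lingfeng, Shen Hang. MSFA: A Detection Framework Enhanced by Multi-Scale Fusion Features[J]. Artificial Intelligence and Robotics Research, 2026, 15(1): 242-253. https://doi.org/10.12677/airr.2026.151024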
