Research on Perception Models for Complex Scenarios in E-Commerce Unmanned Delivery
DOI: 10.12677/ecl.2025.14113592
Author: 刘予莘, College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou
Keywords: Unmanned Delivery, Autonomous Driving, Object Detection, RT-DETR, AFGC, EMA
Abstract: Amid rapid e-commerce growth and rising labor costs, unmanned delivery offers cost reduction, efficiency gains, round-the-clock operation, and contactless service. Its large-scale deployment, however, remains limited by the difficulty of achieving robust perception in complex environments. Environmental perception is key to the safe passage and efficient decision-making of unmanned delivery vehicles, and its accuracy directly determines how well a vehicle understands dynamic scenes. The Transformer-based RT-DETR achieves high efficiency and accuracy through global attention and end-to-end detection, but under the multi-scale objects and frequent occlusions typical of unmanned delivery scenarios, its feature fusion and occlusion robustness remain insufficient. To address this, this paper proposes an improved RT-DETR for e-commerce unmanned delivery. An adaptive focused global-context attention module is embedded in key layers of the backbone network; by dynamically adjusting the receptive field, it strengthens multi-scale representation and improves the discriminability of small and occluded objects. In addition, an exponential moving average-enhanced cross-dimensional attention mechanism is introduced into the FPN/PAN to model long-range dependencies more robustly and to optimize cross-layer feature fusion. Experiments on the Udacity autonomous driving dataset show a 25.6% improvement in mAP@50 and a 13% improvement in mAP@50-95, supporting the transferability and practical value of the method in typical e-commerce unmanned delivery scenarios.
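The two reported metrics differ only in how strictly a predicted box must overlap a ground-truth box. The sketch below illustrates this under the assumption that the paper follows the common COCO-style convention, in which mAP@50 uses a single IoU threshold of 0.50 and mAP@50-95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. The function names `iou` and `thresholds_passed` are hypothetical and are not taken from the paper or from any RT-DETR codebase.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Return the IoU thresholds at which pred_box counts as a match for gt_box."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    pred = (2, 0, 12, 10)  # shifted two units to the right
    print(iou(pred, gt))
    print(thresholds_passed(pred, gt))
```

A prediction that is slightly misaligned still counts as correct at the 0.50 threshold but fails the stricter ones, which is why mAP@50-95 is usually lower than mAP@50 and is more sensitive to localization quality.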
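Cite this article: 刘予莘. Research on Perception Models for Complex Scenarios in E-Commerce Unmanned Delivery [J]. E-Commerce Letters (电子商务评论), 2025, 14(11): 1533-1541. https://doi.org/10.12677/ecl.2025.14113592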
