A Lightweight Multidimensional Feature Network for UAV Small Object Detection
Abstract: Small object detection from unmanned aerial vehicles (UAVs) has important applications in military operations, search and rescue, and smart cities, but existing models have large parameter counts and high computational cost. This study proposes TriD-UAV, a lightweight detector that aims to balance accuracy and computational efficiency by combining a lightweight feature extraction network (TriD-Net) with an efficient feature fusion network (DENet). TriD-Net uses neural architecture search (NAS) to introduce a dual-branch generalized inverted residual bottleneck structure that improves efficiency. DENet reduces computation and fuses features through a depthwise channel partial convolution stage. The model also uses a decoupled detection head and a loss function based on the Wasserstein distance to strengthen small object detection. Experiments show that TriD-UAV achieves a favorable trade-off between accuracy and model size on the VisDrone dataset, reaching an mAP50-95 of 21.3% while significantly reducing both parameters and FLOPs.
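The abstract does not spell out the Wasserstein-based loss. As a rough illustration only, the sketch below assumes it resembles the normalized Gaussian Wasserstein distance (NWD) of Wang et al. (2021), in which each box (cx, cy, w, h) is modelled as a 2-D Gaussian. The function names and the constant `c=12.8` are placeholders chosen for this example, not values taken from the paper.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two (cx, cy, w, h) boxes.

    `c` is a dataset-dependent scale constant; 12.8 is only a placeholder here.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Each box is treated as a Gaussian with mean (cx, cy) and covariance
    # diag(w^2/4, h^2/4). The squared 2-Wasserstein distance between two such
    # Gaussians is the squared Euclidean distance between (cx, cy, w/2, h/2).
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # The exponential maps the distance to a similarity in (0, 1].
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(pred_box, target_box, c=12.8):
    """Regression loss: 0 for identical boxes, approaching 1 as they diverge."""
    return 1.0 - nwd(pred_box, target_box, c)


# Two 4x4 boxes that do not overlap (IoU = 0) still get a graded similarity,
# so the loss keeps distinguishing near misses from far misses for tiny objects.
box_pred = (10.0, 10.0, 4.0, 4.0)
box_gt = (16.0, 10.0, 4.0, 4.0)
similarity = nwd(box_pred, box_gt)
loss = nwd_loss(box_pred, box_gt)
```

Unlike IoU, which is zero for any pair of disjoint boxes, this similarity decays smoothly with the centre offset and size difference, which is the usual argument for using it on objects only a few pixels wide.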
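Cite this article: Ma Hongwei. A Lightweight Multidimensional Feature Network for UAV Small Object Detection [J]. Software Engineering and Applications, 2025, 14(3): 736-750. https://doi.org/10.12677/sea.2025.143065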
