Improving Object Detection Method of Feature Fusion Structure in YOLOX
DOI: 10.12677/CSA.2022.126151
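Authors: Yang Li, Li Yunchen, Wang Jiabao*, Li Yang, Miao Zhuang: College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu; Zhao Zhijie: Unit 31700, Shenyang, Liaoning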
Keywords: Object Detection, YOLOX, Feature Fusion, Path Aggregation Network
Abstract: Because unmanned aerial vehicles (UAVs) capture images from an overhead view, aerial objects appear at small and widely varying scales, look highly similar to one another, and sit against complex, cluttered backgrounds. These properties make aerial object detection more challenging than general object detection. To address this problem, this paper targets the Path Aggregation Network (PANet) module, which is commonly used to fuse multi-scale features in general object detection networks, and proposes an improved version, the Multi-Distance Association Dependency (MDAD) module. MDAD contains two connection modes, Connection across Different Layers (CDL) and Connection on the Same Layer (CSL), and uses dense cross-scale interactive fusion to enhance the weak feature information in feature layers of different scales. Based on the YOLOX framework and the proposed MDAD module, a detection method better suited to multi-scale, complex aerial objects is then constructed. Experiments on VisDroneDet, a representative public aerial object detection dataset, verify the effectiveness of the proposed method. The module can be extended to backbone networks of different model sizes and has good practical application value.
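Cite this article: Yang Li, Li Yunchen, Wang Jiabao, Zhao Zhijie, Li Yang, Miao Zhuang. Improving Object Detection Method of Feature Fusion Structure in YOLOX [J]. Computer Science and Application, 2022, 12(6): 1518-1528. https://doi.org/10.12677/CSA.2022.126151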
