Research on Infrared Image Target Recognition Algorithm Based on Improved YOLOv3

doi:10.12677/GST.2022.102007

Authors: Wang Anqi, Liang Qice, Huang He: School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing
Keywords: Target Recognition; Pedestrian and Vehicle Detection; YOLOv3; Backbone; Feature Fusion; Loss Function

Abstract: To address the low accuracy of pedestrian and vehicle detection for autonomous driving at night, this paper proposes an infrared image target recognition algorithm based on an improved YOLOv3. First, the algorithm modifies the original residual unit and increases the number of convolutions applied to large-size images in the backbone to strengthen feature extraction, and it replaces the subsequent conventional convolutions with depthwise separable convolutions to reduce the number of model parameters and increase running speed. Second, the feature fusion structure used in multi-scale feature fusion is replaced with the PANet structure to make better use of low-level information. Finally, Distance-IoU (DIoU) is adopted as the anchor loss function to speed up model convergence. Test results on the FLIR image dataset show that, with model size almost unchanged, the improved model achieves better precision and recall. Compared with YOLOv3, it improves by 2.94% and 3.12% on the pedestrian and car classes, respectively, and the average AP improves by 3.03%. The experiments show that the improved method raises detection accuracy while also reducing model size and increasing detection speed.
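Two of the ideas named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes boxes given as (x1, y1, x2, y2) corner coordinates, the standard DIoU loss definition (1 - IoU plus the squared center distance divided by the squared diagonal of the smallest enclosing box), and bias-free convolution layers for the parameter count. The function names and the example layer sizes are hypothetical.

```python
def diou_loss(box_a, box_b):
    """DIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    dx = (ax1 + ax2) / 2 - (bx1 + bx2) / 2
    dy = (ay1 + ay2) / 2 - (by1 + by2) / 2
    center_dist_sq = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    diag_sq = enc_w * enc_w + enc_h * enc_h

    penalty = center_dist_sq / diag_sq if diag_sq > 0 else 0.0
    return 1.0 - iou + penalty


def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    # Disjoint boxes: plain IoU loss would be flat at 1, DIoU still
    # grows with the distance between the centers.
    print(diou_loss((0, 0, 1, 1), (2, 0, 3, 1)))
    # Hypothetical layer: 256 input channels, 512 output channels, 3 x 3 kernel.
    print(conv_params(256, 512, 3), depthwise_separable_params(256, 512, 3))
```

For non-overlapping boxes the IoU term alone gives no signal about how far apart the boxes are, while the center-distance penalty still does, which is the usual explanation for DIoU's faster convergence. For the hypothetical 3 x 3 layer above, the depthwise separable form needs roughly one ninth of the weights of the standard convolution.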

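Cite this article: Wang Anqi, Liang Qice, Huang He. Research on Infrared Image Target Recognition Algorithm Based on Improved YOLOv3[J]. Geomatics Science and Technology, 2022, 10(2): 61-68. https://doi.org/10.12677/GST.2022.102007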

