Object Detection Based on Convolutional Neural Network
Abstract: Object detection algorithms based on region proposals achieve high detection accuracy, but they are slow and cannot run in real time. To address this problem, this paper proposes an object detection method based on a depthwise separable convolutional neural network. First, ResNet-101 followed by a depthwise separable convolutional layer extracts a compact feature map of the target, which reduces computation and increases detection speed. To compensate for the accuracy lost in exchange for that speed, a keypoint-guided strategy replaces the traditional regression method; it exploits the position sensitivity of fully convolutional networks to preserve the spatial information of objects, so that targets are localized more precisely. Finally, PS-RoI Align replaces the traditional pooling method, which improves the detection of small objects to some extent. Experimental results on the COCO dataset show that the method achieves good detection performance.
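The computational saving that motivates the depthwise separable layer can be illustrated with the standard multiply-accumulate counts for a convolution. The sketch below is only an illustration: the function names and the layer sizes (3x3 kernel, 256 input and output channels, 14x14 output map, stride 1) are assumptions chosen for the example and are not taken from the paper.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates of a standard k x k convolution producing an h x w map."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates of a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def cost_ratio(k, c_in, c_out, h, w):
    """Separable cost divided by standard cost; algebraically 1/c_out + 1/k**2."""
    return depthwise_separable_macs(k, c_in, c_out, h, w) / standard_conv_macs(k, c_in, c_out, h, w)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out, h, w = 3, 256, 256, 14, 14
    print("standard :", standard_conv_macs(k, c_in, c_out, h, w))
    print("separable:", depthwise_separable_macs(k, c_in, c_out, h, w))
    print("ratio    :", cost_ratio(k, c_in, c_out, h, w))
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, which is why the separable layer is much cheaper than a standard convolution of the same shape.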
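Cite this article: Zhong Wenxin (钟文鑫). Object Detection Based on Convolutional Neural Network [J]. Software Engineering and Applications (软件工程与应用), 2020, 9(1): 36-48. https://doi.org/10.12677/SEA.2020.91005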
