Research on Food Ingredients Identification Method Based on Lightweight Neural Network
DOI: 10.12677/CSA.2021.114099 (supported by the National Natural Science Foundation of China)
Authors: Yingkang Huang, Bi Zeng (Guangdong University of Technology, Guangzhou, Guangdong)
Keywords: Deep Learning, Lightweight, Non-Maximum Suppression (NMS), Food Ingredients Identification, Object Detection
Abstract: With the rapid development of computer vision technology, object detection based on deep learning has been widely applied in many fields. However, current object detection models are complex and computationally expensive, which keeps them from being deployed on embedded devices. To meet the need for food ingredients identification on embedded devices, an improved version of the object detection model YOLOv3 is proposed: the lightweight neural network MobileNet is introduced into YOLOv3, replacing its darknet53 backbone, and the Cluster-NMS algorithm, combined with a center-distance penalty and weighted box averaging, is then used to improve detection accuracy. Comparative experiments were conducted on a collected food ingredients data set and on the VOC 2007 data set. The results show that the improved model is light enough to be migrated to embedded devices, improves both recognition speed and accuracy, and realizes food ingredients identification on embedded hardware.
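The abstract describes two changes but includes no code, so the following PyTorch-style sketches are offered purely as illustrations under stated assumptions, not as the authors' implementation. The first sketch shows the building block MobileNet substitutes for the standard 3x3 convolutions of darknet53: a depthwise convolution followed by a pointwise 1x1 convolution, which cuts the multiply-accumulate cost of each layer to roughly 1/8 to 1/9 of a standard 3x3 convolution. The helper and class names are hypothetical.

```python
import torch.nn as nn

def conv_bn_relu(c_in, c_out, kernel_size, stride=1, groups=1):
    """Convolution + BatchNorm + ReLU6, the basic unit MobileNet-style blocks are built from."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size, stride,
                  padding=kernel_size // 2, groups=groups, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU6(inplace=True),
    )

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution.

    Stacking blocks like this, instead of darknet53's standard convolutions,
    is what makes the backbone light enough for embedded hardware.
    """
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        # groups=c_in makes the 3x3 convolution filter each channel separately.
        self.depthwise = conv_bn_relu(c_in, c_in, 3, stride, groups=c_in)
        # The 1x1 convolution then mixes channels and sets the output width.
        self.pointwise = conv_bn_relu(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```

The second sketch gives one plausible reading of "Cluster-NMS combined with a center-distance penalty": boxes are compared with a DIoU-style overlap (IoU minus a normalized center-distance term) and suppression is applied iteratively with matrix operations instead of a sequential loop over boxes. The (x1, y1, x2, y2) box format, the 0.5 threshold, and the function names are assumptions, and the weighted-average variant mentioned in the abstract is omitted for brevity.

```python
import torch

def diou_matrix(boxes):
    """Pairwise DIoU for boxes given as (x1, y1, x2, y2) rows.

    DIoU = IoU - d^2 / c^2, where d is the distance between box centers and
    c is the diagonal length of the smallest box enclosing both.
    """
    x1, y1, x2, y2 = boxes.unbind(dim=1)
    areas = (x2 - x1) * (y2 - y1)

    # Pairwise intersection areas.
    ix1 = torch.max(x1[:, None], x1[None, :])
    iy1 = torch.max(y1[:, None], y1[None, :])
    ix2 = torch.min(x2[:, None], x2[None, :])
    iy2 = torch.min(y2[:, None], y2[None, :])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    iou = inter / (areas[:, None] + areas[None, :] - inter)

    # Center-distance penalty, normalized by the enclosing box diagonal.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    d2 = (cx[:, None] - cx[None, :]) ** 2 + (cy[:, None] - cy[None, :]) ** 2
    ex1, ey1 = torch.min(x1[:, None], x1[None, :]), torch.min(y1[:, None], y1[None, :])
    ex2, ey2 = torch.max(x2[:, None], x2[None, :]), torch.max(y2[:, None], y2[None, :])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-7
    return iou - d2 / c2

def cluster_diou_nms(boxes, scores, threshold=0.5, max_iter=200):
    """Minimal Cluster-NMS sketch using a DIoU overlap measure.

    Returns the indices of the kept boxes, ordered by descending score.
    """
    order = scores.argsort(descending=True)
    boxes = boxes[order]

    # Upper-triangular overlap matrix: a box can only be suppressed by a
    # higher-scored box (rows suppress columns).
    overlap = torch.triu(diou_matrix(boxes), diagonal=1)

    keep = torch.ones(len(boxes), dtype=torch.bool)
    for _ in range(max_iter):
        # Rows of currently suppressed boxes are zeroed out, so a box survives
        # only if no surviving higher-scored box overlaps it above the threshold.
        worst = (overlap * keep[:, None].float()).max(dim=0).values
        new_keep = worst <= threshold
        if torch.equal(new_keep, keep):
            break
        keep = new_keep
    return order[keep]
```

For a single image, calling cluster_diou_nms(pred_boxes, pred_scores) on the raw detections of one class would return the indices of the boxes to keep, ordered by descending confidence.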
Citation: Huang, Y.K. and Zeng, B. (2021) Research on Food Ingredients Identification Method Based on Lightweight Neural Network. Computer Science and Application, 11(4), 962-974. https://doi.org/10.12677/CSA.2021.114099

[23] Ning, C., Zhou, H., Song, Y. and Tang, J. (2017) Inception Single Shot Multibox Detector for Object Detection. 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, 10-14 July 2017, 549-554. [Google Scholar] [CrossRef