Application of Improved Capsule Network in Image Recognition
DOI: 10.12677/AAM.2022.114189. Supported by the National Natural Science Foundation of China.
Authors: Ruixin Gong (巩瑞鑫), Kan He (贺衎): College of Mathematics, Taiyuan University of Technology, Jinzhong, Shanxi
Keywords: Image Recognition; Convolutional Neural Network; Capsule Network; Handwritten Numeral Recognition
Abstract: Image recognition is the use of computers to process, analyze, and understand images in order to identify targets and objects of different patterns; it also covers enhancement and reconstruction techniques that improve the quality of poor images. In this paper, an improved capsule network is trained on the MNIST data set. A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity: the vector's length represents the probability that the entity exists, and its direction represents the instantiation parameters. Active low-level capsules predict the instantiation parameters of high-level capsules through transformation matrices, and a high-level capsule is activated when multiple predictions agree. This paper replaces the margin loss with the spread loss to avoid capsules becoming inactive too early. Without adding a reconstruction sub-network, it studies different numbers of routing iterations to determine their effect on classification accuracy and to select the optimal model parameters. The results show that the model's misclassification rate is as low as 0.32% on the MNIST data set without data augmentation or expansion. The improved capsule network also performs well on the Fashion-MNIST and CIFAR-10 data sets.
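The two losses named in the abstract can be compared with a small sketch. The abstract gives no formulas, so the definitions below follow the forms commonly published for capsule networks: a margin loss on capsule lengths and a spread loss on class activations. The function names, the constants (`m_pos`, `m_neg`, `lam`, `margin`), and the use of plain Python lists are assumptions for illustration, not the authors' implementation.

```python
import math


def capsule_length(vector):
    """Euclidean length of a capsule's activity vector (read as existence probability)."""
    return math.sqrt(sum(x * x for x in vector))


def margin_loss(lengths, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss over class-capsule lengths.

    Assumed form: the target class is penalized when its length falls below
    m_pos; every other class is penalized (down-weighted by lam) when its
    length rises above m_neg.
    """
    total = 0.0
    for k, length in enumerate(lengths):
        if k == target:
            total += max(0.0, m_pos - length) ** 2
        else:
            total += lam * max(0.0, length - m_neg) ** 2
    return total


def spread_loss(activations, target, margin=0.2):
    """Spread loss over class activations.

    Assumed form: each wrong class i is penalized when its activation comes
    within `margin` of the target activation. The margin is typically started
    small and increased during training (the schedule is not shown here).
    """
    a_t = activations[target]
    return sum(
        max(0.0, margin - (a_t - a_i)) ** 2
        for i, a_i in enumerate(activations)
        if i != target
    )


if __name__ == "__main__":
    scores = [0.5, 0.4, 0.1]  # hypothetical class scores, class 0 is the target
    print("margin loss:", margin_loss(scores, target=0))
    print("spread loss:", spread_loss(scores, target=0))
```

With a small margin, the spread loss only asks the target class to lead the others by a modest gap rather than forcing absolute lengths toward fixed thresholds. This matches the abstract's stated motivation of keeping capsules from becoming inactive early in training.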
Citation: Gong, R. and He, K. (2022) Application of Improved Capsule Network in Image Recognition. Advances in Applied Mathematics, 11(4), 1728-1739. https://doi.org/10.12677/AAM.2022.114189
