Application of Deep Learning Algorithms for Image Processing in Text Recognition
DOI: 10.12677/SEA.2022.114074
Author: Li Dongni (李冬妮), College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu
Keywords: Image Processing; Deep Learning; Text Recognition; Neural Networks
Abstract: Text recognition is difficult not only because characters appear in widely varying forms, Chinese characters have diverse strokes, and fonts are numerous, but also because in real-world scenes text may be occluded or set against a complex background. To recognize text more effectively, this paper proposes a text recognition model based on the Convolutional Recurrent Neural Network (CRNN) model for image-based text recognition, and implements the recognition system in Python with Keras. The work proceeds in four steps: first, the design of a data augmentation algorithm; second, the design of the feature extraction network; third, the design of the network's decision layer; and finally, the replacement of the Long Short-Term Memory (LSTM) layer of the original CRNN model, which has a large number of parameters and is difficult to converge, with a convolutional layer. This approach improves recognition accuracy while also reducing the number of network parameters and speeding up convergence. Experimental results show that the text recognition system designed with this method can recognize a wide variety of characters and achieves high recognition accuracy.
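The claim that swapping the LSTM layer for a convolutional layer reduces the parameter count can be checked with simple arithmetic. The sketch below is illustrative only: it uses the standard parameter formulas for an LSTM layer and a 1-D convolution layer, and the sizes (a 512-dimensional feature sequence, 256 hidden units or filters, kernel size 3, a bidirectional LSTM) are assumptions chosen for the example, not values reported in the paper.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of one standard LSTM layer (with biases)."""
    # Four gates, each with an input kernel, a recurrent kernel, and a bias.
    return 4 * (units * (input_dim + units) + units)


def conv1d_params(input_dim: int, filters: int, kernel_size: int) -> int:
    """Trainable parameters of one 1-D convolution layer (with biases)."""
    # One kernel of shape (kernel_size, input_dim) per filter, plus one bias each.
    return kernel_size * input_dim * filters + filters


if __name__ == "__main__":
    # Hypothetical sizes for illustration only (not taken from the paper).
    feature_dim, hidden = 512, 256
    bilstm = 2 * lstm_params(feature_dim, hidden)  # forward + backward directions
    conv = conv1d_params(feature_dim, hidden, kernel_size=3)
    print(f"BiLSTM layer: {bilstm:,} parameters")
    print(f"Conv1D layer: {conv:,} parameters")
```

Under these assumed sizes the convolutional layer needs roughly a quarter of the parameters of the bidirectional LSTM layer. The actual saving in the proposed model depends on the layer sizes the author chose.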
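Cite this article: Li Dongni. Application of Deep Learning Algorithms for Image Processing in Text Recognition [J]. Software Engineering and Applications, 2022, 11(4): 712-720. https://doi.org/10.12677/SEA.2022.114074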
