Research on an Offline Handwritten Chinese Character Recognition Algorithm Based on Gated Convolution and Stacked Self-Attention
Abstract: Offline handwritten text recognition (HTR) is an important task in natural language processing, with practical applications in assisting visually impaired users, human-computer interaction, and automated data entry. This study proposes a new model that adds a stacked self-attention encoder-decoder on top of a gated convolutional network to recognize offline handwritten Chinese text. Diverse writing styles, visual similarity between different characters, overlapping characters, and noise in the original documents make it difficult to design an accurate and flexible HTR system; the learning capacity of existing algorithms is particularly limited on complex text containing a large number of characters. To address this, the proposed model consists of a feature extraction layer, an encoder layer, and a decoder layer. The feature extraction layer extracts high-dimensional invariant feature maps from the input handwriting image, and the encoder and decoder layers then transcribe the text. In experiments, the model achieves a character error rate (CER) of 6.72 and a word error rate (WER) of 11.11 on the HCTD dataset, and a CER of 6.22 and a WER of 7.17 on the HCWD dataset. Compared with models from other researchers, the proposed model improves the handwritten Chinese character recognition rate by 11%.
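The CER and WER figures quoted in the abstract are edit-distance metrics. The abstract does not say how the authors computed them, so the following is only a minimal sketch of the conventional definition: Levenshtein distance between reference and prediction, summed over the corpus, divided by the total reference length, and reported as a percentage. The function names `edit_distance`, `error_rate`, `cer`, and `wer` are illustrative, and the sketch assumes that words are already separated by whitespace, which for Chinese text would require a prior segmentation step.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Corpus-level error rate in percent: total edits / total reference length."""
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        total_edits += edit_distance(r, h)
        total_len += len(r)
    if total_len == 0:
        raise ValueError("reference texts are empty")
    return 100.0 * total_edits / total_len


def cer(refs, hyps):
    """Character error rate: every character is one token."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word error rate: assumes words are whitespace-separated (an assumption)."""
    return error_rate(refs, hyps, str.split)


if __name__ == "__main__":
    # One wrong character out of six in the reference line.
    print(cer(["手写汉字识别"], ["手写汉宇识别"]))
    # One wrong word out of three in a pre-segmented line.
    print(wer(["手写 汉字 识别"], ["手写 汉宇 识别"]))
```

Summing edits over the whole corpus before dividing weights long lines more heavily than averaging per-line rates; which of the two conventions the paper follows is not stated.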
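Cite this article: Luo Xuliang, Wu Yiliang, Liu Cuimei, Guo Fengchan. Research on an Offline Handwritten Chinese Character Recognition Algorithm Based on Gated Convolution and Stacked Self-Attention [J]. Computer Science and Application, 2024, 14(5): 48-60. https://doi.org/10.12677/csa.2024.145113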
