Research on Medical Laboratory Sheet Layout Recognition Method Combining Regional Features and Form Line Recognition
DOI: 10.12677/CSA.2022.121008. Supported by research project funding.
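Authors: Zeyi Zhang, Xiangling Fu*: School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing; Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing; Chenyi Guo: Department of Electronic Engineering, Tsinghua University, Beijing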
Keywords: Medical Laboratory Sheet, Text Recognition, Layout Recognition, Feature Fusion
Abstract: Medical laboratory sheets come in many formats, so general-purpose text recognition systems produce low output accuracy and results in which adjacent fields run together. To address this, we borrow ideas from document layout analysis and introduce a layout recognition step into the end-to-end text recognition process for medical laboratory sheets, with the aim of improving the output accuracy of text recognition. Because laboratory sheets are sparsely ruled tables (tables with few lines) whose content falls into clearly separated regions, this paper proposes a layout recognition algorithm that fuses regional features with table line recognition. First, a Unet-based table line extraction network extracts the table lines from the laboratory sheet image. Second, a Mask R-CNN regional feature extraction network is introduced, and the table line features are fed to it together with the original laboratory sheet image. Finally, the regional features and table line features are fused, the relationships between the regions to be recognized are modeled, and the final precise coordinates and semantic labels are generated. Experiments show that the algorithm noticeably improves the accuracy of medical laboratory sheet layout recognition.
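The second step, in which the table line features and the original image are supplied to the network together, can be illustrated with a minimal sketch. It assumes that "together" means appending a binary table-line mask to the RGB image as a fourth channel; the abstract does not specify the mechanism, so this is only one plausible reading. The function name `stack_line_mask` and the toy data are hypothetical, and plain Python lists stand in for image tensors.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]            # RGB pixel
FusedPixel = Tuple[int, int, int, int]  # RGB plus a table-line channel


def stack_line_mask(image: List[List[Pixel]],
                    line_mask: List[List[int]]) -> List[List[FusedPixel]]:
    """Append a binary table-line mask to an RGB image as a fourth channel.

    Assumption: channel stacking is how the two inputs are combined;
    the source abstract does not confirm this.
    """
    if len(image) != len(line_mask):
        raise ValueError("image and mask heights differ")
    fused = []
    for img_row, mask_row in zip(image, line_mask):
        if len(img_row) != len(mask_row):
            raise ValueError("image and mask widths differ")
        # Scale the 0/1 mask to 0/255 so it shares the image's value range.
        fused.append([(r, g, b, 255 if m else 0)
                      for (r, g, b), m in zip(img_row, mask_row)])
    return fused


# Toy 2x3 "sheet": a white row above a black horizontal rule.
image = [[(255, 255, 255)] * 3, [(0, 0, 0)] * 3]
# Hypothetical output of the table line extraction network (1 = line pixel).
line_mask = [[0, 0, 0], [1, 1, 1]]
fused = stack_line_mask(image, line_mask)
```

In a real pipeline the same idea would apply to array or tensor inputs, and the first convolution of the detection backbone would need to accept four input channels instead of three.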
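Cite this article: Zeyi Zhang, Xiangling Fu, Chenyi Guo. Research on Medical Laboratory Sheet Layout Recognition Method Combining Regional Features and Form Line Recognition [J]. Computer Science and Application, 2022, 12(1): 63-71. https://doi.org/10.12677/CSA.2022.121008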
