A Review on the Application of Segmentation-Based Text Detection Techniques for Natural Scenes
DOI: 10.12677/airr.2024.132041. Supported by research project funding.
Authors: 陈伟杰, 夏易行, 杜世杰 (College of Information and Intelligent Engineering, Zhejiang Wanli University, Ningbo, Zhejiang)
Keywords: Text Detection, Segmentation, Overview
Abstract: Scene text detection aims to accurately locate text in natural scenes. Segmentation-based scene text detection currently faces challenges such as diverse text types, complex backgrounds, and irregular text shapes, yet a comprehensive synthesis of these techniques is lacking; this paper therefore reviews natural scene text detection techniques. The main contents are as follows: 1) It describes segmentation-based detection algorithms in the field of scene text detection, covering both semantic segmentation and instance segmentation. 2) It introduces classical models together with innovative models proposed in recent years, and analyzes and consolidates them. 3) It introduces commonly used natural scene text datasets and compares the strengths, weaknesses, and performance of different algorithms. 4) It looks ahead to future development trends of segmentation-based natural scene text detection algorithms.
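Citation: 陈伟杰, 夏易行, 杜世杰. A Review on the Application of Segmentation-Based Text Detection Techniques for Natural Scenes [J]. Artificial Intelligence and Robotics Research, 2024, 13(2): 399-407. https://doi.org/10.12677/airr.2024.132041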
