Text Image Inpainting for Damaged Handwritten Dongba Characters via Glyph Enhancement and Transformer
DOI: 10.12677/csa.2026.161026. Supported by research project funding.
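Authors: Zhang Yang (张扬)*, Liu Yinan (刘弈楠), Mou Dazhong (牟大中): School of Information Engineering, Beijing Institute of Graphic Communication, Beijing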
Keywords: Dongba Character, Image Inpainting, Wavelet Transform, Transformer, Deep Learning
Abstract: Natural aging, environmental erosion, and similar factors have left the text in many ancient documents damaged. Existing general-purpose image inpainting algorithms rely on non-specialized masks and struggle to capture character structure, so they generalize poorly and reconstruct glyph structures incorrectly. To address these problems, we construct a specialized mask dataset that simulates real-world damage and covers nine damage types, and we propose a Dongba character image inpainting algorithm that combines glyph enhancement with a Transformer. The algorithm first applies a glyph enhancement module, which uses wavelet decomposition to extract glyph structure features in the low-frequency domain. These structural features then guide the Transformer in inferring structure and generating content for the missing regions, reconstructing Dongba character images with complete structures and clear textures. Comparative experiments on a dataset of damaged handwritten Dongba character images show that the proposed method outperforms existing algorithms on multiple metrics across different mask ratios. The results validate the effectiveness of the proposed algorithm and offer a feasible approach to restoring ancient texts and ethnic minority pictographic scripts.
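The abstract does not say which wavelet or how many decomposition levels the glyph enhancement module uses. As a rough illustration of what extracting low-frequency structure by wavelet decomposition can look like, the following is a minimal sketch that assumes a single-level orthonormal Haar transform on a grayscale image. The function name `haar_dwt2` and the toy stroke image are hypothetical and are not taken from the paper.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level 2D Haar decomposition of an even-sized grayscale image.

    Returns four sub-bands, each half the input height and width:
    the low-frequency band followed by three detail bands.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # Split the image into the four pixels of every 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    low = (a + b + c + d) / 2.0       # smooth band: coarse glyph structure
    row_diff = (a + b - c - d) / 2.0  # top row minus bottom row
    col_diff = (a - b + c - d) / 2.0  # left column minus right column
    diag = (a - b - c + d) / 2.0      # diagonal detail
    return low, row_diff, col_diff, diag


# Toy 8x8 "glyph": one vertical stroke standing in for a character image.
glyph = np.zeros((8, 8))
glyph[1:7, 3:5] = 1.0

low, row_diff, col_diff, diag = haar_dwt2(glyph)
print(low.shape)  # (4, 4)
```

In this sketch the `low` band is a half-resolution summary of the stroke layout, which is the kind of structural cue the abstract describes feeding to the Transformer. How the paper actually fuses such features with the Transformer is not stated here, so that step is left out.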
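Cite this article: Zhang Yang, Liu Yinan, Mou Dazhong. Text Image Inpainting for Damaged Handwritten Dongba Characters via Glyph Enhancement and Transformer [J]. Computer Science and Application, 2026, 16(1): 317-327. https://doi.org/10.12677/csa.2026.161026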
