Semantic Consistency Large Mask Image Inpainting in Spatial Frequency
Abstract: Objective: Large-hole images are difficult to inpaint, and existing methods make insufficient use of semantic information and the receptive field, so the inpainted hole region is often inconsistent with the background in structure, texture, and style. Methods: This paper proposes a semantic-consistency, spatial-frequency inpainting method for large-hole images. A multi-head attention bidirectional autoregressive model extracts contextual structural information more effectively and produces a low-resolution inpainting result that is semantically consistent with its context. Fast Fourier convolution, which has a global receptive field and processes the image from the spatial-frequency perspective, then yields an inpainting result with rich texture detail. Results: The proposed method was compared experimentally with other image inpainting methods on the Places365-Standard dataset. All metrics improved markedly: Fréchet Inception Distance decreased by 6.21, structural similarity increased by 7%, peak signal-to-noise ratio increased by 7.4%, and learned perceptual image patch similarity decreased by 13.3%. Conclusion: The proposed method keeps the contextual structure consistent while preserving rich, fine texture detail, producing inpainting results that are consistent overall.
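The abstract's claim that Fourier-domain processing has a global receptive field can be illustrated with a small sketch. This is not the authors' network: it is a simplified 1-D toy, assuming a naive DFT, a hypothetical fixed weight vector `WEIGHTS` in place of learned parameters, and a hypothetical helper `affected_positions` that reports which outputs change when a single input sample is perturbed. An actual fast Fourier convolution layer works on 2-D feature maps and applies learned convolutions to the spectrum.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Naive inverse DFT, matching dft() above."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def spectral_layer(x, weights):
    """Multiply each frequency bin by a weight, then return to the signal domain.
    Only the real part is kept (a simplification made for this sketch)."""
    spec = dft(x)
    mixed = [w * c for w, c in zip(weights, spec)]
    return [v.real for v in idft(mixed)]

def local_conv3(x, kernel=(0.25, 0.5, 0.25)):
    """Ordinary 3-tap convolution with zero padding: a local receptive field."""
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - 1
            if 0 <= idx < n:
                acc += k * x[idx]
        out.append(acc)
    return out

def affected_positions(layer, n=8, eps=1e-9):
    """Indices of outputs that change when only input sample 0 is perturbed."""
    base = [0.0] * n
    bumped = list(base)
    bumped[0] = 1.0
    a, b = layer(base), layer(bumped)
    return [i for i in range(n) if abs(a[i] - b[i]) > eps]

# Hypothetical per-frequency weights; a trained layer would learn these.
WEIGHTS = [1.0 / (k + 1) for k in range(8)]

local_reach = affected_positions(local_conv3)
global_reach = affected_positions(lambda x: spectral_layer(x, WEIGHTS))
print(local_reach)   # [0, 1]
print(global_reach)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

In this toy, one layer of the 3-tap convolution lets a changed sample influence only its immediate neighbour, whereas a single pointwise operation on the spectrum influences every output position. That is the property that makes frequency-domain layers attractive when a hole is too large for local context to reach across.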
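Cite this article: Liu Qianqian, Sun Liujie, Wang Wenju. Semantic Consistency Large Mask Image Inpainting in Spatial Frequency [J]. Modeling and Simulation, 2024, 13(4): 4976-4986. https://doi.org/10.12677/mos.2024.134450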
