Image Inpainting Networks with Multi-Source Feature Encoding
DOI: 10.12677/MOS.2024.132111
Authors: Wang Xiaohong, Xu Shihao, Zhao Xu, Xu Kun (School of Publishing, Printing and Art Design, University of Shanghai for Science and Technology, Shanghai)
Keywords: Image Inpainting, Vision Transformer, Unet, Channel Attention, Perceptual Style
Abstract: Image inpainting is a technique that uses existing image information to reconstruct missing or damaged regions of an image. To address the structural inconsistency and blurred texture details found in current inpainting methods, this paper designs a restoration network based on the principles of visual information processing: the structural information of an image is first parsed and passed to the processing unit, and detailed texture information is then supplemented, gradually building a complete visual perception of the object. By systematically encoding the structural, textural, and perceptual characteristics of the image, an image inpainting network with multi-source feature-gain encoding is constructed. The network cascades a Vision Transformer (ViT) and a Unet to progressively process the structure and texture of full-resolution images. To strengthen the encoding of global key features, a ViT based on dual channel and sparse self-attention is designed to integrate and enhance structural features, improving the network's semantic inpainting ability. The Unet then performs multi-scale fusion of the multi-source features and further refines the inpainted details. In addition, perceptual style encoding is introduced to improve the perceptual similarity of the restored results. Qualitative experiments and evaluations with common metrics on the Places-365 and CelebA-HQ datasets demonstrate the superiority of the proposed method.
Citation: Wang Xiaohong, Xu Shihao, Zhao Xu, Xu Kun. Image Inpainting Networks with Multi-Source Feature Encoding [J]. Modeling and Simulation, 2024, 13(2): 1183-1194. https://doi.org/10.12677/MOS.2024.132111
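The abstract describes a ViT encoder built on dual channel and sparse self-attention, cascaded with a Unet for multi-scale fusion. The sketch below is only a minimal illustration of that dual-attention pairing, not the authors' released code: it combines a Restormer-style channel self-attention with a top-k sparse spatial self-attention inside one transformer block. The module names, the parallel combination of the two branches, the embedding size, and the top-k value are all illustrative assumptions.

```python
# Illustrative sketch (assumptions noted above, not the paper's implementation):
# one transformer block that pairs channel self-attention (attention computed
# across channels) with sparse spatial self-attention (only the top-k scores
# per query token are kept before the softmax).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Self-attention over the channel dimension of the token sequence."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.temperature = nn.Parameter(torch.ones(1))

    def forward(self, x):                                   # x: (B, N, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = F.normalize(q.transpose(1, 2), dim=-1)           # (B, C, N)
        k = F.normalize(k.transpose(1, 2), dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, C, C)
        out = (attn.softmax(dim=-1) @ v.transpose(1, 2)).transpose(1, 2)
        return self.proj(out)                                # (B, N, C)


class SparseSpatialAttention(nn.Module):
    """Spatial self-attention keeping only the top-k scores per query."""
    def __init__(self, dim, top_k=32):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5
        self.top_k = top_k

    def forward(self, x):                                    # x: (B, N, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale        # (B, N, N)
        k_keep = min(self.top_k, attn.shape[-1])
        thresh = attn.topk(k_keep, dim=-1).values[..., -1:]  # k-th largest per row
        attn = attn.masked_fill(attn < thresh, float('-inf'))
        return self.proj(attn.softmax(dim=-1) @ v)


class DualAttentionBlock(nn.Module):
    """Encoder block: channel and sparse spatial attention applied in parallel."""
    def __init__(self, dim, top_k=32):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.channel_attn = ChannelAttention(dim)
        self.sparse_attn = SparseSpatialAttention(dim, top_k)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.channel_attn(h) + self.sparse_attn(h)
        return x + self.mlp(self.norm2(x))


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 256)            # e.g. 14x14 patches, 256-dim embedding
    print(DualAttentionBlock(256)(tokens).shape)  # torch.Size([2, 196, 256])
```

In the network described by the abstract, blocks of this kind would act as the structural encoder whose output is then fused with texture features at multiple scales in the Unet; the sketch is meant only to make the channel-plus-sparse attention pairing concrete.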

[34] Zamir, S.W., Arora, A., Khan, S., et al. (2022) Restormer: Efficient Transformer for High-Resolution Image Restoration. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New Orleans, 18-24 June 2022, 5718-5729. [Google Scholar] [CrossRef