Study on Face Occlusion Repair Based on CSWin-Transformer and WGAN Techniques
DOI: 10.12677/jisp.2025.142027, supported by research project funding
Authors: Huang Shihao, Jin Zhao (School of Information, Yunnan University, Kunming, Yunnan, China)
Keywords: Image Recognition, Face Repair, Generative Adversarial Network, WGAN, CSWin-Transformer
Abstract: Current face occlusion repair methods often produce restored images with incomplete information, blurred textures, artifacts and poor detail, and their training can be unstable. To address these problems, a face occlusion repair method based on CSWin-Transformer and WGAN is proposed. The generator uses an Encoder-Decoder structure into which CSWin-Transformer Blocks are introduced to finely identify and process the occluded facial regions, making the processing more targeted and efficient. The decoder fuses multi-scale encoder features through skip connections, so that it learns fine image details better and improves the final result. The Wasserstein distance is introduced into the discriminator to improve training stability and the realism of the generated images. CSWin Self-Attention is also introduced into the discriminator to strengthen its understanding of both the global structure and the fine details of the image. Experimental results show that the proposed method achieves good repair results on the CelebA dataset used and outperforms several current image inpainting methods in peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM).
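The abstract describes CSWin self-attention only at a high level. As a rough illustration of the cross-shaped window idea from the CSWin Transformer (Dong et al., 2022), the sketch below splits the features into two groups. Half of the attention heads (here, simply half of the channels) attend within horizontal stripes, and the other half attend within vertical stripes, so each token's combined receptive field is a cross. This is a minimal pure-Python sketch under stated assumptions: one head per stripe group, no learned Q/K/V projections, and no locally-enhanced positional encoding. `cross_shaped_window_attention` is a hypothetical helper name, not code from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Learned Q/K/V projections are omitted (an assumption for brevity).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[ch] for w, v in zip(weights, tokens)) for ch in range(d)])
    return out


def cross_shaped_window_attention(x, sw):
    """Hypothetical helper: cross-shaped window attention on an H x W grid.

    x  : H x W grid (nested lists) of C-dim feature vectors, C >= 2.
    sw : stripe width.
    The first C//2 channels attend inside horizontal stripes (sw rows, full width).
    The remaining channels attend inside vertical stripes (full height, sw columns).
    The two halves are concatenated per position.
    """
    H, W, C = len(x), len(x[0]), len(x[0][0])
    half = C // 2
    horiz = [[None] * W for _ in range(H)]
    vert = [[None] * W for _ in range(H)]

    # Horizontal stripes: rows r0 .. r0+sw-1, all columns.
    for r0 in range(0, H, sw):
        coords = [(r, c) for r in range(r0, min(r0 + sw, H)) for c in range(W)]
        result = self_attention([x[r][c][:half] for r, c in coords])
        for (r, c), v in zip(coords, result):
            horiz[r][c] = v

    # Vertical stripes: all rows, columns c0 .. c0+sw-1.
    for c0 in range(0, W, sw):
        coords = [(r, c) for r in range(H) for c in range(c0, min(c0 + sw, W))]
        result = self_attention([x[r][c][half:] for r, c in coords])
        for (r, c), v in zip(coords, result):
            vert[r][c] = v

    return [[horiz[r][c] + vert[r][c] for c in range(W)] for r in range(H)]


x = [[[r + 0.1 * c, c - 0.2 * r] for c in range(4)] for r in range(4)]
y = cross_shaped_window_attention(x, sw=2)
print(len(y), len(y[0]), len(y[0][0]))  # 4 4 2
```

A token at (0, 0) with `sw=2` is affected only by rows 0 to 1 (through the first channel group) and columns 0 to 1 (through the second). Changing a token outside that cross leaves its output unchanged. Restricting attention to stripes is what keeps the cost below full global attention while still spreading information across the whole image in a few layers.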
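The WGAN discriminator (critic) estimates the Wasserstein-1 distance through the dual form W(P_r, P_g) = sup over 1-Lipschitz f of E[f(x_real)] - E[f(x_fake)]. The critic minimizes E[D(fake)] - E[D(real)], and the generator minimizes -E[D(fake)]. The abstract does not say how the Lipschitz constraint is enforced. The sketch below assumes the weight clipping of the original WGAN (Arjovsky et al., 2017) and uses a toy 1-D linear critic. `critic_loss`, `generator_loss`, `clip` and `train_linear_critic` are hypothetical helper names for illustration, not the paper's code.

```python
import random
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """WGAN critic loss: minimize E[D(fake)] - E[D(real)].

    Its negative is the critic's estimate of the Wasserstein-1 distance
    (scaled by the critic's Lipschitz constant).
    """
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores):
    """WGAN generator loss: minimize -E[D(fake)]."""
    return -fmean(fake_scores)


def clip(value, c):
    """Weight clipping (assumed, as in the original WGAN): keep a weight in [-c, c]."""
    return max(-c, min(c, value))


def train_linear_critic(real, fake, c=0.01, lr=5e-3, steps=100):
    """Toy 1-D critic D(x) = w * x, trained by gradient descent on critic_loss.

    d/dw [mean(w*fake) - mean(w*real)] = mean(fake) - mean(real).
    """
    w = 0.0
    grad = fmean(fake) - fmean(real)  # constant for a linear critic
    for _ in range(steps):
        w = clip(w - lr * grad, c)
    return w


random.seed(0)
real = [random.gauss(2.0, 1.0) for _ in range(1000)]
fake = [x - 1.5 for x in real]  # "generated" samples: the real ones shifted by 1.5
c = 0.01
w = train_linear_critic(real, fake, c=c)
estimate = -critic_loss([w * x for x in real], [w * x for x in fake])
print(w, estimate / c)  # w saturates at c; estimate / c is about 1.5
```

For two distributions that differ only by a shift of 1.5, W1 is exactly 1.5. The clipped linear critic recovers that value up to the scale factor c. Real critics are deep networks, and clipping (or an alternative such as a gradient penalty) only approximately enforces the Lipschitz constraint.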
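PSNR and SSIM are the two metrics the method is compared on. The sketch below computes both on flattened grayscale pixel lists. It assumes 8-bit images (MAX = 255) and the default SSIM constants K1 = 0.01 and K2 = 0.03 from Wang et al. (2004). PSNR is 10 log10(MAX^2 / MSE). The SSIM here uses a single global window as a deliberate simplification. Standard SSIM averages the same formula over local Gaussian windows, and the abstract does not state the exact evaluation settings. `psnr` and `global_ssim` are hypothetical helper names.

```python
import math


def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def global_ssim(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """SSIM computed over ONE global window (simplification).

    Standard SSIM evaluates this formula in local Gaussian windows and
    averages the results, so values will differ from library implementations.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


ref = [10, 20, 30, 40]
restored = [15, 25, 35, 45]  # every pixel off by 5, so MSE = 25
print(round(psnr(ref, restored), 2))  # 34.15
print(global_ssim(ref, ref))          # 1.0 for identical images
```

Higher is better for both metrics. PSNR penalizes the raw pixel error, while SSIM separately compares luminance, contrast and structure. A uniformly brightened image can therefore keep a perfect structure term while its luminance term drops below 1.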
Cite this article: Huang, S.H. and Jin, Z. (2025) Study on Face Occlusion Repair Based on CSWin-Transformer and WGAN Techniques. Journal of Image and Signal Processing, 14(2), 299-309. https://doi.org/10.12677/jisp.2025.142027
