Research on Facial Style Transfer Method for Dynamic Attention Semantic Feature Enhancement
DOI: 10.12677/mos.2024.134417. Supported by research project funding.
Authors: Haiyan Pan, Liujie Sun, Wenju Wang: School of Publishing, University of Shanghai for Science and Technology, Shanghai
Keywords: Style Transfer, Dynamic Attention, Normalized Fourier Convolution
Abstract: StyleGAN2-based face style transfer algorithms suffer from weak or inaccurate stylization and implausible local textures. To address these problems, this paper proposes a face style transfer method with dynamic-attention semantic feature enhancement. A normalized Fourier convolution processes facial features in the frequency domain, so that the network better captures frequency information such as image texture, detail, and structure. A new dynamic attention structure control block lets the generator learn the structural features of the style image more efficiently while largely preserving the original facial features. In addition, a color consistency loss function keeps the colors of the generated image close to those of the style image. In both qualitative and quantitative experiments, the proposed network achieves the best scores among the state-of-the-art methods it is compared with. The method learns the features of the style image while preserving the semantic information and detail features of the content image, improving both the accuracy and the artistry of face style transfer.
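The frequency-domain processing mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `fourier_unit`, the weight matrix `weight`, the ReLU, and the placement of the normalization (per-channel standardization of the spectrum before channel mixing) are all assumptions made for illustration, loosely following the general idea of a spectral convolution unit.

```python
import numpy as np

def fourier_unit(x, weight, eps=1e-5):
    """Hypothetical sketch of a normalized spectral convolution.

    x:      real feature map of shape (C, H, W)
    weight: (2C, 2C) matrix acting as a 1x1 convolution on the
            stacked real/imaginary parts of the spectrum (assumed design)
    """
    c, h, w = x.shape
    # 2-D real FFT over the spatial axes: (C, H, W//2 + 1), complex
    spec = np.fft.rfft2(x, axes=(-2, -1))
    # Stack real and imaginary parts as separate channels: (2C, H, W//2 + 1)
    feat = np.concatenate([spec.real, spec.imag], axis=0)
    # Assumed normalization: standardize each channel over frequency positions
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True)
    feat = (feat - mean) / (std + eps)
    # Channel mixing in the frequency domain, followed by ReLU
    mixed = np.maximum(np.einsum("oc,chw->ohw", weight, feat), 0.0)
    # Rebuild a complex spectrum and return to the spatial domain
    real, imag = mixed[:c], mixed[c:]
    return np.fft.irfft2(real + 1j * imag, s=(h, w), axes=(-2, -1))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))       # toy feature map
weight = rng.standard_normal((8, 8))     # random stand-in for learned weights
y = fourier_unit(x, weight)
print(y.shape)
```

Because every frequency coefficient depends on all spatial positions, a single pointwise operation in this domain has a global receptive field, which is one plausible reason such a layer helps with texture and structure.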
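Cite this article: Pan, H., Sun, L. and Wang, W. (2024) Research on Facial Style Transfer Method for Dynamic Attention Semantic Feature Enhancement. Modeling and Simulation, 13(4): 4602-4613. https://doi.org/10.12677/mos.2024.134417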
