High Resolution Controllable Portrait Video Style Transfer Network Based on Improved StyleGAN

doi:10.12677/mos.2024.134415

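Author: Qian Yangyang (钱洋洋), School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai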
Keywords: StyleGAN; Portrait Style Transfer; Video Style Transfer; Temporal Consistency Modeling

Abstract: Portrait style transfer is an important area of computer vision and graphics. However, many current portrait style transfer algorithms largely fail to capture the important geometric dependencies of different portrait styles, a shortcoming that matters because portrait style transfer requires close attention to feature refinement and style fusion. Data scarcity is a further challenge for stylization, and image-oriented methods exhibit defects such as flicker artifacts when applied to video. To address these problems, this paper proposes HcpGAN (style transfer network for High-resolution Controllable Portrait video based on StyleGAN), a high-resolution, controllable portrait video style transfer network based on an improved StyleGAN. HcpGAN consists of a generator and a discriminator. The generator performs portrait style transfer with a dual-branch (internal and external) style path structure, and its hierarchical design allows the degree of stylization to be controlled and fine-tuned. The first-layer feature module of the generator is fine-tuned with dilated convolution, which removes the requirement that input portraits be cropped to a fixed size. In addition, a feature warping layer for video frames is integrated at the tail of the generator; it models the temporal consistency of video frames directly, without an additional network or optical flow prediction, so that the output stylized video is temporally smooth. Comparative and ablation experiments on public datasets show that HcpGAN performs at an advanced level among current portrait style transfer algorithms.
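The abstract credits dilated convolution in the generator's first feature module with removing the fixed-crop restriction. The paper's exact layer configuration is not given here, so the following is only a minimal single-channel sketch of the general mechanism, in pure Python with hypothetical function names (`dilated_conv2d`, `receptive_field`), no bias and no learned weights. It shows two properties: a stride-1 dilated convolution with "same" zero padding returns an output with the same height and width as any input, and a dilation d widens a k×k kernel's extent to d(k−1)+1 pixels without adding parameters.

```python
from typing import List

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int = 1) -> Grid:
    """Hypothetical single-channel, stride-1 dilated 2D convolution.

    Computed as cross-correlation (as deep-learning frameworks do) with zero
    'same' padding, so the output keeps the input's H x W for any H and W.
    """
    k = len(kernel)
    assert k % 2 == 1 and all(len(row) == k for row in kernel), "odd square kernel expected"
    h, w = len(x), len(x[0])
    pad = dilation * (k - 1) // 2  # 'same' padding: the centre tap lands on (i, j)
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    r = i - pad + a * dilation
                    c = j - pad + b * dilation
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the image
                        acc += kernel[a][b] * x[r][c]
            out[i][j] = acc
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


if __name__ == "__main__":
    kernel = [[1.0 / 9] * 3 for _ in range(3)]  # toy 3x3 averaging kernel
    for h, w in [(5, 7), (9, 12)]:  # two differently sized "crops"
        out = dilated_conv2d([[1.0] * w for _ in range(h)], kernel, dilation=2)
        print((len(out), len(out[0])))  # prints (5, 7), then (9, 12)
    print(receptive_field(3, 2))  # prints 5
```

Nothing in the operation depends on H or W, so the same weights apply to inputs of any size. In the original StyleGAN the synthesis network starts from a learned 4×4 constant tensor, which fixes the output resolution; convolutions themselves impose no such limit, which is consistent with the abstract's claim that adapting the first feature module lifts the crop constraint.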

Citation: Qian, Y.Y. (2024) High Resolution Controllable Portrait Video Style Transfer Network Based on Improved StyleGAN. Modeling and Simulation, 13(4), 4577-4590. https://doi.org/10.12677/mos.2024.134415

