Research on Lightweight Video Generation Method for Photo Subject Dynamicization Based on Deep Learning

DOI: 10.12677/csa.2026.164109
Author: Zhang Yang, College of Computer and Control Engineering, Northeast Forestry University, Harbin, Heilongjiang
Keywords: Dynamic Generation Framework; Photo Subject Dynamicization; Video Generation; GAN; Optical Flow Estimation; Mask R-CNN

Abstract: As digital content formats diversify, generating a video in which the subject of a single static photo moves has become a practical need in fields such as social media and advertising. To address the insufficient motion controllability and realism in this task, this paper proposes a dynamic generation framework based on explicit optical flow planning and builds an end-to-end pipeline of "segmentation – motion blueprint construction – motion execution – temporal optimization." The framework treats optical flow as a motion blueprint and a Generative Adversarial Network (GAN) as the executor. It integrates Mask R-CNN instance segmentation, RAFT optical flow estimation, a GAN, and RIFE frame interpolation: optical flow guidance improves motion controllability, and an optical flow cycle consistency loss improves visual realism. Experiments show that the proposed method generates visually coherent dynamic videos, that metrics such as PSNR and SSIM improve continuously, and that it performs well on dynamic metrics including optical flow error and motion smoothness. The study provides an effective technical route for photo animation and serves as a reference for research on technologies related to digital content creation.
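To make the "optical flow as motion blueprint" idea and the cycle consistency term more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a dense flow field stored as per-pixel `(dx, dy)` pairs, a common forward-backward form of the consistency residual (the abstract does not give the exact loss), and nearest-neighbour sampling in place of the bilinear sampling a trainable model would need. The function names `sample_clamped`, `warp`, `cycle_error`, and `psnr` are hypothetical, and plain Python lists stand in for image tensors to keep the example dependency-free.

```python
import math

def sample_clamped(grid, x, y):
    """Read grid[y][x] at the rounded coordinate, clamped to the border."""
    h, w = len(grid), len(grid[0])
    xi = min(max(int(round(x)), 0), w - 1)
    yi = min(max(int(round(y)), 0), h - 1)
    return grid[yi][xi]

def warp(image, flow):
    """Backward-warp: output pixel (x, y) is read from (x + dx, y + dy)."""
    return [
        [sample_clamped(image, x + dx, y + dy) for x, (dx, dy) in enumerate(row)]
        for y, row in enumerate(flow)
    ]

def cycle_error(flow_fwd, flow_bwd):
    """Mean norm of F_fwd(p) + F_bwd(p + F_fwd(p)); zero for consistent flows."""
    total, count = 0.0, 0
    for y, row in enumerate(flow_fwd):
        for x, (dx, dy) in enumerate(row):
            # Look up the backward flow at the position the forward flow points to.
            bx, by = sample_clamped(flow_bwd, x + dx, y + dy)
            total += math.hypot(dx + bx, dy + by)
            count += 1
    return total / count

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized grayscale images."""
    diffs = [(r - e) ** 2
             for ref_row, est_row in zip(reference, estimate)
             for r, e in zip(ref_row, est_row)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# Synthetic 2x4 example: a uniform one-pixel flow and its inverse.
image = [[0, 1, 2, 3], [4, 5, 6, 7]]
shift_right = [[(1.0, 0.0)] * 4 for _ in range(2)]
shift_left = [[(-1.0, 0.0)] * 4 for _ in range(2)]

warped = warp(image, shift_right)                     # [[1, 2, 3, 3], [5, 6, 7, 7]]
consistent = cycle_error(shift_right, shift_left)     # inverse flows cancel
inconsistent = cycle_error(shift_right, shift_right)  # residual of 2 pixels per pixel
```

In a real pipeline the flow fields would come from an estimator such as RAFT and the residual would be computed on differentiable tensors so that it can act as a training loss; the synthetic flows here only illustrate the bookkeeping.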

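Citation: Zhang Yang. Research on Lightweight Video Generation Method for Photo Subject Dynamicization Based on Deep Learning [J]. Computer Science and Application, 2026, 16(4): 56-63. https://doi.org/10.12677/csa.2026.164109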

