Control Mechanisms and Research Review of Text-to-Image Generation Technology
Abstract: This article reviews the control mechanisms of text-to-image generation and the research progress in the field. Text-to-image generation sits at the intersection of computer vision and natural language processing; it has advanced alongside GAN, VAE, Transformer, and related techniques, and has broad application prospects across many industries. The article describes image generation control mechanisms in detail, covering text-only control (GANs, VAEs, and diffusion models) and multimodal control (sketch, speech, and layout, each fused with text), and introduces image quality evaluation metrics such as IS, FID, and CLIP Score. It also identifies open challenges in current technology, including lapses in semantic consistency and an imbalance between the coordination of multimodal controls and ease of use, and closes with an outlook on future directions of development.
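Cite this article: Zhang, J., Luo, W. and Cao, P. Control Mechanisms and Research Review of Text-to-Image Generation Technology [J]. Computer Science and Application, 2026, 16(1): 20-27. https://doi.org/10.12677/csa.2026.161003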
