Research on Controllable Generation Design of Industrial Products Based on Fine-Tuning of Large Models
DOI: 10.12677/airr.2026.153069
Author: Long Teng, College of Engineering, Zhejiang Normal University, Jinhua, Zhejiang
Keywords: Industrial Design, AIGC, Flux Model, LoRA Fine-Tuning, Controllable Generation
Abstract: Traditional industrial design iterates slowly, and existing AIGC image models (such as those built on the U-Net architecture) tend to produce perspective distortion when they must satisfy strict mechanical geometric constraints. To address both problems, this paper takes a tool cart as the validation case and proposes a high-precision, strongly constrained method for the controllable generation of industrial product designs. The study adopts the Flux large model, which has global spatial perception, as the base model. A domain dataset is built using single-image multi-view reconstruction, and the base model is fine-tuned on it with LoRA so that the brand's family design characteristics are internalized in the model. To overcome the limited control offered by text prompts alone, the paper further constructs a bidirectional multimodal control matrix that combines explicit geometric structure locking with implicit visual style transfer. Experiments show that this generation paradigm preserves the product's three-dimensional geometric structure with high fidelity while generalizing across a wide range of materials and aesthetic styles, offering equipment manufacturers a practical systems-engineering approach to digital R&D and agile innovation.
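As a rough illustration of the LoRA fine-tuning step mentioned in the abstract, the sketch below shows the low-rank update from the original LoRA formulation on a single linear layer, using NumPy only. It is not the paper's training code: the layer sizes, rank `r`, scaling factor `alpha`, and the helper name `lora_forward` are all assumptions chosen for illustration, and the abstract does not state which Flux layers or hyperparameters were actually used.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Linear layer with a LoRA adapter: y = x W^T + (alpha / r) * x A^T B^T.

    W is the frozen pretrained weight (d_out x d_in); only A (r x d_in)
    and B (d_out x r) would be trained.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 32, 4, 8.0

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init: adapter starts as a no-op
x = rng.normal(size=(2, d_in))           # a batch of two input vectors

# Before any training, the adapted layer reproduces the base layer exactly.
y_initial = lora_forward(x, W, A, B, alpha)

# Stand-in for a trained adapter: give B arbitrary non-zero values.
B_trained = rng.normal(size=(d_out, r)) * 0.01
y_adapted = lora_forward(x, W, A, B_trained, alpha)

# After training, the adapter can be folded into one weight matrix for inference.
W_merged = W + (alpha / r) * B_trained @ A
y_merged = x @ W_merged.T

full_params = W.size            # parameters updated by full fine-tuning of this layer
lora_params = A.size + B.size   # parameters updated by LoRA for this layer
```

With these assumed sizes the adapter trains 384 parameters in place of the layer's 2048, which is the property that makes fine-tuning a large base model on a small domain dataset practical.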
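Cite this article: Teng, L. Research on Controllable Generation Design of Industrial Products Based on Fine-Tuning of Large Models [J]. Artificial Intelligence and Robotics Research, 2026, 15(3): 731-742. https://doi.org/10.12677/airr.2026.153069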