A Multimodal Agent System for Quality Assessment of 3D Gaussian Splatting
Abstract: 3D content generation (AIGC-3D) is shifting from implicit representations such as Neural Radiance Fields (NeRF) to explicit modeling with 3D Gaussian Splatting (3DGS). Although 3DGS delivers major gains in rendering efficiency and visual fidelity, it introduces its own structural artifacts (Gaussian popping, needle-like distortions, view-dependent flickering, and floaters) that challenge existing Image Quality Assessment (IQA) frameworks. Traditional metrics such as PSNR and SSIM cannot perceive 3D geometric distortion, while subjective evaluation (e.g., MOS) is costly, non-differentiable, and hard to integrate into automated pipelines. The result is a significant "semantic gap" in quality assessment. To bridge this gap, we propose Agentic-IQA, a multimodal agent-based quality assessment system that integrates perception, memory, and cognition. The system is the first to bring the MEt3R geometric consistency metric into a 3DGS evaluation framework. It combines that metric with a Retrieval-Augmented Generation (RAG) module built on CLIP/SigLIP embeddings and a LangGraph-orchestrated Tree-of-Thoughts (ToT) reasoning engine, forming a hybrid assessment paradigm of "metric extraction, retrieval augmentation, structured reasoning." We run systematic experiments on a self-built multi-source dataset that merges 3DGS-IEval-15K, MUGSQA, and NeRF-QA. Agentic-IQA reaches a PLCC of 0.892 and an SRCC of 0.876 on the full test set, outperforming the current state-of-the-art method Q-Align by 6.1 percentage points in PLCC; on the geometry-degradation subset the PLCC gain widens to 11.2 percentage points. Ablation studies and case analyses further confirm the contribution of each module and the system's advantages in interpretability and robustness. This work provides the first geometry-aware quality diagnostic tool for 3DGS and moves AIGC-3D evaluation from "looking realistic" toward "understanding structure."
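The two headline numbers above are correlation coefficients between predicted quality scores and human mean opinion scores (MOS): PLCC is the Pearson linear correlation and SRCC is the Spearman rank-order correlation. The following is a minimal sketch of how such figures are typically computed, not the authors' evaluation code. The function names and the score arrays are hypothetical, and the sketch omits the nonlinear logistic mapping that IQA studies often apply to predictions before computing PLCC.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(values) and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        # Positions start..end (0-based) share one averaged 1-based rank.
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def plcc(pred, mos):
    """Pearson linear correlation between predictions and MOS."""
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    return float(np.corrcoef(pred, mos)[0, 1])


def srcc(pred, mos):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return plcc(average_ranks(pred), average_ranks(mos))


# Hypothetical scores for five rendered views (illustrative values only).
predicted = [1.0, 2.0, 3.0, 4.0, 10.0]
human_mos = [1.2, 1.9, 3.2, 3.8, 5.0]

print("PLCC:", plcc(predicted, human_mos))
print("SRCC:", srcc(predicted, human_mos))
```

In this toy data the predictions preserve the human ordering exactly, so SRCC is at its maximum while PLCC is lower because the relationship is not linear. That difference is why both coefficients are usually reported together.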
Cite this article: Ye, Y. and Qi, Y. (2026) A Multimodal Agent System for Quality Assessment of 3D Gaussian Splatting. Computer Science and Application, 16(1), 353-363. https://doi.org/10.12677/csa.2026.161029
