A Sandbox-Based Security Vulnerability Detection System for Large Model Applications
Abstract: As Large Language Models (LLMs) are increasingly applied in scenarios such as intelligent question answering, automated operations and maintenance, and code generation, their security problems have become more prominent, including risks such as tool poisoning, prompt injection, unauthorized operations, and command execution. Traditional web vulnerability scanners cannot yet reliably detect these new classes of vulnerabilities. To address this, the proposed system combines sandbox isolation, an attack-chain reconstruction algorithm, Python-based automated analysis, and Control Flow Graph (CFG) analysis to dynamically inspect large model tool code and its runtime behavior in a securely isolated environment. It detects not only traditional vulnerabilities such as SQL injection, XSS, and CSRF, but also tool poisoning attacks and complex prompt injection vulnerabilities specific to the MCP protocol, and it outputs a visual report containing the complete data-flow taint path. Experimental results show that the proposed solution improves the effectiveness of security detection for large model applications and provides an automated, low-cost reference for security monitoring and vulnerability remediation in such applications.
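To make the data-flow taint idea concrete, the following is a minimal, self-contained sketch of the kind of analysis the abstract alludes to: it parses a tool function with Python's standard-library ast module, treats the tool's parameters as taint sources, and flags calls to dangerous sinks whose arguments reference those parameters. All names here (DANGEROUS_SINKS, find_tainted_sinks, the query_weather example tool) are hypothetical illustrations rather than part of the system described in the paper, and the direct parameter-to-sink match is a deliberate simplification of real CFG-based taint propagation.

```python
# Minimal illustrative sketch (assumed names, not the paper's implementation):
# parse a tool's source with the standard-library ast module, treat the tool
# function's parameters as taint sources, and flag dangerous sinks whose
# arguments reference those parameters.
import ast

# Hypothetical sink list; a real scanner would model many more APIs.
DANGEROUS_SINKS = {"system", "popen", "eval", "exec", "run", "call"}


def find_tainted_sinks(source: str):
    """Return (function, sink, line) triples where a parameter flows into a sink."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        params = {arg.arg for arg in func.args.args}  # taint sources: tool inputs
        for node in ast.walk(func):
            if not isinstance(node, ast.Call):
                continue
            callee = node.func
            name = callee.attr if isinstance(callee, ast.Attribute) else getattr(callee, "id", "")
            if name not in DANGEROUS_SINKS:
                continue
            # Tainted if any argument expression mentions a parameter name.
            used = {n.id for a in node.args for n in ast.walk(a) if isinstance(n, ast.Name)}
            if used & params:
                findings.append((func.name, name, node.lineno))
    return findings


if __name__ == "__main__":
    # Hypothetical MCP tool with a command-injection-style flaw.
    tool_code = (
        "def query_weather(city):\n"
        "    import os\n"
        "    os.system('curl http://api.example.com/weather?q=' + city)\n"
    )
    for func, sink, line in find_tainted_sinks(tool_code):
        print(f"[!] {func}: user-controlled data reaches {sink}() at line {line}")
```

A production analysis of the kind the abstract describes would replace this direct parameter-to-sink match with CFG-based path tracking performed inside an isolated sandbox, and would record the full taint path for inclusion in the visual report.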
Citation: Chen Cong, Xie Shengyu, Chen Hao, Chen Xinyu, Yu Yue, Xu Yafeng. A Sandbox-Based Security Vulnerability Detection System for Large Model Applications [J]. Computer Science and Application, 2026, 16(3): 136-147. https://doi.org/10.12677/csa.2026.163093
