Research on the Ability of Domestic Generative Artificial Intelligence to Solve Physics Problems: Taking “Zhipu AI”, “SparkDesk”, “Tiangong”, “360 Wisdom Brain”, and “ERNIE Bot” as Examples
DOI: 10.12677/airr.2026.151030. Supported by research project funding.
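Authors: Pang Fuhao (庞付豪), Chen Meina (陈美娜)*, Sun Yuxin (孙雨心): School of Physics and Electronics, Shandong Normal University, Jinan, Shandong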
Keywords: Artificial Intelligence; Physics Education; ChatGPT; Generative Language Model
Abstract: ChatGPT is a powerful pre-trained language model that has attracted widespread attention since its release in 2022. To keep pace with this trend, China has successively released its own generative artificial intelligence models. To test how well domestic models solve practical physics problems, this paper selects five of them, “Zhipu AI”, “SparkDesk”, “Tiangong”, “360 Wisdom Brain”, and “ERNIE Bot”, and uses classical mechanics problems to test their conceptual understanding, reasoning and calculation, and experimental design abilities. The five models perform best on conceptual understanding, followed by reasoning and calculation, and worst on experimental design. Their actual solutions show problems such as “calculation errors”, “inconsistent answers across responses”, and “weak situational analysis”. In a side-by-side comparison, “Tiangong” leads in conceptual understanding, while “ERNIE Bot” performs best in reasoning and calculation. Overall, domestic models still seem to have a long way to go before they can replace humans in problem solving.
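Cite this article: Pang Fuhao, Chen Meina, Sun Yuxin. Research on the Ability of Domestic Generative Artificial Intelligence to Solve Physics Problems: Taking “Zhipu AI”, “SparkDesk”, “Tiangong”, “360 Wisdom Brain”, and “ERNIE Bot” as Examples [J]. Artificial Intelligence and Robotics Research, 2026, 15(1): 305-317. https://doi.org/10.12677/airr.2026.151030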
