A Review of Empirical Studies in International Journals on Integrating Generative AI into STEM Instruction
DOI: 10.12677/ae.2026.161082. Supported by the National Natural Science Foundation of China.
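Authors: Yu Kuai (俞快): School of Media and Communication, Shanghai Jiao Tong University, Shanghai; Chen Bin* (陈斌): School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai; Li Tingxuan* (李亭萱): School of Education, Shanghai Jiao Tong University, Shanghai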
Keywords: STEM Education; Instructional Practice; Literature Review; Generative AI
Abstract: With the rapid development of generative artificial intelligence, the use of large language models (LLMs) in classroom teaching has attracted growing attention. This review analyzes 29 empirical studies published in international journals over the past three years and systematically surveys the current state of research on LLMs in STEM classroom instruction. The reviewed studies concentrate on particular educational stages: higher education dominates (62% of the studies), while K-12 settings are comparatively underrepresented (38%). Notably, no empirical studies in preschool or special education were retrieved. In terms of discipline, the K-12 studies focus mainly on mathematics and physics and rarely address chemistry, biology, or engineering. Three LLM enhancement strategies are commonly used in STEM classroom instruction: prompt engineering, model fine-tuning, and retrieval-augmented generation (RAG). Of these, prompt engineering is the most frequently used. The applications of LLMs in STEM classroom instruction span three dimensions of instructional practice: teaching, learning, and assessment. Among them, assessment has received the most research attention. Overall, the existing studies indicate that LLMs can improve classroom interactivity and instructional effectiveness, promote deeper learning, and strengthen students' learning motivation. Future research should focus further on optimizing LLM use within specific disciplines, so as to advance STEM classroom instruction toward more intelligent, more personalized, and higher-quality practice.
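Cite this article: Yu Kuai, Chen Bin, Li Tingxuan. A Review of Empirical Studies in International Journals on Integrating Generative AI into STEM Instruction [J]. Advances in Education (教育进展), 2026, 16(1): 585-592. https://doi.org/10.12677/ae.2026.161082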
