Research on Student Achievement Level Prediction Model
Abstract: Accurate prediction of students' academic performance grades is a key foundation for optimizing student management and improving the effectiveness of teaching guidance. Using the Kalboard 360 dataset, this study focuses on the process data that students generate on an online learning platform. These data cover basic attributes such as educational background as well as behavioral features such as in-class hand-raising and the frequency of learning-resource access, and they can effectively reflect students' level of knowledge mastery. Modeling is carried out with the PySpark toolchain. First, the pyspark.sql library is used to load the raw data into a DataFrame and to perform preprocessing such as data encoding. Then, using the classification algorithms in the pyspark.ml library, two student grade prediction models are built, one based on logistic regression and one based on random forest. Evaluation with confusion matrices, ROC curves, and related performance measures shows that the random forest model achieves markedly higher prediction accuracy than the logistic regression model. The core value of this study is that the model output can support personalized learning recommendations for students and help teachers promptly identify students' learning difficulties and problems, so that they can adjust and target their teaching accordingly. Applying the model can also help bring intelligent, personalized teaching into practice in education and, ultimately, support students' all-round development.
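The abstract evaluates the models with confusion matrices and ROC curves. Because the grade label has three levels (L/M/H) and pyspark.ml's `BinaryClassificationEvaluator` targets two-class problems, one common approach is a one-vs-rest ROC analysis per grade. The plain-Python sketch below assumes that approach; the helper names, labels, and class probabilities are hypothetical. The AUC is computed in its pairwise (Mann-Whitney) form: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. This quantity equals the area under the ROC curve.

```python
from collections import Counter

CLASSES = ["L", "M", "H"]  # assumed grade levels: low, middle, high


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true grades, columns are predicted grades, in the order of `classes`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def binary_auc(is_positive, scores):
    """Pairwise ROC AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, prob_rows, classes=CLASSES):
    """One AUC per grade, treating that grade as positive and the other grades as negative."""
    return {
        c: binary_auc([t == c for t in y_true], [row[k] for row in prob_rows])
        for k, c in enumerate(classes)
    }


# Hypothetical toy output of a classifier: true grades and per-class probabilities [L, M, H].
y_true = ["L", "M", "H", "H", "M", "L"]
probs = [
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.10, 0.20, 0.70],
    [0.20, 0.65, 0.15],
    [0.25, 0.55, 0.20],
    [0.70, 0.20, 0.10],
]
# Predicted grade = class with the highest probability.
y_pred = [CLASSES[max(range(len(CLASSES)), key=row.__getitem__)] for row in probs]

print(confusion_matrix(y_true, y_pred))  # [[2, 0, 0], [0, 2, 0], [0, 1, 1]]
print(one_vs_rest_auc(y_true, probs))    # {'L': 1.0, 'M': 0.75, 'H': 0.75}
```

The fourth sample (true grade H) is predicted as M, so it appears off the diagonal of the confusion matrix and lowers the one-vs-rest AUC for both H and M. Averaging the per-class AUCs gives a single macro-AUC if one summary number is needed.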
Citation: Liu Yang, Yang Bohao, Wu Wei. Research on Student Achievement Level Prediction Model[J]. Artificial Intelligence and Robotics Research, 2026, 15(2): 445-454. https://doi.org/10.12677/airr.2026.152043
