A Comparative Study of Machine Learning-Based Student Performance Prediction Models
Abstract: Predicting student academic performance is a core research direction in educational data mining and matters for targeted educational intervention and personalized learning path planning. Using a multidimensional student dataset from the Kaggle platform, this study builds predictive models with five typical machine learning regression algorithms and systematically evaluates how each performs on an educational prediction task. The dataset contains 200 student samples and covers key features including study time, sleep duration, attendance rate, and historical grades. The experimental procedure comprises standardized preprocessing, hyperparameter optimization, and cross-validation. The linear regression model performs best on the test set, with a root mean square error (RMSE) of 2.7860 and a coefficient of determination (R²) of 0.8537. Feature correlation analysis further shows a strong positive correlation between study time and exam scores (r = 0.7768); the correlations for historical grades, attendance rate, and sleep duration are progressively weaker, in that order. The study provides a methodological reference for educational data mining and empirical support for precision intervention strategies in educational practice.
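The three statistics reported in the abstract (RMSE, R², and the Pearson correlation r) can be computed as in the minimal sketch below. The function names and the small data arrays are illustrative assumptions, not the authors' code or data; the study's actual pipeline and dataset are not reproduced here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values for illustration only (not taken from the study).
study_hours = [2.0, 4.0, 6.0, 8.0]
exam_scores = [60.0, 68.0, 79.0, 85.0]
predicted_scores = [61.0, 69.0, 77.0, 86.0]

print(f"RMSE: {rmse(exam_scores, predicted_scores):.4f}")
print(f"R^2:  {r_squared(exam_scores, predicted_scores):.4f}")
print(f"r:    {pearson_r(study_hours, exam_scores):.4f}")
```

A lower RMSE and an R² closer to 1 indicate a better fit, which is the sense in which the abstract ranks linear regression first among the five models.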
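Cite this article: 占兆满, 敖子杰, 李显, 李鲁英. A Comparative Study of Machine Learning-Based Student Performance Prediction Models [J]. Advances in Education (教育进展), 2026, 16(1): 1159-1169. https://doi.org/10.12677/ae.2026.161156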
