Analysis and Research on Students’ Academic Performance Based on Linear Regression
DOI: 10.12677/isl.2026.101013. Supported by research project funding.
Authors: 李梓豪, 李子乐, 程文斌, 李鲁英*: School of Intelligent Engineering, 武汉设计工程学院, Wuhan, Hubei
Keywords: Linear Regression, Student Academic Performance, Predictive Model, Feature Engineering, Ridge Regression
Abstract: This study uses linear regression models to analyze and predict students’ academic performance, aiming to address shortcomings of traditional educational assessment methods. Seventy students from BI Norwegian Business School were sampled, with data collected on age, entrance examination scores, study time, and other variables. The sample was randomly divided into a model-construction group (Group A, n = 35) and a validation group (Group B, n = 35). Data preprocessing comprised cleaning, mean imputation of missing values, and retention of outliers; feature engineering comprised encoding, standardization, and SelectKBest feature selection. Ridge regression was used to build grade prediction models for the Python and Database (DB) courses. The Python model achieved R² = 0.823 and MSE = 42.36; the DB model achieved R² = 0.796 and MSE = 48.72. Entrance examination scores and study time were the main positive predictors, while age showed a weak negative correlation. Model validation with paired t-tests (p > 0.05) indicated good generalization. The results suggest that academic performance is shaped by multiple factors acting together, and that the model can provide data support for personalized teaching and early intervention. Limitations include the small sample size and limited feature dimensions; future work could enlarge the sample and incorporate dynamic data to improve accuracy.
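The workflow summarized in the abstract (mean imputation, standardization, SelectKBest selection, Ridge regression, then R², MSE, and a paired t-test on a 35/35 split) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins rather than the study's dataset, the column names, `alpha=1.0`, `k=3`, and the `f_regression` scoring function are all hypothetical choices, and the paired t-test is assumed to compare actual and predicted grades in Group B, which the abstract does not spell out. Categorical encoding is omitted because the synthetic features are all numeric.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study's dataset); all columns are hypothetical.
rng = np.random.default_rng(0)
n = 70
age = rng.integers(18, 30, size=n).astype(float)
entry_exam = rng.uniform(40, 100, size=n)
study_hours = rng.uniform(100, 200, size=n)
irrelevant = rng.normal(size=n)  # a column with no relation to the grade
X = np.column_stack([age, entry_exam, study_hours, irrelevant])

# Assumed relationship, chosen only so the example has some signal to fit.
y = 0.5 * entry_exam + 0.2 * study_hours - 0.3 * age + rng.normal(0, 5, size=n)

# Blank out a few study-time values so that mean imputation has work to do.
X[rng.choice(n, 5, replace=False), 2] = np.nan

# Random 35/35 split into a construction group (A) and a validation group (B).
idx = rng.permutation(n)
a_idx, b_idx = idx[:35], idx[35:]

# Impute -> standardize -> keep the k best features -> Ridge regression.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    SelectKBest(score_func=f_regression, k=3),  # k is an assumed value
    Ridge(alpha=1.0),                           # alpha is an assumed value
)
model.fit(X[a_idx], y[a_idx])
pred_b = model.predict(X[b_idx])

# Evaluate on Group B only.
r2 = r2_score(y[b_idx], pred_b)
mse = mean_squared_error(y[b_idx], pred_b)

# Paired t-test between actual and predicted grades (assumed pairing).
t_stat, p_value = stats.ttest_rel(y[b_idx], pred_b)

print(f"R2 = {r2:.3f}, MSE = {mse:.2f}, paired t-test p = {p_value:.3f}")
```

Fitting the imputer, scaler, and selector inside one pipeline keeps their statistics confined to Group A, so nothing from Group B leaks into training. Note that a paired t-test with p > 0.05 only shows no detectable mean difference between the paired values; it is weaker evidence than the held-out R² and MSE themselves.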
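Cite this article: 李梓豪, 李子乐, 程文斌, 李鲁英. Analysis and Research on Students’ Academic Performance Based on Linear Regression [J]. 交叉科学快报 (Interdisciplinary Science Letters), 2026, 10(1): 94-102. https://doi.org/10.12677/isl.2026.101013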
