Comparing Machine Learning Methods and Linear Mixed Models with Random Effects for Longitudinal Data Prediction
DOI: 10.12677/HJDM.2015.53006. Supported by the National Natural Science Foundation of China.
Authors: Hongmei Li, School of Mathematics, Yunnan Normal University, Kunming, Yunnan; Xizhi Wu, School of Statistics, Renmin University of China, Beijing
Keywords: Linear Mixed Models with Random Effects, Machine Learning Methods, Longitudinal Data, Cross-Validation, Standardized Mean Square Error
Abstract: Using longitudinal data on the protein content of milk, this study compares the linear mixed model with random effects, the traditional approach to longitudinal data, against six machine learning methods (decision trees, boosting, bagging, random forests, neural networks, and support vector machines), all implemented in R. Varying the size of the training set and applying 8-fold cross-validation, the resulting standardized mean square errors show that, for these data, the traditional method is far inferior to the machine learning methods for both long-term prediction (larger training sets and smaller testing sets) and short-term prediction, and that the machine learning methods are robust.
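
The comparison described in the abstract can be illustrated with a short R script. The following is only a minimal sketch, not the authors' original code: it assumes the milk-protein data are available as the Milk dataset shipped with the nlme package (columns protein, Time, Diet and Cow) and that the rpart and randomForest packages are installed, and it contrasts the linear mixed model with just two of the machine learning methods (a regression tree and a random forest) under 8-fold cross-validation, scoring each by the standardized mean square error.

## Sketch only: compare a linear mixed model with random effects against two
## machine-learning methods on the milk-protein longitudinal data.
## Assumes nlme, rpart and randomForest are installed and that the data are
## available as nlme's Milk dataset (columns protein, Time, Diet, Cow).
library(nlme)          # linear mixed models and the Milk dataset
library(rpart)         # regression trees
library(randomForest)  # random forests

data(Milk, package = "nlme")
milk <- as.data.frame(Milk)

## Standardized mean square error: prediction MSE divided by the variance
## of the observed responses in the test fold.
smse <- function(obs, pred) mean((obs - pred)^2) / var(obs)

set.seed(1)
k <- 8                                        # 8-fold cross-validation
fold <- sample(rep(1:k, length.out = nrow(milk)))
res <- matrix(NA, k, 3, dimnames = list(NULL, c("lme", "tree", "rf")))

for (i in 1:k) {
  train <- milk[fold != i, ]
  test  <- milk[fold == i, ]

  ## Linear mixed model: fixed effects for time and diet, random intercept per cow.
  fit.lme <- lme(protein ~ Time + Diet, random = ~ 1 | Cow, data = train)
  ## level = 0 gives population-level predictions (fixed effects only).
  p.lme <- predict(fit.lme, newdata = test, level = 0)

  ## Machine-learning methods fitted on the same predictors.
  fit.tree <- rpart(protein ~ Time + Diet, data = train)
  fit.rf   <- randomForest(protein ~ Time + Diet, data = train)

  res[i, ] <- c(smse(test$protein, p.lme),
                smse(test$protein, predict(fit.tree, test)),
                smse(test$protein, predict(fit.rf, test)))
}
colMeans(res)   # average standardized MSE of each method over the 8 folds

Smaller averaged standardized MSE values indicate better predictive accuracy; the remaining methods compared in the paper (boosting, bagging, neural networks, support vector machines) can be slotted into the same loop in the same way.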
Article citation: Li, H.M. and Wu, X.Z. (2015) Comparing Machine Learning Methods and Linear Mixed Models with Random Effects for Longitudinal Data Prediction. Hans Journal of Data Mining, 5, 39-45. http://dx.doi.org/10.12677/HJDM.2015.53006
