An Empirical Study on Maximum Likelihood Estimation of Missing Data by EM Algorithm
DOI: 10.12677/SA.2018.72025
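Authors: Lei Li (黎镭)*, Aixiang Chen (陈蔼祥), Zanjie Yao (姚赞杰): School of Statistics and Mathematics, Guangdong University of Finance and Economics, Guangzhou, Guangdong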
Keywords: EM Algorithm, Maximum Likelihood Estimation, Missing Data, Random Deletion
Abstract: Maximum likelihood estimation via the EM algorithm is a basic method for handling missing data. After introducing EM-based maximum likelihood estimation, this paper uses the EM method to impute a class of data sets with randomly missing values. The results show that at missing rates of 10%, 20%, and 30%, the relative error of the EM imputation is below 0.1 in every case, indicating high accuracy when the missing rate is low. The EM method is then applied to impute the missing data in an actual questionnaire survey, and the effect of the imputation on the survey results is analyzed by comparing them before and after imputation.
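The experiment summarized in the abstract (delete values at random, impute them with EM, measure the relative error) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the rows are i.i.d. multivariate normal, uses synthetic data, and defines relative error as the mean of |imputed − true| / |true| over the deleted entries. The names `em_impute` and `random_delete`, the covariance matrix, and the sample size are all hypothetical choices made for illustration.

```python
import numpy as np

def random_delete(X, rate, rng):
    """Set each entry to NaN independently with probability `rate`."""
    mask = rng.random(X.shape) < rate
    X_miss = X.copy()
    X_miss[mask] = np.nan
    return X_miss, mask

def em_impute(X, n_iter=100, tol=1e-6):
    """EM imputation under an assumed multivariate normal model."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)
    # Start from column means of the observed values.
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)
    sigma = np.cov(filled, rowvar=False) + 1e-6 * np.eye(p)
    for _ in range(n_iter):
        prev = filled.copy()
        cond_cov_sum = np.zeros((p, p))
        # E-step: conditional mean (and covariance) of missing given observed.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():  # whole row missing: fall back to the current mean
                filled[i] = mu
                cond_cov_sum += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo)
            filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            cond_cov_sum[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
        # M-step: update the mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + cond_cov_sum) / n
        if np.max(np.abs(filled - prev)) < tol:
            break
    return filled, mu, sigma

# Synthetic, correlated data (illustrative values, not the paper's data set).
cov = np.array([[4.0, 3.0, 2.0],
                [3.0, 4.0, 3.0],
                [2.0, 3.0, 4.0]])
rng = np.random.default_rng(0)
X_full = rng.multivariate_normal([50.0, 60.0, 70.0], cov, size=500)

for rate in (0.1, 0.2, 0.3):
    X_miss, mask = random_delete(X_full, rate, rng)
    X_hat, _, _ = em_impute(X_miss)
    rel_err = np.mean(np.abs(X_hat[mask] - X_full[mask]) / np.abs(X_full[mask]))
    print(f"missing rate {rate:.0%}: mean relative error {rel_err:.4f}")
```

The relative errors printed here depend on the synthetic data and on the assumed error definition, so they are not comparable to the figures reported in the paper.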
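Cite this article: Li, L., Chen, A. and Yao, Z. (2018) An Empirical Study on Maximum Likelihood Estimation of Missing Data by EM Algorithm. Statistics and Application (统计学与应用), 7(2), 210-220. https://doi.org/10.12677/SA.2018.72025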
