Nonparametric Imputation Improvements under Missing Values for Incomplete Data
DOI: 10.12677/MOS.2023.126516
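Author: Wang Zitong (汪子同), School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu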
Keywords: Incomplete Data; Nonparametric Imputation; Weighted Estimation; Least Squares
Abstract: In data analysis, the higher the quality of the data and the more complete the dataset, the more valuable the results tend to be. In practice, however, datasets often contain a large amount of incomplete data, and simply deleting incomplete records before analysis discards a great deal of sample information. To address the estimation of missing values in incomplete data, this paper builds on the idea of nonparametric imputation, proposes two regression function estimators, and presents the derivation of each. Simulation studies under different data distributions and missing rates show that the two improved nonparametric imputation methods have an advantage over other classical nonparametric imputation methods and weighted estimation methods in estimating the population mean.
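The abstract does not spell out the two proposed estimators, so the sketch below is not the author's method. It only illustrates, under stated assumptions, the kind of baselines the comparison refers to: estimating a population mean when some responses Y are missing at random given a covariate X. Everything here is hypothetical and chosen for illustration: the linear model Y = 2 + 3X + noise with X ~ Uniform(0, 1), the logistic response probability sigmoid(2x), the Gaussian kernel with bandwidth h = 0.1, and the helper names `simulate`, `response_prob`, `nw_estimate`, `complete_case_mean`, `kernel_imputation_mean` and `ipw_mean`. It contrasts the naive complete-case mean, a classical Nadaraya-Watson kernel regression imputation, and a Horvitz-Thompson-type inverse-probability-weighted mean that (unrealistically) uses the true response probability.

```python
import math
import random

TRUE_MEAN = 3.5  # E[Y] = 2 + 3 * E[X] with X ~ Uniform(0, 1)


def response_prob(x):
    """Assumed MAR response model: P(Y observed | X = x) = sigmoid(2x)."""
    return 1.0 / (1.0 + math.exp(-2.0 * x))


def simulate(n, seed=0):
    """Draw (x, y) pairs; y is None when the response is missing."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.random()
        y = 2.0 + 3.0 * x + rng.gauss(0.0, 0.5)
        observed = rng.random() < response_prob(x)
        sample.append((x, y if observed else None))
    return sample


def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate of m(x0) = E[Y | X = x0] with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)


def complete_case_mean(sample):
    """Average of observed Y only; biased when missingness depends on X."""
    ys = [y for _, y in sample if y is not None]
    return sum(ys) / len(ys)


def kernel_imputation_mean(sample, h=0.1):
    """Fill each missing Y_i with m_hat(X_i) fitted on complete cases, then average all n values."""
    xs = [x for x, y in sample if y is not None]
    ys = [y for _, y in sample if y is not None]
    filled = [y if y is not None else nw_estimate(x, xs, ys, h) for x, y in sample]
    return sum(filled) / len(filled)


def ipw_mean(sample):
    """(1/n) * sum(delta_i * Y_i / pi(X_i)) using the known simulation pi(x);
    in real data pi(x) would have to be estimated."""
    total = sum(y / response_prob(x) for x, y in sample if y is not None)
    return total / len(sample)


if __name__ == "__main__":
    data = simulate(2000, seed=42)
    print(f"complete case : {complete_case_mean(data):.3f}")
    print(f"kernel impute : {kernel_imputation_mean(data):.3f}")
    print(f"IPW           : {ipw_mean(data):.3f}")
    print(f"true mean     : {TRUE_MEAN}")
```

Because units with small X are more often missing in this setup, the complete-case mean should drift above 3.5, while the imputation and weighting estimators should both land near it. The bandwidth and response model are arbitrary; a real comparison such as the paper's simulation study would vary the data distribution and missing rate.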
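Cite this article: Wang Zitong. Nonparametric Imputation Improvements under Missing Values for Incomplete Data [J]. Modeling and Simulation, 2023, 12(6): 5682-5692. https://doi.org/10.12677/MOS.2023.126516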
