# The Parameter Estimation of the Mixture of Normal and Uniform Distribution Based on the Empirical Cumulative Distribution Function

• Pages 237-245, DOI: 10.12677/SA.2016.53024

The normal mixture model is easily influenced by outliers, and the maximum likelihood estimates of its parameters are not robust. Fraley and Raftery proposed adding to the normal mixture a uniform component that is regarded as the distribution of the outliers, and this model fits observed data accurately. However, because of the form of the uniform probability density function, the likelihood function is unbounded as the two parameters of the uniform distribution become infinitely close, so the EM algorithm cannot be applied directly. One workaround is to set the parameters of the uniform distribution to two distinct observed data points, hold them fixed during the iterations, and then take as the estimates the fit whose likelihood is largest. Coretto and Hennig proposed a gridding method, but it also requires a large amount of computation and is inefficient. Building on this, we propose a new method based on the empirical cumulative distribution function for estimating the parameters of a general mixture of normal and uniform distributions: first the parameters of the uniform distribution are estimated, and then the mixing proportion and the parameters of the normal distribution. Numerical simulations show that the method is efficient, gives precise estimates, requires little computation, and is easy to implement.
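The workaround described above, holding the two uniform endpoints fixed and running EM on the remaining parameters, can be sketched for the density $f(x) = \pi\,\phi(x;\mu,\sigma^2) + (1-\pi)\,\frac{1}{b-a}\,\mathbf{1}\{a \le x \le b\}$. The factor $1/(b-a)$ is what makes the likelihood unbounded: it grows without limit as $b \to a$. This is a minimal sketch of the inner EM loop only, not the authors' ECDF-based method, whose details are not given here. The function names, the restriction to a single normal component, the starting values, and the use of the sample minimum and maximum as $a$ and $b$ in the demo are all assumptions for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_terms(x, pi, mu, sigma, a, b):
    """Return the weighted normal and uniform density terms at x."""
    fn = pi * normal_pdf(x, mu, sigma)
    fu = (1.0 - pi) / (b - a) if a <= x <= b else 0.0
    return fn, fu


def log_likelihood(xs, pi, mu, sigma, a, b):
    """Log-likelihood of the normal + uniform mixture."""
    return sum(math.log(sum(mixture_terms(x, pi, mu, sigma, a, b))) for x in xs)


def em_fixed_uniform(xs, a, b, pi=0.5, max_iter=500, tol=1e-9):
    """Hypothetical helper: EM for pi*N(mu, sigma^2) + (1 - pi)*U(a, b), a < b fixed.

    Assumes [a, b] covers every observation, so each point has positive density.
    Returns (pi, mu, sigma, loglik).
    """
    if not (a < b and a <= min(xs) and max(xs) <= b):
        raise ValueError("need a < b with all observations inside [a, b]")
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    prev = log_likelihood(xs, pi, mu, sigma, a, b)
    for _ in range(max_iter):
        # E-step: posterior probability that each point belongs to the normal component.
        tau = []
        for x in xs:
            fn, fu = mixture_terms(x, pi, mu, sigma, a, b)
            tau.append(fn / (fn + fu))
        # M-step: closed-form weighted updates; a and b are never re-estimated.
        w = sum(tau)
        pi = w / n
        mu = sum(t * x for t, x in zip(tau, xs)) / w
        sigma = math.sqrt(sum(t * (x - mu) ** 2 for t, x in zip(tau, xs)) / w)
        cur = log_likelihood(xs, pi, mu, sigma, a, b)
        converged = cur - prev < tol  # EM never decreases the likelihood
        prev = cur
        if converged:
            break
    return pi, mu, sigma, prev


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: 200 N(0, 1) points plus 20 outliers from U(-10, 10).
    xs = [random.gauss(0.0, 1.0) for _ in range(200)]
    xs += [random.uniform(-10.0, 10.0) for _ in range(20)]
    print(em_fixed_uniform(xs, min(xs), max(xs)))
```

Repeating this fit for every candidate pair of observations $(a, b)$ and keeping the one with the largest log-likelihood is the search the abstract refers to. With $n$ observations there are $n(n-1)/2$ such pairs, and without a constraint on $b-a$ very narrow pairs can inflate the likelihood, which is the unboundedness problem noted above.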

[1] McLachlan, G. and Peel, D. (2004) Finite Mixture Models. John Wiley & Sons, New York, 11-14.

[2] Tan, X.M. (2002) Parameter Estimation and Application of Finite Normal Mixture Models. Ph.D. Thesis, Nankai University, Tianjin. (In Chinese)

[3] Coretto, P. and Hennig, C. (2010) A Simulation Study to Compare Robust Clustering Methods Based on Mixtures. Advances in Data Analysis and Classification, 4, 111-135. http://dx.doi.org/10.1007/s11634-010-0065-4

[4] Fraley, C. and Raftery, A.E. (1998) How Many Clusters? Which Clustering Method? Answers via Model-Based Cluster Analysis. The Computer Journal, 41, 578-588. http://dx.doi.org/10.1093/comjnl/41.8.578

[5] Dean, N. and Raftery, A.E. (2005) Normal Uniform Mixture Differential Gene Expression Detection for cDNA Microarrays. BMC Bioinformatics, 6, 1. http://dx.doi.org/10.1186/1471-2105-6-173

[6] Coretto, P. (2008) The Noise Component in Model-Based Clustering. Ph.D. Thesis, University of London, London.

[7] Coretto, P. and Hennig, C. (2011) Maximum Likelihood Estimation of Heterogeneous Mixtures of Gaussian and Uniform Distributions. Journal of Statistical Planning and Inference, 141, 462-473. http://dx.doi.org/10.1016/j.jspi.2010.06.024

[8] Dennis Jr., J.E. (1981) Algorithms for Nonlinear Fitting. Cambridge University Press, England.

[9] Rice, J. (2006) Mathematical Statistics and Data Analysis. Nelson Education, Australia, 378-380.

[10] Wang, B. (2008) A Brief Discussion on the Convergence of the Empirical Distribution Function. Journal of Xuzhou Education College, 23(3), 80-81. (In Chinese)

[11] Mao, S.S., Wang, J.L. and Pu, X.L. (2006) Advanced Mathematical Statistics. 2nd Edition, Higher Education Press, Beijing, 37-43. (In Chinese)

[12] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1-38.

[13] McLachlan, G. and Krishnan, T. (2007) The EM Algorithm and Extensions. 2nd Edition, John Wiley & Sons, New York.