Statistics and Application (SA), Vol. 5, No. 3 (September 2016)

    The Parameter Estimation of the Mixture of Normal and Uniform Distribution Based on the Empirical Cumulative Distribution Function

  • pp. 237-245, DOI: 10.12677/SA.2016.53024

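Authors:

Xiaoying Wang, Changlong Chen, Yinghua Li (王小英, 陈常龙, 李迎华): School of Mathematics and Physics, North China Electric Power University, Beijing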

Keywords:
Empirical Cumulative Distribution Function; EM Algorithm; Mixture Model

Abstract:

Normal mixture models are easily influenced by outliers, so the maximum likelihood estimates of their parameters are not robust. Fraley and Raftery added a uniform distribution to the normal mixture as the distribution of the outliers, which allows the observed data to be fitted accurately. The uniform probability density function has a special form, however: the likelihood function is unbounded when the two uniform parameters become arbitrarily close. The EM algorithm therefore cannot be applied directly. The usual practice is to set the uniform parameters to two distinct observations and hold them fixed during the iterations. This is repeated for every pair of distinct observations, and the estimates with the largest likelihood value are taken as the final estimates. Coretto and Hennig proposed a gridding approach, but it still requires a large amount of computation and is inefficient. For parameter estimation in the general mixture of normal and uniform distributions, this paper proposes a method based on the empirical cumulative distribution function of the observed data. The method first estimates the parameters of the uniform distribution directly, then the mixing proportion and the parameters of the normal distribution. Numerical simulations show that the method is efficient, computationally light, accurate, and easy to implement.
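Two points in the abstract can be checked with a short simulation. Write the mixture density as $f(x)=\pi\,\mathbf{1}\{a\le x\le b\}/(b-a)+(1-\pi)\,\phi(x;\mu,\sigma^2)$, where $\pi$ is the weight of the uniform (noise) component. This notation is ours and may differ from the paper's.

If an observation $x_j$ lies in $[a,b]$, then $\log f(x_j)\ge\log\{\pi/(b-a)\}$. This bound tends to $+\infty$ as $b-a\to 0$, while every other term of the log-likelihood stays finite. The likelihood therefore has no maximum, and EM cannot be run on all five parameters at once. Once $a$ and $b$ are fixed, EM for $(\pi,\mu,\sigma)$ is routine.

The Python sketch below (standard library only) demonstrates both facts. The abstract does not describe the paper's ECDF-based estimator of $a$ and $b$, and the sketch does NOT reproduce it. As a labeled placeholder, the sketch takes $a$ and $b$ to be the sample minimum and maximum. That choice is only reasonable when the noise spans the whole data range. The function names, starting values, and simulated data are illustrative assumptions.

```python
import math
import random
import statistics


def norm_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_density(x, a, b, pi, mu, sigma):
    """pi * Uniform(a, b) + (1 - pi) * N(mu, sigma^2); pi is the noise weight."""
    unif = 1.0 / (b - a) if a <= x <= b else 0.0
    return pi * unif + (1.0 - pi) * norm_pdf(x, mu, sigma)


def log_likelihood(data, a, b, pi, mu, sigma):
    """Sum of log mixture densities; -inf if some point has zero density."""
    total = 0.0
    for x in data:
        dens = mixture_density(x, a, b, pi, mu, sigma)
        if dens <= 0.0:
            return float("-inf")
        total += math.log(dens)
    return total


def em_fixed_support(data, a, b, pi=0.5, iters=500, tol=1e-10):
    """EM for (pi, mu, sigma) with the uniform support [a, b] held fixed.

    The normal part starts at robust values (median, 1.4826 * MAD); these
    starting values are an assumption of this sketch, not the paper's.
    """
    n = len(data)
    mu = statistics.median(data)
    sigma = 1.4826 * statistics.median([abs(x - mu) for x in data])
    for _ in range(iters):
        # E-step: tau[i] = posterior probability that data[i] is from the normal part.
        tau = []
        for x in data:
            pn = (1.0 - pi) * norm_pdf(x, mu, sigma)
            pu = pi / (b - a) if a <= x <= b else 0.0
            tau.append(pn / (pn + pu) if pn + pu > 0.0 else 1.0)
        w = sum(tau)
        if w == 0.0:  # normal component has vanished; stop
            break
        # M-step: closed-form weighted updates of the three free parameters.
        new_pi = 1.0 - w / n
        new_mu = sum(t * x for t, x in zip(tau, data)) / w
        var = sum(t * (x - new_mu) ** 2 for t, x in zip(tau, data)) / w
        new_sigma = max(math.sqrt(var), 1e-8)
        shift = abs(new_pi - pi) + abs(new_mu - mu) + abs(new_sigma - sigma)
        pi, mu, sigma = new_pi, new_mu, new_sigma
        if shift < tol:
            break
    return pi, mu, sigma


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(360)]        # 90% normal signal
    data += [rng.uniform(-10.0, 10.0) for _ in range(40)]   # 10% uniform noise

    # 1) Unbounded likelihood: shrink [a, b] around the largest observation
    #    with (pi, mu, sigma) held at (0.1, 0, 1).
    x0 = max(data)
    for width in (1e-4, 1e-6, 1e-8):
        ll = log_likelihood(data, x0 - width / 2, x0 + width / 2, 0.1, 0.0, 1.0)
        print(f"b - a = {width:g}: log-likelihood = {ll:.2f}")

    # 2) PLACEHOLDER support: sample range, NOT the paper's ECDF-based estimator.
    a, b = min(data), max(data)
    pi_hat, mu_hat, sigma_hat = em_fixed_support(data, a, b)
    print(f"pi = {pi_hat:.3f}, mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

In the first loop, each hundredfold shrink of the interval raises the log-likelihood by about log(100) ≈ 4.6. Only the term $\log\{\pi/(b-a)\}$ for the enclosed point changes. For this simulated sample, the EM call should return a noise weight near 0.1, a mean near 0 and a standard deviation near 1. Swapping the placeholder endpoints for a better estimate of $a$ and $b$ leaves the EM stage unchanged.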

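Citation:
Wang, X.Y., Chen, C.L. and Li, Y.H. (2016) The Parameter Estimation of the Mixture of Normal and Uniform Distribution Based on the Empirical Cumulative Distribution Function. Statistics and Application, 5, 237-245. http://dx.doi.org/10.12677/SA.2016.53024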

References

[1] McLachlan, G. and Peel, D. (2004) Finite Mixture Models. John Wiley & Sons, New York, 11-14.
[2] Tan, X.M. (2002) Parameter Estimation and Application of Finite Normal Mixture Models. Ph.D. Thesis, Nankai University, Tianjin. (In Chinese)
[3] Coretto, P. and Hennig, C. (2010) A Simulation Study to Compare Robust Clustering Methods Based on Mixtures. Advances in Data Analysis and Classification, 4, 111-135.
http://dx.doi.org/10.1007/s11634-010-0065-4
[4] Fraley, C. and Raftery, A.E. (1998) How Many Clusters? Which Clustering Method? Answers via Model-Based Cluster Analysis. The Computer Journal, 41, 578-588.
http://dx.doi.org/10.1093/comjnl/41.8.578
[5] Dean, N. and Raftery, A.E. (2005) Normal Uniform Mixture Differential Gene Expression Detection for cDNA Microarrays. BMC Bioinformatics, 6, 173.
http://dx.doi.org/10.1186/1471-2105-6-173
[6] Coretto, P. (2008) The Noise Component in Model-Based Clustering. Ph.D. Thesis, University of London, London.
[7] Coretto, P. and Hennig, C. (2011) Maximum Likelihood Estimation of Heterogeneous Mixtures of Gaussian and Uniform Distributions. Journal of Statistical Planning and Inference, 141, 462-473.
http://dx.doi.org/10.1016/j.jspi.2010.06.024
[8] Dennis Jr., J.E. (1981) Algorithms for Nonlinear Fitting. Cambridge University Press, England.
[9] Rice, J. (2006) Mathematical Statistics and Data Analysis. Nelson Education, Australia, 378-380.
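[10] Wang, B. (2008) On the Convergence of the Empirical Distribution Function. Journal of Xuzhou Education College, 23, 80-81. (In Chinese)
[11] Mao, S.S., Wang, J.L. and Pu, X.L. (2006) Advanced Mathematical Statistics. 2nd Edition, Higher Education Press, Beijing, 37-43. (In Chinese)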
[12] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1-38.
[13] McLachlan, G. and Krishnan, T. (2007) The EM Algorithm and Extensions. 2nd Edition, John Wiley & Sons, New York.