SA  >> Vol. 6 No. 4 (October 2017)

    Comparative Study on Effects of Parameter Estimation of Mixture Models under Different Types of Data

  • Pages: 482-491   DOI: 10.12677/SA.2017.64054
  • Supported by the National Natural Science Foundation of China

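Authors:

Wang Xiaoying, Li Yinghua, Yang Xuemei: School of Mathematics and Physics, North China Electric Power University, Beijing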

Keywords:
EM Algorithm; Mixture t-Distribution Model; k-Means Initialization

Abstract:

Gaussian mixture models are widely used to describe data, but they are sensitive to outliers, and the maximum likelihood estimates of their parameters are not robust. Because the t-distribution is heavy-tailed, the t mixture model is more robust than the Gaussian mixture model when analyzing data with longer-than-normal tails. In this paper, we first study the univariate t mixture model. Using the standard EM algorithm, we derive the iteration steps for the maximum likelihood estimation of the model's unknown parameters. We then compare the model with the Gaussian mixture model on three types of simulated data. The simulations confirm that the model is effective and has an advantage in fitting heavy-tailed data. The algorithm is initialized with the k-means method.
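The procedure summarized in the abstract (EM for a univariate t mixture, started from a k-means partition) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `kmeans_1d`, `t_logpdf`, and `fit_t_mixture` are hypothetical, the degrees of freedom `nu` are held fixed rather than estimated, and every k-means cluster is assumed to be non-empty with non-zero variance.

```python
import numpy as np
from math import lgamma, pi


def kmeans_1d(x, k, n_iter=50):
    """Plain Lloyd's k-means on 1-D data; returns one label per point."""
    # Deterministic start: centers at evenly spaced sample quantiles.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels


def t_logpdf(x, mu, sigma2, nu):
    """Log density of a univariate t with location mu, squared scale sigma2."""
    d = (x - mu) ** 2 / sigma2
    return (lgamma((nu + 1) / 2) - lgamma(nu / 2)
            - 0.5 * np.log(nu * pi * sigma2)
            - (nu + 1) / 2 * np.log1p(d / nu))


def fit_t_mixture(x, k, nu=4.0, n_iter=200, tol=1e-8):
    """EM for a k-component univariate t mixture with fixed nu (assumption)."""
    x = np.asarray(x, dtype=float)
    labels = kmeans_1d(x, k)
    # Initial values from the k-means partition (clusters assumed non-empty).
    w = np.array([np.mean(labels == j) for j in range(k)])
    mu = np.array([x[labels == j].mean() for j in range(k)])
    s2 = np.array([x[labels == j].var() for j in range(k)])
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities tau, computed in log space for stability.
        logp = np.stack([np.log(w[j]) + t_logpdf(x, mu[j], s2[j], nu)
                         for j in range(k)], axis=1)
        m = logp.max(axis=1, keepdims=True)
        loglik_i = m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))
        tau = np.exp(logp - loglik_i[:, None])
        # E-step: latent scale weights u, which shrink for outlying points.
        u = (nu + 1) / (nu + (x[:, None] - mu) ** 2 / s2)
        # M-step: weighted updates of mixing proportions, locations, scales.
        w = tau.mean(axis=0)
        mu = (tau * u * x[:, None]).sum(axis=0) / (tau * u).sum(axis=0)
        s2 = (tau * u * (x[:, None] - mu) ** 2).sum(axis=0) / tau.sum(axis=0)
        ll = loglik_i.sum()
        if ll - prev < tol:  # stop when the log-likelihood stops improving
            break
        prev = ll
    return w, mu, s2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([-3 + rng.standard_t(4, 300),
                           3 + rng.standard_t(4, 300)])
    print(fit_t_mixture(data, k=2))
```

In this sketch the weights `u` are what give the t mixture its robustness: a point far from a component's location receives a small `u` and so contributes little to that component's location and scale updates. As `nu` grows large, `u` approaches 1 and the updates approach those of a Gaussian mixture, which is one simple way to set up the kind of comparison the abstract describes.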

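Cite this article:
Wang Xiaoying, Li Yinghua, Yang Xuemei. Comparative Study on Effects of Parameter Estimation of Mixture Models under Different Types of Data[J]. Statistics and Application, 2017, 6(4): 482-491. https://doi.org/10.12677/SA.2017.64054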
