支持向量和多中心点:非线性聚类的两大方法
Support Vector and Multi-Exemplar: Two Main Approaches for Nonlinear Clustering
DOI: 10.12677/HJDM.2013.34008, PDF, HTML,  被引量 下载: 3,720  浏览: 13,836 
作者: 王昌栋:中山大学移动信息工程学院;赖剑煌:中山大学信息科学与技术学院
关键词: 非线性聚类核聚类多中心点聚类PSVCMEAPNonlinear Clustering; Support Vector Clustering; Multi-Exemplar Clustering; PSVC; MEAP
摘要: 作为数据挖掘的基础方法之一,数据聚类被广泛应用各个不同领域,例如计算机科学、医学、社会科学和经济学等。根据类的样本点的分布,数据聚类问题通常可以划分成线性可分聚类和非线性可分聚类。由于现实世界的数据分布流形的复杂性,非线性聚类是最流行和最被广泛研究的聚类问题之一。本文首先从四个角度对非线性聚类的近期工作做一个简要的综述,包括基于核的聚类算法、多中心点聚类算法、基于图的聚类算法以及基于支持向量的聚类算法。接着,我们将特别地介绍我们在非线性聚类研究方面的两个主要工作,分别是位置正则化的支持向量聚类(PSVC)以及多中心点近邻传播算法(MEAP)。我们将介绍这些方法的优势与局限性,同时指出未来的研究方向。
Abstract:  

As a fundamental method for data mining, data clustering has been widely used in various fields such as computer science, medical science, social science and economics. According to the data distribution of clusters, the data clustering problem can be categorized into linearly separable clustering and nonlinearly separable clustering. Due to the complex manifold of the real-world data, nonlinearly separable clustering is one of the most popular and widely studied clustering problems. In this paper, we will first make a brief survey on the recent research works in nonlinear clustering, from four perspectives, namely, kernel-based clustering, multi-exemplar clustering, graph-based clustering and support vector-based clustering. Then, we will particularly introduce our two research works in nonlinear clustering, namely, position regularized support vector clustering (PSVC) and multi-exemplar affinity propagation (MEAP). We will analyze their merits and limitations and point out the future research directions.

文章引用:王昌栋, 赖剑煌. 支持向量和多中心点:非线性聚类的两大方法[J]. 数据挖掘, 2013, 3(4): 41-49. http://dx.doi.org/10.12677/HJDM.2013.34008

参考文献

[1] R. Xu, D. Wunsch II. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 2005, 16(3): 645-678.
[2] 孙吉贵, 刘杰, 赵连宇. 聚类算法研究[J]. 软件学报, 2008, 19(1): 48-61.
[3] B. Scholkopf, A. Smola and K. R. Muller. Nonlinear Component analysis as a kernel eigenvalue problem. Neural Computation, 1998, 10(5): 1299-1319.
[4] 张莉, 周伟达, 焦李成. 核聚类算法[J]. 计算机学报, 2002, 25(6): 587-590.
[5] C. D. Wang, J. H. Lai and J. Y. Zhu. A conscience on-line learn- ing approach for kernel-based clustering. IEEE 10th Interna- tional Conference on Data Mining, Sydney, 13-17 December 2010, 531-540.
[6] M. Liu, X. Jiang and A. C. Kot. A multi-prototype clustering algorithm. Pattern Recognition, 2009, 42(5): 689-698.
[7] C. D. Wang, J. H. Lai, J. Y. Zhu and C. Y. Suen. Multi-exemplar affinity propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(9): 2223-2237.
[8] L. Ertoz, M. Steinbach and V. Kumar. Finding clusters of differ- ent sizes, shapes, and densities in noisy, high dimensional data. Proceedings of the Third SIAM International Conference on Data Mining, 2003, 112: 47-58.
[9] C. D. Wang, J. H. Lai and J. Y. Zhu. Graph-based multiprototype competitive learning and its applications. IEEE Transactions on Systems, Man, and Cybernetics—Part C, 2012, 42(6): 934-946.
[10] A. Ben-Hur, D. Horn, H. T. Siegelmann, et al. Support vector clustering. Journal of Machine Learning Research, 2001, 2: 125- 137.
[11] C. D. Wang, J. H. Lai. Position regularized support vector do- main description. Pattern Recognition, 2013, 46(3): 875-884.
[12] J. Shawe-Taylor, N. Cristianini. Kernel methods for pattern ana- lysis. Cambridge: Cambridge University Press, 2004.
[13] J. MacQueen. Some methods for classification and analysis of mul- tivariate observations. Proceedings of the 15th Berkeley Sym- posium on Mathematical Statistics and Probability, 1967, 1: 281- 297.
[14] A. Likas, N. Vlassis and J. J. Verbeek. The global k-means clus- tering algorithm. Pattern Recognition, 2003, 36(2): 451-461.
[15] B. Abolhassani, J. E. Salt and D. E. Dodds. A two-phase genetic k-means algorithm for placement of radioports in cellular net- works. IEEE Transactions on Systems, Man, and Cybernetics— Part B, 2004, 34(1): 533-538.
[16] L. Fei-Fei, P. Perona. A bayesian hierarchical model for learning natural scene categories. Proceedings of CVPR, 2005. 524-531.
[17] M. Zhu, A. M. Martinez. Subclass Discriminant analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(8): 1274-1286.
[18] H. I. Avi-Itzhak, J. A. V. Mieghem and L. Rub. Multiple subclass pattern recognition: A maximin correlation approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995, 17(4): 418-431.
[19] F. R. Kschischang, B. J Frey and H. A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 2001, 47(2): 498-519.
[20] Y. Weiss, W. T. Freeman. On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 2001, 47(2): 736- 744.
[21] J. Shi, J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905.
[22] D. Yankov, E. Keogh and K. F. Kan. Locally constrained support vector clustering. Proceedings of the 7th International Confer- ence on Data Mining, Omaha, 28-31 October 2007, 715-720.
[23] D. M. Tax, R. P. Duin. Support vector domain description. Pat- tern Recognition Letters, 1999, 20(11-13): 1191-1199.