一种新的谱聚类算法及其仿真
A New Spectral Clustering Algorithm and Its Simulation
DOI: 10.12677/CSA.2017.711125, PDF, HTML, XML, 下载: 1,402  浏览: 4,526 
作者: 李 娇, 王 晅*:陕西师范大学物理学与信息技术学院,陕西 西安
关键词: 谱图理论谱聚类相似性矩阵k-MeansSpectral Graph Theory Spectral Clustering Similarity Matrix k-Means
摘要: 针对传统谱聚类基于欧氏距离度量样本之间的相似性,不能反映样本的概率分布特性,特别是具有多峰分布的样本聚类,欧氏距离具有较大的偏差,另外在传统谱聚类中广泛使用的k-means算法由于初始中心的随机性设置,经常会产生不稳定的聚类结果。本文针对相似性度量问题提出了一种能够反映样本概率分布特性的相似性度量方法,并在此基础上对谱聚类中的k-means算法进行了改进,给出了一种对初始中心进行自适应设置的方法。实验结果表明,该算法相比传统的谱聚类在人工数据集与UCI真实数据集上具有更好的聚类效果。
Abstract: The similarity between the samples based on Euclidean distance measurement can not reflect the probability distribution characteristics of the samples, especially the sample clustering with multi-peak distribution, and the Euclidean distance has a large deviation. In addition, the k-means algorithm widely used in the class often produces unstable clustering results due to the randomness of the initial center. In this paper, a similarity measure method which can reflect the probability distribution of the sample is proposed for the similarity measure. On this basis, the k-means algorithm in spectral clustering is improved, and a method is proposed for the initial center adaptive setting method. The experimental results show that the proposed algorithm has better clustering effect than the traditional spectral clustering in the artificial data set and the UCI real data set.
文章引用:李娇, 王晅. 一种新的谱聚类算法及其仿真[J]. 计算机科学与应用, 2017, 7(11): 1107-1117. https://doi.org/10.12677/CSA.2017.711125

参考文献

[1] Jain, A., Murty, M. and Flynn, P. (1999) Data Clustering: A Review. ACM Computing Surveys, 31, 264-323.
https://doi.org/10.1145/331499.331504
[2] 王晅, 马建峰. 数字图像分析与模式识别[M]. 北京: 科学出版社, 2011: 228-229.
[3] Huang, Z. (1998) Extensions to the k-Means Algorithm for Clustering Large Datasets with Categorical Values. Data Mining and Knowledge Discovery, 283-304.
[4] Wu, X. and Zhang, S. (2003) Synthesizing High-Frequency Rules from Different Data Sources. IEEE Transactions on Knowledge and Data Engineering, 15, 353-367.
[5] Wu, X., Zhang, C. and Zhang, S. (2005) Database Classification for Multi-Database Mining. Information Systems, 30, 71-88.
https://doi.org/10.1016/j.is.2003.10.001
[6] Luxburg, U.V. (2007) A Tutorial on Spectral. Statistics and Computing, 17, 395-416.
https://doi.org/10.1007/s11222-007-9033-z
[7] Yan, D. (2009) Fast Approximate Spectral Clustering. UC Berkeley, 907-915.
https://doi.org/10.1145/1557019.1557118
[8] 蔡晓妍, 戴冠中, 等. 谱聚类算法综述[J]. 计算机科学, 2008, 35(7): 14-18.
[9] Wu, Z. and Leahy, R. (1993) An Optimal Graph Theoretic Approach to Data Clustering: Theory and Its Application to Image Segmentation. IEEE Transactions on PAMI, 15, 1101-1113.
[10] Chang, H. and Yeung, D.Y. (2011) Robust Path-Based Spectral Clustering. Applied and Computational Harmonic Analysis, 30, 319-336.
[11] Chi, Y. and Song, X.D. (2009) On Evolutionary Spectral Clustering. ACM Transactions on Knowledge Discovery from Data, 3, Article No. 17.
https://doi.org/10.1145/1631162.1631165
[12] Tulin, I. (2015) A Parameter-Free Similarity Graph for Spectral Clustering. Expert Systems with Applications, 42, 9489-9498.
https://doi.org/10.1016/j.eswa.2015.07.074
[13] Mario, B. (2015) A Density-Based Similarity Matrix Construction for Spectral Clustering. Neurocomputing, 151, 835-844.
https://doi.org/10.1016/j.neucom.2014.10.012
[14] Yan, J., Cheng, D.B., Zong, M. and Deng, Z.Y. (2014) Improved Spectral Clustering Algorithm Based on Similarity Measure. In: Luo, X., Yu, J.X. and Li, Z., Eds., Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, Vol. 8933, Springer, Cham, 641-654.
https://doi.org/10.1007/978-3-319-14717-8_50
[15] Anil, K.J. (2010) Data Clustering: 50 Years beyond K-Means. Pattern Recognition Letters, 31, 651-666.
https://doi.org/10.1016/j.patrec.2009.09.011
[16] Ng, A.Y., Jordan, M.I. and Weiss, Y. (2002) On Spectral Clustering: Analysis and an Algorithm. In: Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 894-856.
[17] Yang, P., Zhu, Q.S. and Huang, B. (2011) Spectral Clustering with Density Sensitive Similarity Function. Knowledge-Based Systems, 24, 621-628.
https://doi.org/10.1016/j.knosys.2011.01.009
[18] Zhang, X., Jiao, L., Liu, F., et al. (2008) Spectral Clustering Ensemble Applied to SAR Image Segmentation. IEEE Transactions on Geosciences and Remote Sensing, 46, 2126-2136.
https://doi.org/10.1109/TGRS.2008.918647
[19] Cai, X.Y., Dai, G.Z. and Yang, L.B. (2009) A New Self-Adaptive Clustering Algorithm. China Science and Technology Press, Beijing, 329-334.
[20] Fiedler, M. (1975) A Property of Eigenvectors of Non-Negative Symmetric Matrices and Its Application to Graph Theory. Czechoslovak Mathematical Journal, 25, 619-633.