基于本证间隙增大的K邻域加权谱聚类算法
K Neighbor Relationship for Spectral Clustering with Lager Eigengap
摘要: 聚类是近些年来无监督学习的热门方法之一。而谱聚类又是聚类方法研究中的热点,并且通常表现出良好的聚类性能。谱聚类中的一个关键步骤是构建相似性矩阵。然而,一些传统的算法模型不能很好地考虑数据集的分布结构,很难真正的反映数据点之间的相似性。针对以上的问题,受密度聚类的启发,我们通过找到数据之间的最近邻关系优化相似矩阵,不仅能更好的反映数据之间真实的邻域关系,并且得到了更好的聚类效果。本文不仅提出了一种新的相似矩阵构造方法,并且证明了由本方法得到结果的拉普拉斯矩阵的本证间隙变得更大。最后,通过实验表明该方法的优越性及本证间隙的增大。
Abstract: Recently, clustering is one of the hot method in unsupervised learning. Spectral clustering is a hot in clustering method and often shows good clustering performance. One crucial step of spectral clustering is constructing a similarity matrix. However, the traditional model cannot consider the distribution structure of the data set well, and it is difficult to truly reflect the similarity between data points. Inspired by density clustering to find the nearest neighbor relationship between data and obtain the optimized similarity matrix. In this paper, we propose a novel construction method of similarity matrix that use a graph with kneighbor relationship (KNRS), considering the kneighborhood distribution of data this model, as a result, the eigengap becomes lager. We have also made comparison with some methods on some common datasets, the experiments show the superiority of our model in the benchmark data sets.
文章引用:田文敏, 王晅, 田春霖. 基于本证间隙增大的K邻域加权谱聚类算法[J]. 计算机科学与应用, 2020, 10(2): 289-302. https://doi.org/10.12677/CSA.2020.102030

参考文献

[1] Jain, A.K. (2008) Data Clustering: 50 Years beyond K-Means.
[2] Park, H.S. and Jun, C.H. (2009) A Simple and Fast Algorithm for k-Medoids Clustering. Expert Systems with Applications, 36, 3336-3341. [Google Scholar] [CrossRef
[3] Guha, S., Rastogi, R. and Shim, K. (1998) Cure: An Efficient Clustering Algorithm for Large Databases. Information Systems, 26, 35-58. [Google Scholar] [CrossRef
[4] Bezdek, J.C., Ehrlich, R. and Full, W. (1984) FCM: The Fuzzy C-Means Clustering Algorithm. Computers & Geosciences, 10, 191-203. [Google Scholar] [CrossRef
[5] Ester, M., Kriegel, H.P., Sander, J., et al. (1996) A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of International Conference on Knowledge Discovery & Data Mining, Vol. 96, 226-231.
[6] Luxburg, U.V. (2007) A Tutorial on Spectral Clustering. Statistics & Computing, 17, 395-416. [Google Scholar] [CrossRef
[7] Schlkopf, B., Smola, A. and Mller, K. (1998) Nonlinear Component Analysis as a Kernel Eigen Value Problem. Neural Computation, 10, 1299-1319.
[8] Xu, D. and Tian, Y. (2015) A Comprehensive Survey of Clustering Algorithms. Annals of Data Science, 2, 165-193. [Google Scholar] [CrossRef
[9] Donath, W.E. and Hoffman, A.J. (1973) Lower Bounds for the Partitioning of Graphs. IBM Journal of Research & Development, 17, 420-425. [Google Scholar] [CrossRef
[10] Fiedler, M. (1976) Algebraic Connectivity of Graphs. Czechoslovak Mathematical Journal, 23, 298-305.
[11] Hagen, L. and Kahng, A.B. (2002) New Spectral Methods for Ratio Cut Partitioning and Clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11, 1074-1085. [Google Scholar] [CrossRef
[12] Shi, J. and Malik, J. (2000) Normalized Cuts and Image Segmentation. Departmental Papers (CIS), 107.
[13] Ng, A.Y., Jordan, M.I. and Weiss, Y. (2002) On Spectral Clustering: Analysis and an Algorithm. In: Advances in Neural Information Processing Systems, Springer, The Netherlands, 849-856.
[14] Feng, J., Lin, Z., Xu, H. and Yan, S. (2014) Robust Subspace Segmentation with Block-Diagonal Prior. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, 23-28 June 2014, 3818-3825. [Google Scholar] [CrossRef
[15] Xia, T., Cao, J., Zhang, Y.-D. and Li, J.-T. (2009) On Defining Affinity Graph for Spectral Clustering through Ranking on Manifolds. [Google Scholar] [CrossRef
[16] Zhang, X., Li, J. and Yu, H. (2011) Local Density Adaptive Similarity Measurement for Spectral Clustering. Pattern Recognition Letters, 32, 352-358. [Google Scholar] [CrossRef
[17] 孙继广. 矩阵扰动分析[M]. 北京: 科学出版社, 2001: 146-160.
[18] Davis, C. and Kahan, W.M. (1970) The Rotation of Eigenvectors by a Perturbation. SIAM Journal on Numerical Analysis, 7, 1-46. [Google Scholar] [CrossRef
[19] Perona, P. and Freeman, W. (1998) A Factorization Approach to Grouping. In: European Conference on Computer Vision, Springer, Netherlands, 655-670. [Google Scholar] [CrossRef
[20] Scott, G.L. and Longuet-Higgins, H.C. (1990) Feature Grouping by ‘Relocalisation’ of Eigenvectors of the Proximity Matrix. BMVC, 1-6. [Google Scholar] [CrossRef
[21] Ding, C.H., He, X., Zha, H., Gu, M. and Simon, H.D. (2001) A Min-Max Cut Algorithm for Graphpartitioning and Data Clustering. Proceedings 2001 IEEE International Conference on Data Mining, San Jose, CA, 29 November-2 December 2001, 107-114.
[22] 王丽. 图论在算法设计中的应用[D]: [硕士学位论文]. 西安: 西安电子科技大学, 2010.
[23] Malik, J., Belongie, S., Leung, T. and Shi, J. (2001) Contour and Texture Analysis for Image Segmentation. International Journal of Computer Vision, 43, 7-27.
[24] Meila, M. and Shi, J. (2001) Learning Segmentation by Random Walks. In: Advances in Neural Information Processing Systems, Springer, The Netherlands, 873-879.
[25] Liu, R., Lin, Z., and Su, Z. (2014) Learning Markov Random Walks for Robust Subspace Clustering and Estimation. Neural Networks, 59, 1-15. [Google Scholar] [CrossRef] [PubMed]
[26] Li, X.-Y. and Guo, L.-J. (2012) Constructing Affinity Matrix in Spectral Clustering Based on Neighbor Propagation. Neurocomputing, 97, 125-130. [Google Scholar] [CrossRef
[27] Li, Q., Ren, Y., Li, L. and Liu, W. (2016) Fuzzy Based Affinity Learning for Spectral Clustering. Pattern Recognition, 60, 531-542. [Google Scholar] [CrossRef
[28] 田铮, 李小斌, 句彦伟. 谱聚类的扰动分析[J]. 中国科学, 2007, 37(4): 527-543.
[29] 孔万增, 孙志海, 杨灿. 基于本征间隙与正交特征向量的自动谱聚类[J]. 电子学报, 2010, 38(8): 1980-1985.
[30] Bhatia, R. (2013) Matrix Analysis. Springer Science & Business Media, New York.
[31] He, X., Cai, D. and Niyogi, P. (2006) Laplacian Score for Feature Selection. In: Advances in Neural Information Processing Systems, Springer, The Netherlands, 507-514.
[32] Xiang, T. and Gong, S. (2008) Spectral Clustering with Eigenvector Selection. Pattern Recognition, 41, 1012-1029. [Google Scholar] [CrossRef
[33] Asuncion, A. and Newman, D. (2007) UCI Machine Learning Repository.
[34] Strehl, A. and Ghosh, J. (2002) Cluster Ensembles—A Knowledge Reuse Framework for Combining Multiple Partitions. Journal of Machine Learning Research, 3, 583-617.
[35] Jin, R., Kang, F. and Ding, C.H. (2006) A Probabilistic Approach for Optimizing Spectral Clustering. In: Advances in Neural Information Processing Systems, Springer, The Netherlands, 571-578.