基于K阈值的谱聚类算法
Spectral Clustering Algorithm Based on K-Threshold
摘要: 在传统的NJW谱聚类算法中,样本间的相似性度量由欧式距离和高斯核函数公共决定;而欧式距离是基于整个样本集合的,忽略了局部样本之间的关连;且高斯核函数的值也是通过多次实验获得,容易陷入局部最优使得聚类结果大受影响。针对以上问题,提出了一种新的谱聚类算法,算法引入K邻域、最短路径以及标准差的概念;重建样本间的相似性度量的参数,并且用样本标准差重新构建样本间的联系进而修正高斯核函数的值,使其更接近样本本身的特性。通过实验分析,该算法不但继承了谱聚类算法的收敛于全局的特性,适用于各种形状的数据,而且还克服了欧式距离和高斯核函数造成的相似性度量的局限,提高了算法准确率,增强聚类结果的可信度。
Abstract: In the traditional NJW spectral clustering algorithm, similarity measurement between samples is determined by Euclidean distance and Gaussian kernel function. Euclidean distance is based on the whole sample set, ignoring the correlation between local samples. Moreover, the value of Gaussian kernel function is also obtained through multiple experiments, which is easy to fall into local optimization and greatly affects the clustering results. Aiming at the above problems, a new spectral clustering algorithm is proposed, which introduces the concepts of K neighborhood, shortest path and standard deviation. The parameters of similarity measurement between samples are reconstructed, and the relationship between samples is reconstructed with sample standard deviation, so as to modify the value of Gaussian kernel function and make it closer to the characteristics of samples. Through the experimental analysis, the algorithm not only inherits the convergence of spectral clustering algorithm to the whole world, and is applicable to data of various shapes, but also overcomes the limitations of similarity measurement caused by Euclidean distance and Gaussian kernel function, improves the accuracy of the algorithm, and enhances the reliability of clustering results.
文章引用:郭梅梅, 王晅. 基于K阈值的谱聚类算法[J]. 计算机科学与应用, 2019, 9(5): 969-984. https://doi.org/10.12677/CSA.2019.95110

参考文献

[1] Jain, A.K., Murty, M.N. and Flynn, P.J. (1999) Data Clustering: A Review. ACM Computing Surveys, 31, 264-323. [Google Scholar] [CrossRef
[2] 王千, 王成, 冯振元, 等. K-means聚类算法研究综述[J]. 电子设计工程, 2012(4).
[3] 孙伟. 谱聚类算法的改进研究[D]: [硕士学位论文]. 兰州: 兰州商学院, 2012.
[4] Cai, D., He, X., Han, J. and Member, S. (2005) Document Clustering Using Locality Preserving Indexing. IEEE Transactions on Knowledge and Data Engineer-ing, 17, 1624-1637. [Google Scholar] [CrossRef
[5] 蔡晓研, 戴冠中, 杨黎斌. 谱聚类算法综述[J]. 计算机科学, 2008, 35(7): 14-18.
[6] von Luxburg, U. (2007) A Tutorial on Spectral Clustering. Statistics and Computing, 17, 395-416. [Google Scholar] [CrossRef
[7] Filipponea, M., Camastrab, F., Masullia, F. and Rovetta, S. (2008) A Survey of Kernel and Spectral Methods for Clustering. Pattern Recognition, 41, 176-190. [Google Scholar] [CrossRef
[8] Nascimento, M.C.V. and de Carvalho, A.C.P.L.F. (2011) Spectral Methods for Graph Clustering—A Survey. European Journal of Operational Research, 211, 221-231. [Google Scholar] [CrossRef
[9] Chen, W. and Feng, G. (2012) Spectral Clustering: A Semi-Supervised Approach. Neurocomputing, 77, 229-242. [Google Scholar] [CrossRef
[10] Ozertem, U., Erdogmus, D. and Jenssen, R. (2008) Mean Shift Spectral Clus-tering. Pattern Recognition, 41, 1924-1938. [Google Scholar] [CrossRef
[11] Donath, W.E. and Hoffman, A.J. (1973) Lower Bounds for the Partitioning of Graphs. IBM Journal of Research and Development, 17, 420-425. [Google Scholar] [CrossRef
[12] Fiedler, M. (1973) Algebraic Connectivity of Graphs. Czechoslovak Mathematical Journal, 23, 298-305.
[13] Li, X.-Y. and Guo, L.-J. (2012) Con-structing Affinity Matrix in Spectral Clustering Based on Neighbor Propagation. Neurocomputing, 97, 125-130. [Google Scholar] [CrossRef
[14] Nataliani, Y. and Yang, M.-S. (2017) Powered Gaussian Kernel Spectral Clustering. Neural Computing and Applications, 31, 557-572. [Google Scholar] [CrossRef
[15] Gong, Y.C. and Chen, C. (2008) Locality Spectral Clutering. Proceeding of the 21st Australasian Joint Conference on Artificial Intelligence: Advance in Artificial Intelligence, Springer-Verlag, 348-354. [Google Scholar] [CrossRef
[16] Shi, J. and Malik, J. (2000) Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 888-905. [Google Scholar] [CrossRef
[17] Dhillon, I.S., Guan, Y. and Kulis, B. (2004) Kernel K-Means: Spectral Clustering and Normalized Cuts. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, 551-556. [Google Scholar] [CrossRef
[18] Ding, C., He, X., Zha, H.-Y., Gu, M. and Simon, H.D. (2001) A Min-Max Cut Algorithm for Graph Partitioning and Data Clustering. Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, CA, 29 November-2 December 2001, 107-114. [Google Scholar] [CrossRef
[19] Hagen, L. and Kahng, A.B. (1992) New Spectral Methods for Ratio Cut Partitioning and Clustering. IEEE Transactions on Computer-Aided Design of Inte-grated Circuits and Systems, 11, 1074-1085. [Google Scholar] [CrossRef
[20] McQueen, J. (1967) Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, 1, 281-297.
[21] Ng, A.Y., Jordan, M.I. and Weiss, Y. (2002) On Spectral Clustering: Analysis and an Algorithm. In: Dietterich, T.G., Becker, S. and Ghahramani, Z., Eds., Proceedings of the 14th Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 849-856.