Three-Way Clustering Algorithm Based on K-means
Three-Way Clustering Algorithms Based on Disturbances and K-Means
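DOI: 10.12677/AAM.2018.710157. Supported by the National Natural Science Foundation of China.
Authors: Dan Shen, Xiaolei Wang, Pingxin Wang*: School of Science, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu
Keywords: K-means, Perturbation, Discrete Degree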
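Abstract (translated from the Chinese): K-means is a traditional partition-based clustering algorithm. It is essentially a hard clustering method: each object either belongs to a given class or does not, so the clustering result has strict boundaries. Forcing uncertain objects into a particular class, however, tends to carry a high decision risk. Three-way clustering places objects whose membership is certain in the core domain and defers the decision on uncertain objects by placing them in the boundary domain, which effectively reduces this risk. This paper combines three-way decision theory with the K-means algorithm to obtain a new three-way clustering algorithm. The algorithm uses the K-means clustering result to classify uncertain points more reasonably, and then perturbs the finished clustering to separate the core domain and the boundary domain within each cluster.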
Abstract: The K-means algorithm is a traditional partition-based clustering algorithm. It is essentially a hard clustering method: each object either belongs to a given class or does not, so its clustering results have strict boundaries. This is an obvious disadvantage when an object's membership is uncertain, because forcing such an object into one class carries a high decision risk. Three-way clustering relaxes the hard partition by defining a core domain for objects that certainly belong to a cluster and a boundary domain for objects whose decision is deferred. This paper combines three-way decision theory with the K-means algorithm to form a new clustering algorithm. It not only maintains the accuracy of the original clustering but also classifies relatively uncertain points more reasonably. The core domain and boundary domain of each cluster are then separated by perturbation processing.
Cite this article: Shen, D., Wang, X.L. and Wang, P.X. (2018) Three-Way Clustering Algorithm Based on K-means. Advances in Applied Mathematics, 7(10), 1349-1356. https://doi.org/10.12677/AAM.2018.710157
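
To make the idea in the abstract concrete, the following is a minimal sketch of how a K-means result might be split into a core domain and a boundary domain by perturbation. The abstract does not state the paper's actual perturbation rule, so this sketch swaps in a simple assumption: the centroids are jittered with Gaussian noise several times, and a point whose nearest centroid changes in any trial is treated as a boundary point. The function names `kmeans`, `assign`, and `three_way_split`, and the parameters `n_trials` and `scale`, are hypothetical and not taken from the paper.

```python
import numpy as np


def assign(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Keep the old centroid if a cluster happens to be empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(X, centroids)


def three_way_split(X, centroids, n_trials=20, scale=0.1, seed=0):
    """Assumed perturbation rule (not the paper's): jitter the centroids and
    flag every point whose hard label changes in at least one trial.

    Returns (labels, core), where core[i] is True for core-domain points
    and False for boundary-domain points.
    """
    rng = np.random.default_rng(seed)
    labels = assign(X, centroids)
    spread = X.std(axis=0)  # per-feature scale of the jitter
    core = np.ones(len(X), dtype=bool)
    for _ in range(n_trials):
        noise = rng.normal(0.0, scale, size=centroids.shape) * spread
        core &= assign(X, centroids + noise) == labels
    return labels, core
```

Calling `kmeans(X, k)` and then `three_way_split(X, centroids)` gives, for each point, its hard K-means label plus a flag saying whether that label survived every perturbation. Points far from any cluster boundary keep their label and land in the core domain, while points roughly equidistant from two centroids tend to flip and land in the boundary domain. A larger `scale` widens the boundary domain.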
