Three-Way DBSCAN Text Clustering Based on Shadowed Sets and Shared Nearest Neighbor
DOI: 10.12677/hjdm.2025.152012. Supported by research project funding.
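Authors: Li Zhicong (李志聪)*, Yan Kun (闫昆), School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang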
Keywords: Three-Way Decision; Three-Way Clustering; Shadowed Sets; Text Clustering
Abstract: The traditional DBSCAN algorithm forces uncertain data points into a single cluster, and this introduces decision risk. To address this problem, this paper proposes a three-way DBSCAN algorithm based on shadowed sets and Shared Nearest Neighbor (SNN). Following three-way decision theory, the algorithm assigns core points to the core region. For non-core points, it uses shadowed set theory to compute each sample's membership degree and assigns the sample to either the core region or the boundary region. The SNN algorithm then refines how samples within the boundary region are assigned, which improves the accuracy and robustness of the clustering. The algorithm is applied to text analysis. Comparative experiments show that it performs well and improves the accuracy of text clustering.
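The three steps in the abstract can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the authors' implementation, because the abstract specifies neither the membership function nor the SNN rule. The following choices are all hypothetical:

- the membership `mu_C(p)`: the number of core points of cluster C in p's ε-neighbourhood, divided by `min_pts - 1`;
- merging the shadowed set's "reduced" zone into the boundary region;
- the rule that keeps only the cluster(s) with the highest shared-neighbour count.

The threshold α is chosen with Pedrycz's balance criterion for shadowed sets, searched over a fixed grid. The helper names (`region_query`, `knn`, `shadowed_alpha`, `three_way_dbscan`) are illustrative.

```python
import math
from collections import deque


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i (ties broken by index)."""
    ranked = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    return {j for _, j in ranked[:k]}


def shadowed_alpha(memberships, steps=100):
    """Grid-search alpha in (0, 0.5) minimising Pedrycz's balance criterion:
    |sum of reduced memberships + sum of (1 - mu) over elevated ones - size of the shadow|."""
    best_alpha, best_v = 0.25, float("inf")
    for s in range(1, steps):
        alpha = 0.5 * s / steps
        reduced = sum(m for m in memberships if m <= alpha)
        elevated = sum(1 - m for m in memberships if m >= 1 - alpha)
        shadow = sum(1 for m in memberships if alpha < m < 1 - alpha)
        v = abs(reduced + elevated - shadow)
        if v < best_v:
            best_alpha, best_v = alpha, v
    return best_alpha


def three_way_dbscan(points, eps, min_pts, k=3):
    """Return (core, boundary): dicts mapping cluster id -> set of point indices.
    Points in no region are noise. Requires min_pts >= 2."""
    n = len(points)
    neigh = [region_query(points, i, eps) for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neigh]

    # Step 1: DBSCAN-style expansion over core points only; the cores seed the core regions.
    label, n_clusters = [-1] * n, 0
    for i in range(n):
        if not is_core[i] or label[i] != -1:
            continue
        label[i], queue = n_clusters, deque([i])
        while queue:
            p = queue.popleft()
            for q in neigh[p]:
                if is_core[q] and label[q] == -1:
                    label[q] = n_clusters
                    queue.append(q)
        n_clusters += 1
    core = {c: {i for i in range(n) if label[i] == c} for c in range(n_clusters)}
    boundary = {c: set() for c in range(n_clusters)}

    # Step 2 (hypothetical membership): mu_C(p) = |core points of C in N_eps(p)| / (min_pts - 1).
    candidates = {}
    for i in range(n):
        if not is_core[i]:
            mu = {c: len(core[c].intersection(neigh[i])) / (min_pts - 1) for c in core}
            mu = {c: m for c, m in mu.items() if m > 0}
            if mu:  # a non-core point with no core point nearby stays noise
                candidates[i] = mu
    alpha = shadowed_alpha([m for mu in candidates.values() for m in mu.values()])
    for p, mu in candidates.items():
        for c, m in mu.items():
            # Elevated (m >= 1 - alpha) -> core region; shadow and reduced -> boundary region.
            # At most one cluster can elevate p, since p's memberships sum to less than 1.
            (core if m >= 1 - alpha else boundary)[c].add(p)

    # Step 3: a boundary point claimed by several clusters stays only with the cluster(s)
    # whose core points share the most k-nearest neighbours with it, |kNN(p) & kNN(q)|.
    nn = [knn(points, i, k) for i in range(n)]
    for p, mu in candidates.items():
        claims = [c for c in mu if p in boundary[c]]
        if len(claims) > 1:
            score = {c: max(len(nn[p] & nn[q]) for q in core[c]) for c in claims}
            for c in claims:
                if score[c] < max(score.values()):
                    boundary[c].discard(p)
    return core, boundary


if __name__ == "__main__":
    # Two 2x2 squares, a bridge point (2.5, 0) between them, two fringe points, one outlier.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (4, 0), (4, 1), (5, 0), (5, 1),
           (-1.2, 0.5), (2.5, 0), (6.4, 0), (2.5, 5)]
    core, boundary = three_way_dbscan(pts, eps=1.55, min_pts=4, k=2)
    print("core:", core)
    print("boundary:", boundary)
```

With these toy points, the bridge point at index 9 shares equally many nearest neighbours with both clusters, so it remains in the boundary region of both. This is the deferred, three-way outcome the abstract argues for. `math.dist` (Euclidean) is a stand-in only. For text clustering, the points would more plausibly be TF-IDF vectors compared with a cosine distance.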
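Cite this article: Li, Z.C. and Yan, K. (2025) Three-Way DBSCAN Text Clustering Based on Shadowed Sets and Shared Nearest Neighbor. Hans Journal of Data Mining, 15(2), 137-150. https://doi.org/10.12677/hjdm.2025.152012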
