Research on the Value Domains of Several Interest Measures in Association Rule Mining
DOI: 10.12677/hjdm.2024.143015
Authors: Wan Xin*, Li Jianjun*, Li Yumei: School of Mathematics and Statistics, Beijing Technology and Business University, Beijing
Keywords: Association Rule Mining, Interest Measures, Value Domain
Abstract: This article studies the value domains of several interest measures in association rule mining. It first introduces the basic definitions used in association rule mining, together with the definitions of five interest measures (support, confidence, conviction, lift, and the Laplace measure), and illustrates each with concrete examples. It then examines the value domain of each of the five measures, giving the domain both when the database size is finite and when it tends to infinity. Finally, it discusses in detail the values at the endpoints of these intervals, points out where the results differ from those of other studies and why, and supports the conclusions with rigorous mathematical proofs and comparative analysis, with the aim of providing more complete and accurate measurement tools for association rule mining.
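As a rough illustration of the five measures named in the abstract, the sketch below computes them for one rule X -> Y on a small transaction database. It assumes the commonly used textbook definitions, which may differ in detail from the ones the article adopts: the function names, the toy database, and the Laplace parameter `k = 2` are all hypothetical choices made for this example. Conviction is returned as `None` when confidence equals 1, since the usual formula then divides by zero.

```python
from fractions import Fraction


def support_count(db, itemset):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def rule_measures(db, x, y, k=2):
    """Five interest measures for the rule x -> y (textbook definitions, assumed).

    Requires that x occurs in at least one transaction and y in at least one.
    k is the Laplace smoothing parameter (k = 2 is a common, assumed choice).
    """
    n = len(db)
    n_x = support_count(db, x)
    n_y = support_count(db, y)
    n_xy = support_count(db, x | y)

    support = Fraction(n_xy, n)        # sup(X u Y)
    confidence = Fraction(n_xy, n_x)   # sup(X u Y) / sup(X)
    sup_y = Fraction(n_y, n)
    lift = confidence / sup_y          # conf(X -> Y) / sup(Y)
    # Conviction is undefined by this formula when confidence is exactly 1.
    conviction = None if confidence == 1 else (1 - sup_y) / (1 - confidence)
    laplace = Fraction(n_xy + 1, n_x + k)

    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
        "laplace": laplace,
    }


# Hypothetical toy database of five transactions.
db = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

result = rule_measures(db, {"bread"}, {"milk"})
for name, value in result.items():
    print(name, value)
```

Exact fractions are used instead of floats so that boundary cases, such as a confidence of exactly 1, can be detected without rounding error.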
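Citation: Wan Xin, Li Jianjun, Li Yumei. Research on the Value Domains of Several Interest Measures in Association Rule Mining [J]. Hans Journal of Data Mining, 2024, 14(3): 162-171. https://doi.org/10.12677/hjdm.2024.143015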
