基于剪枝优化与索引求交的改进Eclat算法
An Improved Eclat Algorithm Based on Pruning Optimization and Indexing Intersection
摘要:
针对现有Eclat算法中普遍存在的候选集规模大、求交效率低的问题,提出了基于剪枝优化和索引求交的改进Eclat算法。首先根据频繁集的性质采用预剪枝和后剪枝相结合的候选集优化策略,即利用预剪枝技术裁剪待连接的项集数量以减少项集连接操作,同时利用先验性质对连接后的项集进行后剪枝处理;接着提出了一种基于事务索引的布尔数组求交方法,即通过将事务标识作为索引来设置并检索布尔数组,以获得项集支持度计数;最后通过设计对比实验,在经典数据集上测试该方法的有效性。实验表明,通过该方法能够有效压缩候选集规模,改善求交计算效率,特别是在支持度阈值小、事务数规模大的情况下,算法的运行效率得到了明显的提升。
Abstract:
In view of the problem of large number of candidate sets and low intersecting efficiency in the existing Eclat algorithm, an improved Eclat algorithm based on pruning optimization and index intersection is proposed. First, according to the properties of frequent itemsets, a candidate set optimization strategy based on pre pruning and post pruning is adopted, and that is to use pre pruning technology to tailor the number of itemsets to be linked to reduce itemset join operations, and use the transcendental property to prune the linked itemsets. Then a Boolean array intersection method based on transaction index is proposed, that is, to set and retrieve Boolean arrays by using the transaction identifier as an index to obtain the support count of the item set. Finally, the effectiveness of the method is tested on the classical dataset by designing contrast experiments. Experiments show that this method can effectively compress the size of the candidate sets and improve the efficiency of intersection calculation, especially in the case of small support threshold and large number of transactions, and the efficiency of the algorithm has been greatly improved.
参考文献
|
[1]
|
肖文, 胡娟, 周晓峰. 基于MapReduce计算模型的并行关联规则挖掘算法研究综述[J]. 计算机应用研究, 2018(1).
|
|
[2]
|
Imielienskin, T., Swami, A. and Agrawal, R. (1993) Mining Association Rules between Set of Items in Large Databases. ACM Sigmod Record, 22, 207-216. [Google Scholar] [CrossRef]
|
|
[3]
|
Han, J., Pei, J. and Yin, Y. (2000) Mining Fre-quent Patterns without Candidate Generation. ACM SIGMOD International Conference on Management of Data, Dallas, Texas, 15-18 May 2000, 1-12. [Google Scholar] [CrossRef]
|
|
[4]
|
Zaki, M.J. (2000) Scalable Algorithms for Association Mining. IEEE Transactions on Knowledge and Data Engineering, 12, 372-390. [Google Scholar] [CrossRef]
|
|
[5]
|
崔妍, 包志强. 关联规则挖掘综述[J]. 计算机应用研究, 2016, 32(2): 330-334.
|
|
[6]
|
周英, 卓金武, 卞月青. 大数据挖掘: 系统方法与实例分析[M]. 北京: 机械工业出版社, 2016.
|
|
[7]
|
韩家炜, Micheline, K., 裴健. 数据挖掘:概念与技术[M]. 第3版. 北京: 机械工业出版社, 2012.
|
|
[8]
|
饶正婵, 范年柏. 关联规则挖掘Apriori算法研究综述[J]. 计算机时代, 2012(9): 11-13.
|
|
[9]
|
Zaki, M.J. and Gouda, K. (2003) Fast Vertical Mining Using Diffsets. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, D.C., 24-27 August 2003, 326-335. [Google Scholar] [CrossRef]
|
|
[10]
|
熊忠阳, 陈培恩, 张玉芳. 基于散列布尔矩阵的关联规则Eclat改进算法[J]. 计算机应用研究, 2010, 27(4): 1323-1325.
|
|
[11]
|
冯培恩, 刘屿, 邱清盈, 等. 提高Eclat算法效率的策略[J]. 浙江大学学报(工学版), 2013, 47(2): 223-230.
|
|
[12]
|
曾雷. 关联规则挖掘中Apriori算法的研究[D]: [硕士学位论文]. 重庆: 重庆交通大学, 2016.
|