基于改进Apriori算法的频繁项集挖掘
Mining Frequent Itemsets Based on Improved Apriori Algorithm
DOI: 10.12677/CSA.2022.123063, PDF,  被引量   
作者: 兰建鑫, 孙 杰:天津工业大学计算机与科学技术学院,天津
关键词: 关联规则Apriori深度递归散列加权时延Association Rules Apriori Deep Recursion Hash Weighting Time Delay
摘要: 传统的关联规则挖掘算法有三种,分别是Apriori算法、FP-growth算法和Eclat算法。其中传统的Apriori算法简单易实现,但处理海量数据时耗时巨大且磁盘I/O过高,效率低下。而FP-growth算法虽然快速且高效,但对于内存资源极其不友好,且挖掘过程中出现问题难以追踪。针对Apriori算法与FP-growth算法的优缺点,本文提出了一种基于深度递归与散列技术改进的Apriori算法。该算法基于散列技术与递归思想,将传统算法的遍历次数大幅度降低,且很大程度上减少了磁盘I/O,保证了较低的时延和更多的存储空间,在算法时间和空间复杂度方面进行了一定程度上的优化。既提高了传统Apriori算法的效率,同时也保证了算法的可扩展性。
Abstract: There are three traditional association rule mining algorithms, namely Apriori algorithm, FP-growth algorithm and Eclat algorithm. The traditional Apriori algorithm is simple and easy to implement, but it takes a lot of time to process massive data, high disk I/O and low efficiency. Alt-hough FP-growth algorithm is fast and efficient, it is extremely unfriendly to memory resources, and problems in the mining process are difficult to track. Aiming at the advantages and disadvantages of Apriori algorithm and FP-growth algorithm, this paper proposes an improved Apriori algorithm based on deep recursion and hashing technology. Based on hash technology and recursive ideas, the algorithm greatly reduces the traversal times of the traditional algorithm, greatly reduces disk I/O, ensures lower delay and more storage space, and optimizes the time and space complexity of the algorithm to a certain extent. It not only improves the efficiency of the traditional Apriori algorithm, but also ensures the scalability of the algorithm.
文章引用:兰建鑫, 孙杰. 基于改进Apriori算法的频繁项集挖掘[J]. 计算机科学与应用, 2022, 12(3): 622-629. https://doi.org/10.12677/CSA.2022.123063

参考文献

[1] Jiawei Han, Micheling Kamber, Jian Pei, 等, 著. 数据挖掘:概念与技术[M]. 范明, 孟小峰, 译. 北京: 机械工业出版社, 2012.
[2] Siguenza-Guzman, L., Saquicela, V., Avila-Ordóñez, E., et al. (2015) Literature Review of Data Min-ing Applications in Academic Libraries. The Journal of Academic Librarianship, 41, 499-510. [Google Scholar] [CrossRef
[3] Agrawal, R. and Srikant, R. (1994) Fast Algorithms for Mining Association Rules in Large Databases. International Conference on Very Large Data Bases, San Francisco, CA, Sep-tember 1994, 487-499.
[4] Agrawal, R., Imielinski, T. and Swami, A.N. (1993) Mining Association Rules between Sets of Items in Large Databases. ACM SIGMOD Record, 22, 207-216. [Google Scholar] [CrossRef
[5] Han, J., Pei, J. and Yin, Y. (2000) Mining Frequent Patterns without Candidate Generation. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dal-las, TX, May 2000, 1-12. [Google Scholar] [CrossRef
[6] Fard, M.J.S. and Namin, P.A. (2020) Review of Apriori Based Fre-quent Itemset Mining Solutions on Big Data. 2020 6th International Conference on Web Research (ICWR), Tehran, 22-23 April 2020, 157-164. [Google Scholar] [CrossRef
[7] Wang, H.B. and Gao, Y.J. (2021) Research on Parallel-ization of Apriori Algorithm in Association Rule Mining. Procedia Computer Science, 183, 641-647. [Google Scholar] [CrossRef
[8] Javed, M.F., Nawaz, W. and Khan, K.U. (2021) HOVA-FPPM: Flexible Periodic Pattern Mining in Time Series Databases Using Hashed Occurrence Vectors and Apriori Approach. Scientific Programming, 2021, Article ID: 8841188. [Google Scholar] [CrossRef
[9] Wang, H., Huang, P. and Chen, X. (2021) Research and Application of a Multidimensional Association Rules Mining Method Based on OLAP. International Journal of Information Technolo-gy and Web Engineering (IJITWE), 16, 75-94. [Google Scholar] [CrossRef
[10] Bustio-Martínez, L., Letras-Luna, M., Cumplido, R., et al. (2019) Using Hashing and Lexicographic Order for Frequent Itemsets Mining on Data Streams. Journal of Parallel and Distributed Computing, 125, 58-71. [Google Scholar] [CrossRef
[11] Guo, H., Liu, H., Chen, J.Y., et al. (2021) Data Mining and Risk Prediction Based on Apriori Improved Algorithm for Lung Cancer. Journal of Signal Processing Systems, 93, 795-809. [Google Scholar] [CrossRef
[12] Wang, X. and Jiao, G. (2020) Research on Association Rules of Course Grades Based on Parallel FP-Growth Algorithm. Journal of Computational Methods in Sciences and Engineer-ing, 20, 759-769. [Google Scholar] [CrossRef