Article Citation Information

Zhang Xinyou, Zeng Huashen, Jia Lei. Research on the intrusion detection dataset KDD CUP99 [J]. Computer Engineering and Design, 2010, 31(22): 4809-4812.

Cited by the following articles:

  • Title: Research on Dynamic Clustering Algorithm Based on Spark Framework

    Authors: Zhang Botao, Li Jianhua, Fan Lei

    Keywords: D-Stream, PDStream, Spark, Dynamic Clustering Algorithm

    Journal: Computer Science and Application, Vol. 6 No. 11, 2016-11-24

    Abstract: Clustering algorithms for data streams have made real progress in recent years, and many effective algorithms have been proposed. As information collection technology advances, the volume of data to be processed keeps growing, and a single-machine environment can no longer meet the needs of data mining, so parallel clustering algorithms for data streams are needed. Traditional clustering algorithms do not adapt well to the cluster environments now used for information collection and data processing. This paper parallelizes the serial data stream clustering algorithm D-Stream and uses the general-purpose big data processing framework Spark to design PDStream, a dynamic data clustering algorithm that runs on a distributed architecture. Experimental results show that the algorithm is more efficient, scales well, and can perform dynamic clustering of stream data under a distributed architecture.
