Application of EM Clustering in the Era of Big Data
Abstract:
In the era of big data, data can be applied and shared across all industries with great convenience, which makes the methods used to process and analyze data especially important. Big data is characterized by large volume, low analysis efficiency, and a lack of structure. Because of this complexity, the same data set can be analyzed from different perspectives. Traditional one-dimensional clustering methods are no longer suitable for big data, so this paper studies an exploratory clustering method, EM clustering analysis, with the classification carried out using the Mclust() function in R. The method is applied to two examples. In the first, a small data set is expanded into a large data set by adding noise, and better classification results are obtained on the large data set. We find that EM clustering analysis gives better clustering results on big data and is also suitable for multidimensional data analysis. Finally, the method is applied to observations of 1900 genes at six different time points, and the resulting classification is given.
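The abstract describes enlarging a small sample by adding noise and then clustering it with an EM-fitted mixture model through Mclust() in R. As a rough illustration only (not the authors' code, and written in Python rather than R), the sketch below runs a hand-written EM for a two-component one-dimensional Gaussian mixture. The synthetic data, the jitter scale, the number of copies per point, and the names `em_gmm_1d`, `assign_labels`, `small`, and `large` are all assumptions made for this example. Mclust() also selects the number of components and the covariance structure by BIC, which this sketch does not attempt.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k=2, iterations=100):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances).
    """
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing component
    return weights, means, variances


def assign_labels(data, weights, means, variances):
    """Assign each point to the component with the largest weighted density."""
    labels = []
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        labels.append(dens.index(max(dens)))
    return labels


rng = random.Random(42)
# Assumed small sample: two groups centred at 0 and 5.
small = [rng.gauss(0.0, 1.0) for _ in range(50)] + [rng.gauss(5.0, 1.0) for _ in range(50)]
# Hypothetical expansion step: 10 noisy copies of every original point.
large = [x + rng.gauss(0.0, 0.3) for x in small for _ in range(10)]

weights, means, variances = em_gmm_1d(large, k=2)
labels = assign_labels(large, weights, means, variances)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

The quantile-based starting values keep the run deterministic for a given seed. A practical analysis would normally try several starts and compare candidate models, as Mclust() does through BIC.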