Cluster Analysis of Red Wine Varieties
Abstract: This paper performs a cluster analysis of red wine varieties using the UCI wine chemical composition data. The indicators are wine brand, alcohol content, malic acid content, ash, alkalinity of ash, magnesium content, total phenols, flavonoids, proanthocyanins, color intensity, hue, diluted wine (OD280/OD315), and proline. Three methods are applied to the data: K-means clustering, hierarchical clustering, and DBSCAN clustering. The data are first screened to select indicators that are meaningful and suitable for clustering, then preprocessed, and finally clustered in R. The clustering results are interpreted to describe the variety categories of the wines and their relative quality.
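The preprocess-then-cluster workflow described in the abstract can be sketched roughly as follows. The paper implements its analysis in R; this is a Python illustration using only the standard library, not the authors' code. The function names `standardize` and `kmeans`, the choice of two indicators, and the six toy measurements are all hypothetical and are not taken from the UCI data.

```python
import math

def standardize(rows):
    """Z-score each column so that large-valued indicators
    (such as proline) do not dominate Euclidean distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for a constant column.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm, seeded with the first k points (an assumption
    made here for determinism; other initializations are common)."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers

# Hypothetical toy samples: (alcohol, proline). Illustrative values only.
toy = [
    [13.8, 1050], [14.1, 1120], [13.9, 1080],
    [12.2, 480], [12.4, 520], [12.0, 450],
]

labels, centers = kmeans(standardize(toy), k=2)
print(labels)
```

Standardizing first matters because the indicators are on very different scales; without it, the distance computation would be driven almost entirely by the largest-valued column.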
Cite this article: Yang, F. and Su, L.Y. (2021) Cluster Analysis of Red Wine Varieties. Statistics and Application, 10(1), 31-46. https://doi.org/10.12677/SA.2021.101004
