Analyzing the Score Data of Five Wine Samples from Two Groups of Experts Based on R Software
Abstract: Using R software, we examine the scores that two groups of experts gave to five wine samples and assess whether those scores are reasonable. First, a hypothesis test on the means of two normal populations is used to judge whether the scores of the two groups differ significantly. The test results show that the two groups' scores are largely consistent, so the evaluation results are reasonably fair and sound. Second, multiple t tests on the means are used to examine how well the experts discriminate between samples. At the 0.05 significance level, the experts can distinguish sample 1 from samples 2, 3, and 5, sample 2 from sample 4, and sample 3 from sample 4. After ranking the five samples from high to low, we find that the experts can for the most part distinguish samples whose ranks differ by 1. However, they do not effectively distinguish sample 1 from sample 4 (rank difference 1.5) or sample 3 from sample 5 (rank difference 1). Third, hierarchical clustering is used to group the five samples into three classes: excellent, good, and poor. Finally, using distance discriminant analysis, a discriminant function is built from the training sample and the training sample is then substituted back into it, which yields the experts' misclassification rate and accuracy; the discriminant function can then be used to classify new samples.
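The final step of the abstract (build a distance discriminant rule from a training sample, substitute the training sample back in, and read off the misclassification rate) can be illustrated with a minimal sketch. It is not the paper's R code. The scores below are hypothetical, the function names are made up for illustration, and the sketch assumes a single score per observation with equal class variances, in which case the distance rule reduces to assigning each score to the class with the nearest mean.

```python
from statistics import mean

# Hypothetical training scores (NOT the paper's data): class label -> scores.
training = {
    "excellent": [88.0, 91.0, 85.0, 90.0],
    "good": [78.0, 80.0, 75.0, 84.0],
    "poor": [62.0, 68.0, 71.0, 65.0],
}


def fit_centroids(samples):
    """Return the mean score of each class (the class centers)."""
    return {label: mean(scores) for label, scores in samples.items()}


def classify(score, centroids):
    """Assign a score to the class whose center is nearest.

    Assumes equal class variances, so distance is the absolute difference.
    """
    return min(centroids, key=lambda label: abs(score - centroids[label]))


def resubstitution_error(samples, centroids):
    """Re-classify the training scores and return the misclassification rate."""
    total = sum(len(scores) for scores in samples.values())
    wrong = sum(
        classify(score, centroids) != label
        for label, scores in samples.items()
        for score in scores
    )
    return wrong / total


centroids = fit_centroids(training)
error_rate = resubstitution_error(training, centroids)  # misclassification rate
accuracy = 1 - error_rate                               # accuracy
new_label = classify(79.5, centroids)                   # classify a new score
```

With these made-up scores, one "good" score (84) lies closer to the "excellent" center, so the resubstitution error is nonzero. With several score dimensions or unequal class variances, the absolute difference would be replaced by a Mahalanobis distance.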
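Cite this article: Ming, H. and Zhang, Y.Y. (2014) Analyzing the Score Data of Five Wine Samples from Two Groups of Experts Based on R Software. Statistics and Application (统计学与应用), 3(4), 133-140. http://dx.doi.org/10.12677/SA.2014.34018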
