Research on Wine Quality Prediction Based on the Support Vector Machine Method
DOI: 10.12677/HJDM.2016.61006
Author: Li Enlai (李恩来), School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming, Yunnan
Keywords: Wine Quality Prediction; Data Mining; Support Vector Machine
Abstract: As living standards continue to rise, wine is becoming increasingly popular and wine production keeps growing. Yet wine quality is still assessed solely through sensory scoring by professional tasters, an approach that clearly cannot meet the needs of today's market. Many researchers have applied data mining algorithms (such as the multinomial logistic model, artificial neural networks, support vector machines, decision trees, Bagging, AdaBoost, and nearest-neighbor methods) to predict wine quality. The results are not very good (misclassification rates are all above 15%), but they are still fairly reliable compared with relying on tasters' scores alone. Previous work shows that simply using the common kernel functions of the support vector machine (SVM) does not predict wine quality well. This paper therefore modifies the SVM kernel function, mainly by forming new kernels as linear combinations of the common ones. Using the "Wine Quality Data Set" from the UCI repository, the proposed method is compared with commonly used data mining algorithms, and ten-fold cross-validation is used to judge how well each method performs.
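The idea of building a new kernel as a linear combination of common SVM kernels, evaluated with ten-fold cross-validation, can be sketched roughly as follows. This is a minimal illustration only: the choice of base kernels (linear, polynomial, RBF), the weights, the kernel parameters, and all function names are assumptions made for the example, since the abstract does not state which kernels or coefficients the paper uses.

```python
import math


def linear_kernel(x, z):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel; degree and coef0 are illustrative values."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def combined_kernel(x, z, weights=(0.2, 0.3, 0.5)):
    """Weighted sum of the three base kernels.

    With nonnegative weights the sum of valid kernels is again a valid
    (positive semidefinite) kernel. The weights here are hypothetical.
    """
    if min(weights) < 0:
        raise ValueError("weights must be nonnegative")
    w_lin, w_poly, w_rbf = weights
    return (w_lin * linear_kernel(x, z)
            + w_poly * poly_kernel(x, z)
            + w_rbf * rbf_kernel(x, z))


def gram_matrix(samples, kernel):
    """Kernel values for every pair of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


def ten_fold_indices(n_samples, n_folds=10):
    """Yield (train, test) index lists; fold k tests samples with i % n_folds == k."""
    for fold in range(n_folds):
        test = [i for i in range(n_samples) if i % n_folds == fold]
        train = [i for i in range(n_samples) if i % n_folds != fold]
        yield train, test
```

A Gram matrix built this way could be handed to any SVM solver that accepts a precomputed kernel (for instance scikit-learn's `SVC(kernel="precomputed")`), with the misclassification rate averaged over the ten folds. In practice the data would usually be shuffled before the folds are formed.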
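Cite this article: Li, E.L. (2016) Research on Wine Quality Prediction Based on Support Vector Machine Method. Hans Journal of Data Mining (数据挖掘), 6(1), 42-53. http://dx.doi.org/10.12677/HJDM.2016.61006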
