Characterization of Raisins Based on Random Forest Classification Model
DOI: 10.12677/AAM.2023.128356
Authors: Yu Liping (余丽萍)*, Wu Xizhi (吴喜之), Wang Tao (王涛): School of Mathematics, Yunnan Normal University, Kunming, Yunnan
Keywords: Machine Learning, Random Forest, Feature Classification, R Language
Abstract: To classify two varieties of raisins efficiently, this study uses the R language and a dataset built from images of 900 Turkish raisins, 450 each of the Besni and Kecimen varieties. Seven morphological features were extracted from the images: Area, Perimeter, MajorAxisLength, MinorAxisLength, Eccentricity, ConvexArea, and Extent. After normalization and noise removal, a random forest classification model was built and compared with an SVM model. Evaluated on the confusion matrix, the random forest model performs on par with the SVM model; for the raisin data, however, the random forest is the better fit because it supports an interpretation of variable importance. The study indicates that two morphological features, Perimeter and MajorAxisLength, are highly important to the random forest classification model.
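The abstract mentions two generic steps that are easy to illustrate: normalizing a feature and scoring a two-class classifier from its confusion matrix. The following is a minimal sketch in plain Python rather than the R code the authors used. It assumes min-max scaling (the abstract does not say which normalization was applied), and the function names and all numbers are hypothetical and do not come from the paper.

```python
# Illustrative sketch only. The study itself was done in R; the helper names,
# the choice of min-max scaling, and every value below are assumptions.


def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def confusion_counts(actual, predicted, positive):
    """Return (TP, FN, FP, TN) for a two-class problem."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def summarize(tp, fn, fp, tn):
    """Derive common summary metrics from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical Perimeter values and hypothetical model predictions.
perimeters = [1000.0, 1500.0, 2000.0]
scaled = min_max_normalize(perimeters)

actual = ["Besni", "Besni", "Kecimen", "Kecimen"]
predicted = ["Besni", "Kecimen", "Kecimen", "Kecimen"]
counts = confusion_counts(actual, predicted, positive="Besni")
metrics = summarize(*counts)
```

Computing the same metrics for the predictions of each model, random forest and SVM, on the same held-out raisins is one way to make the kind of side-by-side comparison the abstract describes.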
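Cite this article: Yu Liping, Wu Xizhi, Wang Tao. Characterization of Raisins Based on Random Forest Classification Model [J]. Advances in Applied Mathematics (应用数学进展), 2023, 12(8): 3576-3586. https://doi.org/10.12677/AAM.2023.128356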
