A Review of Machine Learning Classification Based on the Random Forest Algorithm
DOI: 10.12677/AIRR.2024.131016. Supported by the National Natural Science Foundation of China.
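Authors: Jinyong Xiang (向进勇), Zhenhua Wang (王振华), Yunyun Deng (邓芸芸): School of Network Security and Information Technology, Yili Normal University, Yining, Xinjiang; Key Laboratory of Intelligent Computing Research and Application of the Ili River Valley, Yili Normal University, Yining, Xinjiang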
Keywords: Decision Trees, Random Forests, Machine Learning
Abstract: Machine learning is a key technology for realizing artificial intelligence, and the random forest algorithm is one of its representative algorithms. Random forest is well known in industry and academia for being simple and effective. It is an ensemble classifier built from decision trees, whose individual trees vote to determine the final classification. Because it offers useful properties such as variable importance measures, out-of-bag error estimation, and proximity measures, random forest is widely used for classification tasks. It is frequently applied in fields such as medicine, agriculture, and natural language processing, and it is also widely used for spam classification, intrusion detection, content filtering, and sentiment analysis. This paper introduces how a random forest is constructed and reviews the current state of research on it in terms of classification performance, application areas, and classification results. It also analyzes the advantages and disadvantages of the algorithm and the improvements that researchers have proposed, with the aim of helping researchers who are new to random forest grasp its theoretical basis.
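To make the voting and out-of-bag ideas mentioned in the abstract concrete, the following is a minimal sketch and not the method of any work surveyed here. It assumes one-split trees (decision stumps) in place of full decision trees, a synthetic two-class toy dataset, and hypothetical helper names (`fit_stump`, `fit_forest`, `predict_forest`, `oob_error`). Each tree is trained on a bootstrap sample using a randomly chosen feature, predictions are made by majority vote, and the out-of-bag error is estimated from the samples each tree did not see during training.

```python
import random
from collections import Counter


def fit_stump(rows, n_features_to_try, rng):
    """Fit a one-split tree on a random subset of features (a simplification of a full tree)."""
    n_features = len(rows[0][0])
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in rng.sample(range(n_features), n_features_to_try):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y for _, y in rows).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def fit_forest(rows, n_trees=25, n_features_to_try=1, seed=0):
    """Train each stump on a bootstrap sample and remember its out-of-bag indices."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        oob = set(range(n)) - set(idx)              # samples this tree never saw
        forest.append((fit_stump([rows[i] for i in idx], n_features_to_try, rng), oob))
    return forest


def predict_forest(forest, x):
    """Majority vote over all trees."""
    votes = Counter(predict_stump(stump, x) for stump, _ in forest)
    return votes.most_common(1)[0][0]


def oob_error(rows, forest):
    """Error rate where each sample is judged only by trees that did not train on it."""
    wrong = counted = 0
    for i, (x, y) in enumerate(rows):
        votes = [predict_stump(stump, x) for stump, oob in forest if i in oob]
        if votes:
            counted += 1
            wrong += Counter(votes).most_common(1)[0][0] != y
    return wrong / counted if counted else float("nan")


# Synthetic toy data (an assumption for illustration): two well-separated classes.
data_rng = random.Random(42)
rows = [([data_rng.uniform(0, 3), data_rng.uniform(0, 3)], 0) for _ in range(10)]
rows += [([data_rng.uniform(7, 10), data_rng.uniform(7, 10)], 1) for _ in range(10)]

forest = fit_forest(rows)
print(predict_forest(forest, [0.0, 0.0]), predict_forest(forest, [10.0, 10.0]))
print("OOB error estimate:", oob_error(rows, forest))
```

The out-of-bag estimate requires no separate validation set, which is one reason this property is attractive in practice. A production implementation would grow full trees and would normally rely on an established library rather than this sketch.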
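Cite this article: Xiang, J., Wang, Z. and Deng, Y. (2024) A Review of Machine Learning Classification Based on Random Forest Algorithm. Artificial Intelligence and Robotics Research, 13(1), 143-152. https://doi.org/10.12677/AIRR.2024.131016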
