Study on Heavy Metal Pollution in European Soil Based on Random Forest Algorithms
DOI: 10.12677/SA.2019.82024
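Authors: Shenhui Song (宋申辉), Ruiyan Yang (杨瑞琰): School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan, Hubei; Shuyun Xie (谢淑云)*: School of Earth Sciences, China University of Geosciences (Wuhan), Wuhan, Hubei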
Keywords: Random Forest; Node Splitting Algorithm; Kernel Principal Component Analysis; Heavy Metal Pollution
Abstract: In the context of big data, the random forest algorithm from machine learning is introduced to improve the efficiency of evaluating heavy metal pollution in soil. Taking European topsoil as an example, this paper establishes a Random forest model to classify the pollution degree of seven heavy metals: As, Co, Cr, Cu, Ni, Pb and Zn. The model is then improved by adding kernel principal component analysis (KPCA), giving a KPCA-Random forest model, and the two models are compared in terms of classification accuracy and running time. The results show that the improved model raises the classification accuracy from 93.41% to 94.67% and reduces the running time from 12.530601 s to 9.437811 s. Finally, the advantages and disadvantages of the Random forest model are evaluated, and directions for future research are proposed.
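The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses a synthetic data set in place of the European topsoil data, and the class count, RBF kernel, number of components and forest size are all illustrative assumptions.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the soil data (assumption): 7 features play the role
# of the As, Co, Cr, Cu, Ni, Pb and Zn concentrations, and 3 classes play the
# role of pollution levels. The real data and class scheme are not given here.
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5, n_redundant=2,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def fit_and_score(model):
    """Fit on the training split; return (test accuracy, elapsed seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return accuracy, time.perf_counter() - start


# Baseline: random forest on the raw features.
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Variant: standardize, project with kernel PCA, then random forest.
# The RBF kernel and n_components=4 are illustrative choices (assumptions).
kpca_rf = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=4, kernel="rbf", random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

rf_acc, rf_time = fit_and_score(rf)
kpca_acc, kpca_time = fit_and_score(kpca_rf)
print(f"RF:      accuracy={rf_acc:.4f}, time={rf_time:.3f} s")
print(f"KPCA-RF: accuracy={kpca_acc:.4f}, time={kpca_time:.3f} s")
```

On synthetic data the two models need not reproduce the ordering reported in the abstract; the sketch only shows how accuracy and running time can be measured side by side for the two pipelines.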
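Citation: Song, S.H., Xie, S.Y. and Yang, R.Y. (2019) Study on Heavy Metal Pollution in European Soil Based on Random Forest Algorithms. Statistics and Application, 8(2), 218-226. https://doi.org/10.12677/SA.2019.82024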
