TMS Feature Selection Method for Truncate Based Random Forest Model
DOI: 10.12677/CSA.2020.102029
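Authors: Wang Song*: School of Economics and Management, Chuxiong Normal University, Chuxiong, Yunnan; Zhou Changmin: School of Big Data Engineering, Kaili University, Kaili, Guizhou; Zhou Xueguang: Department of Information Security, Naval University of Engineering, Wuhan, Hubei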
Keywords: Random Forest; Telecommunication Management System (TMS); Feature Selection; Decision Tree
Abstract: The provincial-level Telecommunication Management System (TMS) of the State Grid suffers from inconsistencies between account records and physical assets, data-entry errors, and missing data, so a large volume of data must be analyzed, processed, and reclassified. To improve the accuracy of classification learning, the many features of the data must be selected effectively. This paper applies the random forest model to feature selection. Based on several parameters, including the number of decision trees, the feature-splitting criterion, the maximum number of features in the candidate subset for each split, and the change in model accuracy after feature permutation, it proposes an optimized random forest feature selection method for TMS data and verifies it experimentally.
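The four parameters named in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it assumes depth-one trees (stumps) as a stand-in for truncated trees, the Gini index as the splitting criterion, and synthetic data in place of TMS records. All function names, the parameter values, and the 0.05 selection threshold are hypothetical choices made for illustration only.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(X, y, features):
    """Depth-one tree: best single threshold among the candidate features."""
    fallback = majority(y, None)
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, fallback), majority(right, fallback))
    return best[1:]  # (feature, threshold, left_label, right_label)

def fit_forest(X, y, n_trees, max_features, rng):
    """Each stump sees a bootstrap sample and a random candidate feature subset."""
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap rows
        feats = rng.sample(range(d), max_features)   # candidate subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]

def accuracy(forest, X, y):
    return sum(predict(forest, row) == lab for row, lab in zip(X, y)) / len(y)

def permutation_importance(forest, X, y, rng, repeats=5):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    base = accuracy(forest, X, y)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(repeats):
            col = [row[f] for row in X]
            rng.shuffle(col)
            Xp = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(forest, Xp, y))
        importances.append(sum(drops) / repeats)
    return base, importances

# Synthetic data (assumption): feature 0 determines the label, features 1-2 are noise.
rng = random.Random(0)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(150)]
y = [int(row[0] > 0.5) for row in X]

forest = fit_forest(X, y, n_trees=75, max_features=2, rng=rng)
base, imp = permutation_importance(forest, X, y, rng)
selected = [f for f, v in enumerate(imp) if v > 0.05]  # hypothetical threshold
print("baseline accuracy:", base)
print("importances:", imp)
print("selected features:", selected)
```

In this sketch, `n_trees`, the Gini criterion inside `fit_stump`, `max_features`, and the accuracy drop computed by `permutation_importance` correspond loosely to the four parameters the abstract lists. Features whose permutation barely changes accuracy are treated as candidates for removal.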
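Cite this article: Wang Song, Zhou Changmin, Zhou Xueguang. TMS Feature Selection Method for Truncate Based Random Forest Model [J]. Computer Science and Application, 2020, 10(2): 276-288. https://doi.org/10.12677/CSA.2020.102029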
