Decision Tree Analysis for Inconsistent Decision Tables
DOI: 10.12677/CSA.2016.610074. Supported by the National Natural Science Foundation of China.
Authors: Xu Meiling, Qiao Ying, Zeng Jing, Mo Yuchang, Zhong Farong (College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang)
Keywords: Data Mining, Inconsistent Decision Table, Many-Valued Decision, Greedy Algorithm
Abstract: Decision tree techniques are widely used for classification in data mining. Mining useful information with decision trees from consistent decision tables (where samples with identical conditional attribute values share the same decision value) is well studied, whereas decision tree mining over inconsistent decision tables (where samples with identical conditional attribute values have different decision values) is an active research topic. Based on the greedy algorithm, this paper proposes a decision tree analysis method for inconsistent decision tables. First, the many-valued decision approach converts an inconsistent decision table into a many-valued decision table, in which the multiple decision values of a sample are represented as a set. Then, following the greedy selection principle, a greedy selection strategy is designed using an impurity function and uncertainty measures. Finally, a decision tree construction algorithm is built on this greedy selection. A worked example shows that the proposed weighted-sum greedy selection measure produces smaller decision trees than the existing weighted-max measure.
Abstract: Decision trees are widely used to discover patterns in consistent data sets. When the data set is inconsistent, however, i.e., it contains groups of examples with equal values of the conditional attributes but different decisions (values of the decision attribute), discovering the essential patterns or knowledge becomes challenging. Based on the greedy algorithm, we propose a new approach to constructing a decision tree for an inconsistent decision table. First, the inconsistent decision table is transformed into a many-valued decision table. We then develop a greedy algorithm that uses a "weighted sum" impurity and uncertainty measure to construct the decision tree. An illustrative example shows that our "weighted sum" measure yields smaller decision trees than the existing "weighted max" measure.
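The greedy step contrasted in the abstract can be sketched as follows. This is one plausible instantiation under stated assumptions, not the paper's exact definitions: uncertainty of a subtable is taken as the number of rows whose decision set misses the most common decision, and an attribute's impurity combines the branch uncertainties with either `sum` ("weighted sum") or `max` ("weighted max"):

```python
from collections import Counter, defaultdict

def uncertainty(rows):
    """Rows are (attributes, decision_set) pairs. Uncertainty here is
    the number of rows left uncovered by the single most common
    decision (an assumed measure, not necessarily the paper's)."""
    if not rows:
        return 0
    counts = Counter(d for _, ds in rows for d in ds)
    return len(rows) - max(counts.values())

def impurity(rows, attr, combine):
    """Split on attribute `attr` and combine the branch uncertainties:
    combine=sum models the proposed "weighted sum" measure,
    combine=max the existing "weighted max" measure."""
    branches = defaultdict(list)
    for attrs, ds in rows:
        branches[attrs[attr]].append((attrs, ds))
    return combine(uncertainty(b) for b in branches.values())

def best_attribute(rows, n_attrs, combine=sum):
    """Greedy step: pick the attribute whose split minimizes impurity."""
    return min(range(n_attrs), key=lambda a: impurity(rows, a, combine))

# A tiny many-valued table: row ((0, 1), {1, 2}) carries two decisions.
rows = [((0, 0), {1}), ((0, 1), {1, 2}), ((1, 0), {2}), ((1, 1), {2})]
print(best_attribute(rows, 2, combine=sum))  # weighted-sum choice
print(best_attribute(rows, 2, combine=max))  # weighted-max choice
```

Tree construction would apply `best_attribute` recursively to each branch until a subtable's uncertainty reaches zero; the two `combine` functions can select different attributes and thus grow trees of different sizes.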
文章引用:许美玲, 乔莹, 曾静, 莫毓昌, 钟发荣. 非一致决策表的决策树分析[J]. 计算机科学与应用, 2016, 6(10): 597-606. http://dx.doi.org/10.12677/CSA.2016.610074
