Personal Credit Risk Assessment Based on Rough Set Attribute Reduction and Multiple Classification Models
DOI: 10.12677/FIN.2018.84016. Supported by national science and technology funding.
Authors: Cao Ning*, Li Shujin: School of Economics, Hangzhou Dianzi University, Hangzhou, Zhejiang
Keywords: Personal Credit Assessment, Data Mining, Rough Set, Attribute Reduction
Abstract: Personal lending is one of the most important businesses of commercial banks, but it suffers from information asymmetry: the bank, as the less-informed party, faces substantial credit risk. Starting from the view that "knowledge is the ability to classify objects", this paper proposes that "personal credit assessment is a method of classifying a borrower's credit". SMOTE is used to mitigate the class imbalance of the German credit data set, and Boolean reasoning discretization objectively discretizes several continuous indicators, avoiding the subjectivity of manual discretization. A rough set attribute reduction algorithm based on a genetic algorithm reduces the evaluation indicators of the German credit data set from 20 to 10, a reduction rate of 50%. With classifier performance approximately unchanged, the reduction improves interpretability, alleviates overfitting, strengthens generalization, and greatly shortens training time. After attribute reduction, the performance of the C4.5 decision tree improves. On the attribute-reduced German credit data set, logistic regression is the best-performing model.
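The idea behind rough set attribute reduction can be illustrated with a minimal sketch. It assumes already-discretized attributes and uses a simple greedy forward search in place of the genetic search described in the abstract; the function names and the toy rows are hypothetical and are not taken from the German credit data set.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Rough set dependency degree: share of rows in the positive region,
    i.e. rows whose equivalence class under attrs carries a single label."""
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)

def greedy_reduct(rows, labels):
    """Greedy forward selection (an assumption, not the paper's genetic search):
    add the attribute that raises the dependency degree most, until the
    selected subset classifies as consistently as the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: dependency(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return sorted(chosen)

# Hypothetical toy data: three discretized attributes per borrower.
rows = [
    (0, 1, 0),
    (0, 0, 0),
    (1, 1, 1),
    (1, 0, 1),
    (0, 1, 1),
    (1, 0, 0),
]
labels = ["good", "bad", "good", "bad", "good", "bad"]

reduct = greedy_reduct(rows, labels)
print(reduct)
```

In this toy table the label is fully determined by the attribute at index 1, so the other two attributes are redundant. A genetic search would explore candidate subsets differently, but it would typically score them with the same kind of dependency measure.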
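Cite this article: Cao Ning, Li Shujin. Personal Credit Risk Assessment Based on Rough Set Attribute Reduction and Multiple Classification Models [J]. Finance, 2018, 8(4): 137-144. https://doi.org/10.12677/FIN.2018.84016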
