Research on Bank Personal Credit Risk Assessment Based on Machine Learning
Abstract: Using the Zhongyuan Bank personal credit loan default data provided by the CCF competition, this paper performs data cleaning and feature engineering, reducing the initial 38 features to 18. Drawing on the 5C theory and the expected income theory, it then explores the important factors affecting bank personal credit risk. Ranked by feature importance, the top five factors are: total revolving credit balance, number of days from the initial date to the loan disbursement date, the borrower's average loan score, the current loan interest rate, and the anonymous variable f0. To improve the accuracy of banks' personal credit risk assessment, the paper compares three methods for handling imbalanced data (SMOTE, random undersampling, and SMOTEENN) using a random forest model, and finds that SMOTEENN combined sampling performs best. Four machine learning models are then built: decision tree, random forest, AdaBoost, and LightGBM. The results show that, after balancing, LightGBM achieves the highest accuracy, 96.1%.
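The abstract does not say how the resampling was implemented, so the following is only a minimal pure-Python sketch of the ideas behind two of the three methods it names: random undersampling and SMOTE. The function names, the toy two-feature data, and the parameters `k` and `seed` are all illustrative assumptions, not the authors' code or data. SMOTEENN additionally applies an Edited Nearest Neighbours cleaning step after SMOTE, which is not shown here.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_oversample(X, y, minority_label, k=3, seed=0):
    """Add synthetic minority rows until the minority matches the largest class.

    Each synthetic row is a random interpolation between a minority row and
    one of its k nearest minority neighbours (the core SMOTE idea).
    Assumes at least two minority rows.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_new = max(Counter(y).values()) - len(minority)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        X_out.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_out.append(minority_label)
    return X_out, y_out


# Hypothetical toy data: 90 non-default rows (label 0), 10 default rows (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_us, y_us = random_undersample(X, y)
X_sm, y_sm = smote_oversample(X, y, minority_label=1)
print("undersampled class counts:", Counter(y_us))
print("SMOTE class counts:", Counter(y_sm))
```

Undersampling discards majority rows, while SMOTE keeps all original rows and adds interpolated minority rows, so the two approaches trade information loss against synthetic data. In practice a maintained library implementation would normally be preferred over a hand-written one like this.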
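Cite this article: Xue Qi (薛琦), Luo Exiang (罗鄂湘). Research on Bank Personal Credit Risk Assessment Based on Machine Learning [J]. Modeling and Simulation (建模与仿真), 2023, 12(4): 3747-3755. https://doi.org/10.12677/MOS.2023.124343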
