User Credit Risk Prediction Model Based on Bank Big Data
Abstract: Credit risk is the principal risk in bank operations and directly affects a bank's development, so a credit risk prediction model is needed to help banks avoid risk and reduce losses. This paper studies 80,000 records from a commercial bank, each described by roughly a thousand variables. A "grouped principal component" method is used to reduce the dimensionality of these variables during preprocessing, and credit risk prediction models are then built with Logistic regression and random forest. Both models indicate that a customer's credit card level, occupation, value level, basic personal business information, deposits, and holdings in domestic and foreign currency have a large influence on predicting default risk. The Logistic regression model has an area under the curve (AUC) of 0.847 and a prediction accuracy of 75%; the random forest model has an AUC of 0.848 and a prediction accuracy of 85%. Compared with previous studies, the prediction accuracy of both models is clearly improved. In practice, the two models can be combined to exploit the strengths of each.
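The abstract reports two metrics for each model: area under the curve and prediction accuracy. The following is a minimal sketch of how these two quantities can be computed from predicted default scores. It is not the authors' code. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. It also shows one general property worth keeping in mind when reading such results: AUC depends only on how the scores rank the customers, while accuracy also depends on the chosen threshold.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case gets the
    higher score. Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly when a score at or above
    `threshold` is treated as a predicted default (label 1)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = default, 0 = no default.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
# Two hypothetical score vectors with the same ranking of customers;
# scores_b is simply scores_a halved.
scores_a = [0.10, 0.20, 0.30, 0.60, 0.40, 0.70, 0.80, 0.90]
scores_b = [0.05, 0.10, 0.15, 0.30, 0.20, 0.35, 0.40, 0.45]

print(auc(labels, scores_a), accuracy(labels, scores_a))  # 0.9375 0.75
print(auc(labels, scores_b), accuracy(labels, scores_b))  # 0.9375 0.5
print(accuracy(labels, scores_b, threshold=0.25))         # 0.75
```

In this toy case the two score vectors have identical AUC but different accuracy at the 0.5 threshold, and lowering the threshold to 0.25 brings the second back to 0.75. The example does not explain the specific figures reported in the abstract; it only illustrates that the two metrics measure different things.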
Cite this article: Hu Jingwen, Liu Xiao, Feng Zhe. User Credit Risk Prediction Model Based on Bank Big Data [J]. Statistics and Application, 2020, 9(4): 582-592. https://doi.org/10.12677/SA.2020.94062
