Research on Bank Credit Card Default Prediction Based on Machine Learning
Abstract: Credit cards are a core business for banks, and major commercial banks issue them to capture market share and acquire customers. Although the credit card business generates high profits, loosely controlled card management has led to a high default rate among cardholders, which exposes banks to considerable risk. How to manage credit card risk effectively has therefore become one of the most closely watched issues in the banking industry. This paper uses machine learning algorithms to build a bank credit card default prediction model that predicts whether a cardholder will default in the following month, in order to support banks' risk management. Specifically, it builds prediction models with five algorithms (logistic regression, decision tree, random forest, adaptive boosting (AdaBoost), and gradient boosting decision tree) and uses accuracy and other evaluation metrics to compare the five models under different feature selection methods. Experiments on cardholder data from a bank show that the choice of feature selection method affects model performance more than the choice of algorithm; among the methods compared, filter-based feature selection is more adaptable.
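As a rough illustration of what filter-based feature selection means here, the sketch below scores each feature independently of any classifier and keeps the top k. The abstract does not say which filter statistic the paper uses, so the absolute Pearson correlation with the default label is an assumption, and the data, the helper names (pearson, filter_select) and the value of k are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column carries no information; score it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_select(rows, labels, k):
    """Score every feature by |correlation with the label| and return
    the sorted column indices of the k best-scoring features plus all scores."""
    n_features = len(rows[0])
    scores = [
        abs(pearson([row[j] for row in rows], labels))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Synthetic stand-in for cardholder data (assumption, not the paper's data):
# columns 0 and 2 depend on the default label, columns 1 and 3 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [
    [y + rng.gauss(0, 0.5), rng.gauss(0, 1), 2 * y + rng.gauss(0, 1), rng.gauss(0, 1)]
    for y in labels
]

selected, scores = filter_select(rows, labels, k=2)
print("selected feature columns:", selected)
```

Because the scores are computed before any model is trained, the same selected columns can be passed unchanged to each of the five classifiers, which is one plausible reading of why a filter method adapts well across algorithms.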
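Cite this article: Shan Huawei (单华玮). Research on Bank Credit Card Default Prediction Based on Machine Learning [J]. Hans Journal of Data Mining (数据挖掘), 2019, 9(4): 145-152. https://doi.org/10.12677/HJDM.2019.94018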
