Application of an Improved SMOTE Algorithm in a Logistic Regression Credit Scoring Model
DOI: 10.12677/HJDM.2021.112006
Authors: Xu Zhihui, Yang Lihong: School of Mathematics, South China University of Technology, Guangzhou, Guangdong
Keywords: SMOTE Algorithm, Oversampling, Feature Weighting, Logistic Regression
Abstract: Credit scoring models are a key tool in the pre-loan approval process of commercial banks: by identifying high-risk customers in advance, they reduce the bank's exposure to credit default and fraud. Logistic regression, the most widely used credit scoring model, is sensitive to the class imbalance that is typical of credit data, and its classification performance suffers if the imbalance is left unaddressed. To this end, building on the principles of Logistic regression, we propose an improved SMOTE oversampling algorithm (FW_SMOTE) that takes variable importance into account when synthesizing auxiliary samples. In experimental comparisons with traditional SMOTE and with classic improved variants such as Borderline-SMOTE and ADASYN, FW_SMOTE improves the performance of the Logistic regression credit scoring model, which indicates that it has practical application value.
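The abstract does not spell out how FW_SMOTE uses variable importance, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It keeps the classic SMOTE interpolation step and assumes, purely for illustration, that feature weights enter through a weighted Euclidean distance in the nearest-neighbour search. The function name `weighted_smote` and its parameters are hypothetical.

```python
import numpy as np

def weighted_smote(X_min, weights, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch).

    Assumption: neighbours are chosen with a feature-weighted squared
    Euclidean distance. This is NOT taken from the paper; it is one
    plausible way to let variable importance influence SMOTE.
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise weighted squared distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sum(w * diff ** 2, axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its k neighbours per new point.
    base = rng.integers(0, n, size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]

    # Classic SMOTE step: interpolate at a random position on the segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[pick] - X_min[base])
```

One possible source for `weights`, again an assumption, is the absolute value of the coefficients of a Logistic regression fitted on standardized features, so that more influential variables dominate the neighbour search.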
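Cite this article: Xu Zhihui, Yang Lihong. Application of an Improved SMOTE Algorithm in a Logistic Regression Credit Scoring Model [J]. Hans Journal of Data Mining, 2021, 11(2): 50-58. https://doi.org/10.12677/HJDM.2021.112006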
