Credit Risk Assessment Based on Lasso and CatBoost Fusion Model
DOI: 10.12677/AAM.2021.106229
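Authors: Niu Li, College of Mathematics, Taiyuan University of Technology, Jinzhong, Shanxi; Li Dongxi, College of Data Science, Taiyuan University of Technology, Jinzhong, Shanxi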
Keywords: Lasso Algorithm; CatBoost Algorithm; Integrated Algorithm; Credit Risk
Abstract: An accurate and efficient credit evaluation model improves the ability to predict risk and helps guard against losses on credit loans. We therefore propose a fusion model of Lasso and CatBoost, the Lasso-CatBoost algorithm, and apply it to credit data. The algorithm first uses Lasso for feature selection and then uses CatBoost on the selected important features to analyze credit risk, with the aim of reaching high prediction accuracy with fewer variables. To verify the efficiency and accuracy of the algorithm, we take the credit data of a bank as a sample and, after preprocessing, apply Lasso-CatBoost to analyze its credit risk. With AUC as the evaluation metric, the results show that, compared with logistic regression and with ensemble algorithms such as Random Forest, AdaBoost, XGBoost, and LightGBM, Lasso-CatBoost uses the fewest features and achieves the highest prediction accuracy, indicating that the proposed algorithm performs best among the compared methods in credit risk analysis.
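The first stage described in the abstract (Lasso feature selection) and the AUC metric can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption for illustration and is not the authors' implementation: the toy data, the penalty `alpha=0.1`, and the function names are hypothetical, and the second stage is only a linear stand-in. In the paper, the selected columns are passed to a CatBoost classifier, which is not included here.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return w


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 6 roughly unit-variance features, of which only
# columns 0 and 3 influence the default label. Real credit data would need
# preprocessing and standardisation first.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(400)]
labels = [
    1 if 2.0 * row[0] - 1.5 * row[3] + 0.3 * rng.gauss(0.0, 1.0) > 0 else 0
    for row in X
]
X_train, X_test = X[:300], X[300:]
y_train, y_test = labels[:300], labels[300:]

# Stage 1: Lasso on the centred label; features with nonzero weight are kept.
mean_y = sum(y_train) / len(y_train)
weights = lasso_coordinate_descent(X_train, [v - mean_y for v in y_train], alpha=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]

# Stage 2 stand-in: score the hold-out rows with the Lasso weights themselves.
# The paper trains CatBoost on the selected columns at this point instead.
scores = [sum(weights[j] * row[j] for j in selected) for row in X_test]
holdout_auc = auc(y_test, scores)

print("selected feature indices:", selected)
print("hold-out AUC of the linear stand-in:", round(holdout_auc, 3))
```

A larger `alpha` zeroes more coefficients and so keeps fewer features, which is the lever behind the "fewer variables" goal in the abstract. How the authors tuned this penalty is not stated in this section.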
Cite this article: Niu, L. and Li, D.X. (2021) Credit Risk Assessment Based on Lasso and CatBoost Fusion Model. Advances in Applied Mathematics, 10(6), 2194-2205. https://doi.org/10.12677/AAM.2021.106229
