An Ensemble Learning Model for Predicting Financial Credit Default Based on Classification Algorithm
DOI: 10.12677/mos.2024.132169
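Authors: Xiao Jiaxin (肖家鑫), Yu Lianzhi (于莲芝): School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai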
Keywords: Ensemble Learning, Financial Credit, Classification Algorithm, Meta-Learner
Abstract: As financial markets develop, the rapid growth of credit business has brought a steady rise in credit risk, and traditional risk assessment methods no longer meet practical needs. Ensemble models have become a focus of default-prediction research: by combining the predictions of multiple classification algorithms, they draw on the strengths of each algorithm to improve prediction accuracy and robustness. This paper studies the application of a dual-stage heterogeneous stacked ensemble model (DH-SEM) to financial credit default prediction. The model has two key stages. In the first stage, SVM, KNN, and naive Bayes are selected as three supervised base learners; in the second stage, a random forest is used as the meta-learner to predict the classification result. For financial credit default prediction, the DH-SEM model reaches a prediction accuracy of 0.886, which is higher than that of the traditional models it is compared with.
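The two-stage structure described in the abstract can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration and not the authors' implementation: the synthetic dataset, the feature scaling, the hyperparameters, the 5-fold internal cross-validation, and the helper name `build_stacked_model` are all assumptions, since the abstract specifies none of them. The accuracy it prints applies to the synthetic data only and is unrelated to the reported 0.886.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacked_model(random_state=0):
    """Hypothetical helper: SVM, KNN and naive Bayes feed a random forest."""
    # Stage 1: three heterogeneous base learners. Scaling is an assumption
    # added here because SVM and KNN are sensitive to feature scale.
    base_learners = [
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=random_state))),
        ("knn", make_pipeline(StandardScaler(),
                              KNeighborsClassifier(n_neighbors=5))),
        ("nb", GaussianNB()),
    ]
    # Stage 2: a random forest meta-learner trained on the base learners'
    # out-of-fold predictions (cv=5 is an assumed setting).
    meta_learner = RandomForestClassifier(n_estimators=100,
                                          random_state=random_state)
    return StackingClassifier(estimators=base_learners,
                              final_estimator=meta_learner,
                              cv=5)


# Synthetic, imbalanced stand-in for a credit dataset (assumption).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = build_stacked_model()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Training the meta-learner on out-of-fold predictions, as `StackingClassifier` does internally, keeps the second stage from simply memorizing the base learners' fit to the training set.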
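Cite this article: Xiao Jiaxin, Yu Lianzhi. An Ensemble Learning Model for Predicting Financial Credit Default Based on Classification Algorithm[J]. Modeling and Simulation, 2024, 13(2): 1797-1813. https://doi.org/10.12677/mos.2024.132169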
