Corporate Loan Default Risk Prediction Based on Machine Learning Method
DOI: 10.12677/MOS.2021.103088; supported by research project funding
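Authors: 陈旭岚 (Chen Xulan), 庞建华 (Pang Jianhua)*: School of Science, Guangxi University of Science and Technology, Liuzhou, Guangxi; 韩苏皖 (Han Suwan): School of Statistics and Mathematics, Nanjing Audit University, Nanjing, Jiangsu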
Keywords: Machine Learning; Enterprise Loans; Default Risk
Abstract: Studying the loan default risk of enterprises matters in practice for two reasons. It helps financial institutions overcome the problem of "reluctance to lend" and guard against credit risk. It also supports targeted recommendations that help enterprises standardize their operations and improve their financial condition. This paper studies loan default risk using corporate loan default data from a certain institution. The raw data are first processed by handling missing values, selecting features and treating class imbalance. Four machine learning methods (logistic regression, random forest, XGBoost and LightGBM) are then used to model the data, and the resulting models are compared. Finally, a GBDT model is used to compute feature importance. The results show that: 1) the three ensemble models predict significantly better than the single model (logistic regression); 2) among the ensemble models, LightGBM achieves the best predictive performance; 3) an enterprise's tax payment record and the credit it has previously been granted can serve as important indicators of whether it will become overdue on a loan.
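The abstract describes a pipeline of four stages: missing-value handling, class rebalancing, model fitting and model comparison. The sketch below traces that flow for the logistic-regression baseline only, in dependency-free Python. It is not the authors' code. The abstract does not name the exact techniques, so the following choices are assumptions: median imputation, random oversampling of the minority class, a plain gradient-descent fit, and ROC AUC as the comparison metric. The data are synthetic, and `tax` and `credit` are hypothetical feature names. The ensemble models (random forest, XGBoost, LightGBM) and the GBDT feature-importance step are not reproduced here.

```python
import math
import random
import statistics


def column_medians(rows):
    """Median of the observed (non-None) values in each column."""
    return [statistics.median(r[j] for r in rows if r[j] is not None)
            for j in range(len(rows[0]))]


def impute(rows, medians):
    """Replace None entries with per-column medians (an assumed strategy)."""
    return [[m if v is None else v for v, m in zip(r, medians)] for r in rows]


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until both classes have equal counts.

    Assumes binary 0/1 labels with both classes present.
    """
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Logistic regression fitted by full-batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    """Predicted default probability for each row."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(y_true, scores):
    """AUC: chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def run_demo(seed=42):
    """End-to-end run on synthetic data with hypothetical 'tax' and 'credit' scores."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(400):
        tax, credit = rng.gauss(0, 1), rng.gauss(0, 1)
        # Invented rule: default is more likely when both scores are low.
        y.append(1 if -1.5 * tax - credit + 0.5 * rng.gauss(0, 1) > 1.5 else 0)
        # About 5% of credit values are made missing.
        X.append([tax, None if rng.random() < 0.05 else credit])
    X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]
    medians = column_medians(X_train)  # fit imputation on training rows only
    X_train, X_test = impute(X_train, medians), impute(X_test, medians)
    # Rebalance the training split only; the test split keeps its real class ratio.
    X_bal, y_bal = random_oversample(X_train, y_train, seed=seed)
    model = fit_logistic(X_bal, y_bal)
    return roc_auc(y_test, predict_proba(model, X_test))


if __name__ == "__main__":
    print(f"test AUC: {run_demo():.3f}")
```

The imputation medians come from the training split alone, and oversampling touches only the training rows. This keeps the held-out rows as they would arrive in practice. An ensemble model would slot in where `fit_logistic` is called, with the same AUC comparison applied to every model.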
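Citation: Chen Xulan, Han Suwan, Pang Jianhua. Corporate Loan Default Risk Prediction Based on Machine Learning Method[J]. Modeling and Simulation (建模与仿真), 2021, 10(3): 890-897. https://doi.org/10.12677/MOS.2021.103088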
