Research on Anti-Fraud Prediction of Auto Insurance Based on Ensemble Learning
DOI: 10.12677/aam.2025.144197
Author: 蔚全爱, School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong
Keywords: Ensemble Learning, Insurance Fraud Prediction, Machine Learning
Abstract: China's insurance industry has grown rapidly in recent years, but auto insurance fraud has grown with it and now poses a serious threat to insurers' operations and to market stability. Using auto insurance fraud data that combine accident, claim payment, and customer features, this paper builds four single models for prediction: random forest, GBDT, LDA, and LGBM. LGBM performs best among them. The four models are then fused with a conventional Stacking ensemble, which improves overall predictive performance. To further mitigate possible overfitting, the paper borrows the idea behind residual networks and concatenates the predicted probabilities output by the first-layer models directly with the original features as the input to the second-layer model; in experiments this variant's performance fluctuated and was unstable. Finally, PCA is applied to optimize the fused features and the fusion weights are adjusted dynamically, giving a deeper fusion of the models. This raises prediction accuracy to 0.870, which the paper reports as a significant improvement in identifying auto insurance fraud and as useful support for insurers' risk control.
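The fusion pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the data are synthetic (`make_classification`), LGBM is left out so that the sketch depends only on scikit-learn (`lightgbm.LGBMClassifier` could be appended to `base_models` if that package is installed), the second-layer model is assumed to be logistic regression, and the abstract does not say how the fusion weights are adjusted, so the grid search over a single weight `w` is a hypothetical stand-in. The names `fuse`, `second_layer`, and `weight_grid` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the fraud data (assumption, not the paper's data).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# First-layer models; LGBM is left out so the sketch needs only scikit-learn.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    LinearDiscriminantAnalysis(),
]

# Out-of-fold fraud probabilities for training rows, so the second layer never
# sees a prediction made by a model that was fitted on that same row.
P_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models])
# Test-row probabilities come from models refitted on the full training set.
P_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models])

scaler = StandardScaler().fit(X_train)


def fuse(X_raw, P, w):
    """Concatenate standardized original features with probabilities scaled by w."""
    return np.hstack([scaler.transform(X_raw), w * P])


def second_layer():
    # PCA keeps enough components to explain 95% of the fused variance.
    return make_pipeline(PCA(n_components=0.95), LogisticRegression(max_iter=1000))


# Hypothetical weight search: choose w by cross-validated accuracy.
weight_grid = [0.5, 1.0, 2.0, 4.0]
cv_scores = {w: cross_val_score(second_layer(), fuse(X_train, P_train, w),
                                y_train, cv=5).mean() for w in weight_grid}
best_w = max(cv_scores, key=cv_scores.get)

meta = second_layer().fit(fuse(X_train, P_train, best_w), y_train)
accuracy = meta.score(fuse(X_test, P_test, best_w), y_test)
print(f"chosen weight: {best_w}, test accuracy: {accuracy:.3f}")
```

Because PCA is sensitive to feature scale, multiplying the probability block by `w` changes how much of the retained variance comes from the first-layer predictions rather than from the original features. That is one plausible reading of weighted fusion; the accuracy printed here comes from synthetic data and is not comparable to the 0.870 reported in the abstract.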
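Cite this article: 蔚全爱. Research on Anti-Fraud Prediction of Auto Insurance Based on Ensemble Learning [J]. Advances in Applied Mathematics, 2025, 14(4): 688-700. https://doi.org/10.12677/aam.2025.144197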
