Research on XGBoost-LightGBM Travel Insurance Application Prediction Model Based on AUC-RW Algorithm
DOI: 10.12677/sa.2024.135180. Supported by research project funding.
Authors: Wang Chen, Xiao Yangtian, Xiao Hongmin*: College of Mathematics and Statistics, Northwest Normal University, Lanzhou, Gansu
Keywords: Prediction of Willingness to Insure; XGBoost; LightGBM; AUC-RW Algorithm; Multi-Model Fusion
Abstract: As digital transformation reshapes the insurance industry, insurers are moving rapidly toward intelligent and personalized services. To capture customer needs accurately and raise the penetration rate of the travel insurance market, the industry is exploring advanced data analytics and machine learning techniques for predicting customers' willingness to purchase insurance. We propose an XGBoost-LightGBM prediction model based on a multi-model fusion strategy, intended to give precision marketing of travel insurance products a data-driven basis. After preprocessing the customer travel insurance purchase dataset from the Kaggle platform, we build two gradient boosting models, XGBoost and LightGBM. In the model fusion stage, we tune the model parameters with samplers and introduce the AUC-RW algorithm to determine the fusion weights; the weighted combination of the two models' predictions serves as the prediction of the XGBoost-LightGBM ensemble model. Finally, we compare the ensemble with other models using evaluation metrics such as accuracy and F1 score. The results indicate that the XGBoost-LightGBM model combined with the AUC-RW algorithm achieves higher prediction accuracy than XGBoost, LightGBM, Random Forest (RF), and Support Vector Machine (SVM).
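The weighted fusion step described in the abstract can be illustrated with a small sketch. The abstract does not define the AUC-RW algorithm, so the rule used here (each model's weight is its validation AUC divided by the sum of the AUCs) is only an assumed stand-in, not the authors' method. The function names, the toy labels, and the probability lists are all hypothetical; in practice the probabilities would come from fitted XGBoost and LightGBM classifiers.

```python
from typing import List, Sequence


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_weights(aucs: Sequence[float]) -> List[float]:
    """ASSUMPTION: weights proportional to AUC (not the paper's AUC-RW definition)."""
    total = sum(aucs)
    return [a / total for a in aucs]


def fuse(prob_lists: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of the per-sample probabilities from several models."""
    return [sum(w * p for w, p in zip(weights, probs)) for probs in zip(*prob_lists)]


# Hypothetical validation labels and predicted purchase probabilities.
y_val = [0, 0, 1, 1, 0, 1]
xgb_val = [0.2, 0.6, 0.7, 0.9, 0.1, 0.4]  # stand-in for XGBoost output
lgb_val = [0.3, 0.2, 0.6, 0.8, 0.5, 0.7]  # stand-in for LightGBM output

auc_xgb = auc_score(y_val, xgb_val)
auc_lgb = auc_score(y_val, lgb_val)
weights = auc_weights([auc_xgb, auc_lgb])

# Fused probabilities, then a 0.5 threshold for the final insure / not-insure label.
fused = fuse([xgb_val, lgb_val], weights)
labels = [int(p >= 0.5) for p in fused]
```

Under this assumed rule, the model with the higher validation AUC contributes more to the fused probability, and the weights sum to one so the result stays a valid probability.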
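Cite this article: Wang Chen, Xiao Yangtian, Xiao Hongmin. Research on XGBoost-LightGBM Travel Insurance Application Prediction Model Based on AUC-RW Algorithm[J]. Statistics and Application, 2024, 13(5): 1847-1858. https://doi.org/10.12677/sa.2024.135180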
