E-Commerce User Churn Prediction Based on the WOA-XGBoost Model
Abstract: As competition in e-commerce intensifies, user churn has a significant impact on business growth, and retaining customers is a priority for any enterprise; churn prediction has therefore long been an active research topic. Using a dataset of Douyin (TikTok) e-commerce user information, this paper builds decision tree, random forest, and XGBoost models to predict whether a user will churn. Results on the test set show that XGBoost performs best in AUC, recall, and F1 score. Because the performance of XGBoost is sensitive to its hyperparameters, the Whale Optimization Algorithm (WOA) is then introduced to search globally over key parameters such as the learning rate, maximum tree depth, and subsample ratio. During optimization, five-fold cross-validation is used to safeguard the model's generalization ability and guard against overfitting. Experiments show that, compared with the original model, the WOA-optimized XGBoost model improves AUC by 1.20%, accuracy by 1.05%, and F1 score by 0.97%. It shows stronger predictive ability and higher reliability on the e-commerce churn prediction task, giving e-commerce platforms a better technical approach for identifying users who are likely to churn and for designing effective retention strategies.
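The WOA search over the three hyperparameters named in the abstract can be sketched as follows. This is a minimal illustration of the basic Whale Optimization Algorithm update rules, not the authors' implementation. The function names, the search bounds, and the objective are all assumptions: `surrogate_loss` is a hypothetical smooth stand-in with an arbitrary optimum, used only so the sketch runs without a dataset. In the setting the abstract describes, the objective would instead be something like one minus the mean five-fold cross-validated AUC of an XGBoost classifier.

```python
import math
import random

# Hypothetical search box: (learning_rate, max_depth, subsample).
BOUNDS = [(0.01, 0.3), (3.0, 10.0), (0.5, 1.0)]


def surrogate_loss(params):
    """Stand-in objective (assumption): a smooth bowl with an arbitrary
    minimum. It is NOT a measured result; a real run would train XGBoost
    with five-fold cross-validation and return 1 - mean AUC."""
    lr, depth, sub = params
    return (lr - 0.1) ** 2 + ((depth - 6.0) / 10.0) ** 2 + (sub - 0.8) ** 2


def woa_minimize(objective, bounds, n_whales=20, n_iter=60, seed=0):
    """Minimize objective(x) inside box bounds with a basic WOA."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Random initial population and its best member.
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    scores = [objective(w) for w in whales]
    best_i = min(range(n_whales), key=scores.__getitem__)
    best, best_score = whales[best_i][:], scores[best_i]

    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # shrinks linearly from 2 toward 0
        for i in range(n_whales):
            x = whales[i]
            A = 2 * a * rng.random() - a  # scalar step coefficient per whale
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore
                # around a randomly chosen whale.
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - v) for r, v in zip(ref, x)]
            else:
                # Logarithmic spiral toward the best whale (shape b = 1).
                l = rng.uniform(-1, 1)
                spiral = math.exp(l) * math.cos(2 * math.pi * l)
                new = [abs(bv - v) * spiral + bv for bv, v in zip(best, x)]
            whales[i] = clip(new)
            s = objective(whales[i])
            if s < best_score:  # leader is updated as soon as it improves
                best, best_score = whales[i][:], s
    return best, best_score


best, score = woa_minimize(surrogate_loss, BOUNDS)
print("best (learning_rate, max_depth, subsample):", best)
print("surrogate loss:", score)
```

To use this with a real model, one plausible wiring (again an assumption, not taken from the paper) is to round `max_depth` to an integer inside the objective, build `xgboost.XGBClassifier(learning_rate=..., max_depth=..., subsample=...)`, and score it with `sklearn.model_selection.cross_val_score(..., cv=5, scoring="roc_auc")`. Each objective call then costs five model fits, so the population size and iteration count directly set the total training budget.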