Predicting Telecom Customer Churn Based on Improved Tomek Link Method
DOI: 10.12677/orf.2024.143320
Author: Zheng Shiying (郑诗滢), School of Mathematics and Statistics, Fujian Normal University, Fuzhou, Fujian
Keywords: Tomek Link; Logistic Regression; XGB-RFE; TabNet; Customer Churn
Abstract: To improve the accuracy and efficiency of telecom customer churn detection, a combined approach is proposed. First, the XGB-RFE method is applied to select the most relevant features. Second, a data balancing technique is applied within a stratified cross-validation framework to handle class imbalance. Together, the two steps aim to improve model performance and interpretability. The results show that Tomek Link undersampling within the stratified cross-validation framework significantly improves the performance of every model tested. The method also yields good results when applied to the TabNet model. The approach therefore has practical value for telecom customer churn prediction, with the potential to raise churn detection rates and improve business decisions.
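The resampling step described in the abstract can be illustrated with a minimal sketch. It assumes the standard Tomek Link definition (two samples of different classes that are each other's nearest neighbor) and binary labels; the abstract does not spell out the author's improved variant, so this is not the paper's exact method. The XGB-RFE feature selection step is omitted. All function names (`tomek_link_undersample`, `stratified_folds`, `resampled_cv_splits`) are hypothetical and use only the Python standard library.

```python
import math
import random
from collections import defaultdict


def nearest_neighbor(i, X):
    """Index of the point closest to X[i] (Euclidean), excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_link_undersample(X, y, majority=0):
    """Drop the majority-class member of every Tomek link.

    A Tomek link is a pair of samples from different classes that are
    each other's nearest neighbor. Assumes binary labels.
    """
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            # Exactly one of the two is the majority sample; remove it.
            drop.add(i if y[i] == majority else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def stratified_folds(y, k, seed=0):
    """Assign sample indices to k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def resampled_cv_splits(X, y, k=5, majority=0):
    """Yield (X_train, y_train, X_test, y_test) with Tomek-cleaned training data.

    Undersampling is applied to the training part only, so each test fold
    keeps the original class distribution.
    """
    folds = stratified_folds(y, k)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = tomek_link_undersample(
            [X[i] for i in train_idx], [y[i] for i in train_idx], majority)
        yield X_tr, y_tr, [X[i] for i in test_idx], [y[i] for i in test_idx]
```

Resampling inside each training fold, rather than once before splitting, keeps the held-out fold untouched, so the evaluation reflects the original class imbalance. In practice the same structure could likely be built from `StratifiedKFold` in scikit-learn and `TomekLinks` in imbalanced-learn instead of the hand-written helpers above.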
Cite this article: Zheng, S.Y. (2024) Predicting Telecom Customer Churn Based on Improved Tomek Link Method. Operations Research and Fuzziology, 14(3), 843-851. https://doi.org/10.12677/orf.2024.143320
