Research on a Customer Churn Prediction Model Based on Probability Calibration
Abstract:
This paper first builds customer churn prediction models using logistic regression, linear discriminant analysis, k-nearest neighbors, support vector machines, Bayesian discriminant analysis, and decision trees, in order to analyze quantitatively whether a customer will churn. The six classifiers are then compared by ten-fold cross-validation, and the linear discriminant analysis model, which has the highest accuracy, is selected. Because that model outputs only a class label (churn or no churn) rather than a churn probability, we further convert its output into a probability, a step known as probability calibration. Finally, the models before and after calibration are evaluated using the Brier score and the probability calibration curve, and calibration yields better results.
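To make the calibration and evaluation steps described in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes sigmoid (Platt) scaling as the calibration method, which the abstract does not specify, and it uses made-up discriminant scores and churn labels. The function names `fit_platt`, `brier_score`, and `calibration_curve` are hypothetical helpers introduced only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def fit_platt(scores, y_true, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss.

    Assumption: sigmoid (Platt) scaling stands in for the paper's
    unspecified calibration method.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, y_true):
            p = sigmoid(a * s + b)
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibration_curve(y_true, p_pred, n_bins=5):
    """Per non-empty bin: (mean predicted probability, observed churn rate)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for b in bins
        if b
    ]

# Hypothetical classifier scores (e.g. a discriminant value) and churn labels.
scores = [-2.0, -1.5, -1.0, -0.5, -0.2, 0.1, 0.4, 0.8, 1.2, 2.0]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

# Before calibration: only a hard churn / no-churn label is available.
hard = [1.0 if s > 0 else 0.0 for s in scores]

# After calibration: scores are mapped to churn probabilities.
a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]

brier_hard = brier_score(labels, hard)
brier_cal = brier_score(labels, calibrated)
curve = calibration_curve(labels, calibrated)

print("Brier score, hard labels:", brier_hard)
print("Brier score, calibrated: ", brier_cal)
for mean_p, rate in curve:
    print(f"mean predicted {mean_p:.2f} -> observed churn rate {rate:.2f}")
```

In this toy data the hard labels misclassify two of the ten customers, so their Brier score is 0.2; `brier_cal` gives the corresponding value for the calibrated probabilities. A well-calibrated model produces curve points close to the diagonal, where mean predicted probability equals the observed churn rate.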