Study on Uncertainty Quantification of Explainable Machine Learning in E-Commerce Sales Forecasting
Abstract: E-commerce sales forecasting is critical for inventory management and marketing decisions. However, traditional machine learning models are often regarded as “black boxes” and do not quantify the uncertainty of their predictions, which limits their use in practice. This study investigates uncertainty quantification for explainable machine learning in e-commerce sales forecasting, taking the number of items sold as the target variable and combining time series analysis, machine learning, the SHAP explainability method, and Bootstrap uncertainty quantification. First, descriptive analysis and stationarity tests are used to characterize the temporal behavior of sales. Next, a Lasso regression model is built for forecasting, and SHAP is used to interpret feature contributions and the prediction logic. Finally, confidence intervals for the predictions are constructed by Bootstrap sampling, and the error distribution is analyzed to quantify uncertainty. The results show that Lasso regression achieves a test-set RMSE of 4974, significantly lower than that of the benchmark models, so it offers both predictive accuracy and interpretability. The SHAP analysis identifies “transaction amount” and “historical sales fluctuations” as the core driving features. Uncertainty quantification based on the confidence intervals effectively identifies high-risk periods during major promotions, giving e-commerce decision-makers forecasts that combine insight with risk awareness.
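The Bootstrap step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the error values, the helper names (`rmse`, `percentile`, `bootstrap_interval`, `prediction_interval`), the point forecast of 5000 units, and the 95% level are all assumptions made for illustration. It also resamples errors independently, which ignores any autocorrelation that a real sales series may show.

```python
import math
import random


def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def percentile(sorted_values, q):
    """Linearly interpolated percentile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_interval(errors, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(errors)."""
    rng = random.Random(seed)
    n = len(errors)
    # Resample the errors with replacement and recompute the statistic each time.
    stats = sorted(
        stat([rng.choice(errors) for _ in range(n)]) for _ in range(n_boot)
    )
    return percentile(stats, alpha / 2), percentile(stats, 1 - alpha / 2)


def prediction_interval(point_forecast, errors, n_boot=2000, alpha=0.05, seed=0):
    """Simplified interval: point forecast plus resampled historical errors.

    This reflects only the spread of past errors, not the uncertainty in
    the fitted model parameters.
    """
    rng = random.Random(seed)
    draws = sorted(point_forecast + rng.choice(errors) for _ in range(n_boot))
    return percentile(draws, alpha / 2), percentile(draws, 1 - alpha / 2)


# Hypothetical forecast errors (actual minus predicted items sold); not from the paper.
errors = [120.0, -340.0, 80.0, 510.0, -60.0, -220.0, 900.0, -150.0, 30.0, 270.0]

low, high = bootstrap_interval(errors, rmse)
print(f"RMSE = {rmse(errors):.1f}, 95% bootstrap interval = [{low:.1f}, {high:.1f}]")

pi_low, pi_high = prediction_interval(5000.0, errors)
print(f"Forecast 5000 units, approximate 95% interval = [{pi_low:.1f}, {pi_high:.1f}]")
```

A wide interval relative to the point forecast marks a period in which the forecast should be treated with caution, which is the kind of signal the abstract describes for major promotions.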