Performance Comparison of Four Models in Sales Forecasting
Abstract: This study evaluates how well traditional statistical methods, ensemble learning, and deep learning suit retail sales forecasting, providing empirical evidence for model selection. Using the M5 (Walmart daily sales) dataset, we compare four models (SARIMA, Prophet, N-BEATS, and XGBoost) on store-level and SKU-level forecasting tasks under a unified framework of data splitting, rolling backtesting, and Bayesian optimization. For the store-level task, the target is the aggregated sales of the CA_1 store, and performance is evaluated over 7-day, 14-day, and 28-day horizons using mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). For the SKU-level task, 150 SKUs are randomly sampled from the CA_1 store and stratified into high-, medium-, and low-sales groups by historical sales tertiles; with a 28-day horizon, three rolling backtests are run and evaluated using MAE, RMSE, and root mean squared scaled error (RMSSE). Feature ablation is used to examine the role of exogenous variables. The results show that XGBoost achieves the best or near-best performance in both tasks: for the store-level 28-day forecast, its MAPE is as low as 4.47%; at the SKU level, its average RMSSE is 0.933 ± 0.288, and in the low-sales, highly sparse group its RMSSE is 1.178 ± 0.332, an improvement of approximately 12.6% over N-BEATS. Performance differs markedly across models: SARIMA is relatively stable under strong weekly seasonality; N-BEATS is competitive on high-sales, low-sparsity series but is sensitive to sparsity; Prophet captures local abrupt changes poorly. Feature ablation shows that price and calendar features improve forecasts most for sparse SKUs. Overall, this study quantifies the impact of forecasting horizon and sparsity on model performance, confirms the robustness of XGBoost in retail demand forecasting, and provides an empirically grounded basis for model selection and forecast optimization in retail.
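The four error metrics and the three-fold, 28-day rolling backtest named in the abstract can be sketched in Python as follows. This is a minimal illustration, not the study's implementation. The function names are hypothetical. The RMSSE scale is computed over the full training history (the M5 competition's variant starts from the first non-zero demand). The fold layout, contiguous non-overlapping 28-day test windows at the end of the series, is an assumption, since the abstract does not specify it.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when y_true contains zeros, which is one reason a scaled
    metric such as RMSSE is preferred for sparse SKU-level series.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def rmsse(y_train, y_true, y_pred):
    """Root mean squared scaled error.

    The forecast MSE is divided by the in-sample MSE of a one-step naive
    forecast. Assumption: the scale uses the whole training history.
    """
    y_train = np.asarray(y_train, float)
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    scale = np.mean(np.diff(y_train) ** 2)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2) / scale))


def rolling_origin_folds(n_obs, horizon=28, n_folds=3):
    """Return (train_end, test_end) index pairs, earliest fold first.

    Fold k trains on observations [0, train_end) and tests on
    [train_end, test_end). The layout is an assumed one.
    """
    folds = []
    for k in range(n_folds, 0, -1):
        test_end = n_obs - (k - 1) * horizon
        folds.append((test_end - horizon, test_end))
    return folds


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    print(rolling_origin_folds(100, horizon=28, n_folds=3))
    print(rmsse([1, 2, 4, 7], [2, 2], [0, 4]))
```

Under this definition, an RMSSE below 1 means the forecast error is smaller than the in-sample error of the one-step naive forecast.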