A User Count Prediction Method Based on Statistical Time Series Analysis
Interval Prediction Model of User Number Based on Time Series
Abstract: To predict the number of user reports for the Wordle game, this paper exploits the temporal structure of the user data and builds an ES-ARMA model that combines exponential smoothing (ES) with the autoregressive moving average (ARMA) model from statistical time series analysis. The statistical properties of the model are analyzed and used to derive an interval prediction method. To validate the approach, ES-ARMA is compared with three established predictors: the traditional time series model ARIMA, the deep sequence prediction network LSTM, and the boosted decision tree model XGBoost. Four metrics are used for the comparison: MSE, RMSE, MAE, and R2. On these four metrics, the ES-ARMA predictions of the number of Wordle user reports agree more closely with the observed values, which supports the effectiveness of the model. On this basis, a theoretically derived prediction interval at the 95% confidence level is given for the number of user reports on 2023-03-01. The characteristics and performance of the different models are also analyzed, providing a reference for model selection in other regression prediction problems.
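The four evaluation metrics named in the abstract have standard definitions, and a short sketch can make the comparison criteria concrete. The Python below is a minimal illustration, not the authors' implementation: the function names `regression_metrics` and `normal_interval` are hypothetical, the sample values are toy numbers rather than data from the paper, and the interval helper assumes Gaussian forecast errors with a known standard deviation, which may differ from the interval the paper derives for ES-ARMA.

```python
import math
from statistics import NormalDist


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R2 for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R2 is undefined when the actual series is constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


def normal_interval(point_forecast, forecast_std, level=0.95):
    """Two-sided interval under an ASSUMED Gaussian forecast error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    half_width = z * forecast_std
    return point_forecast - half_width, point_forecast + half_width


if __name__ == "__main__":
    # Toy values for illustration only; not results from the paper.
    actual = [3.0, 5.0, 7.0, 9.0]
    predicted = [2.0, 5.0, 8.0, 9.0]
    print(regression_metrics(actual, predicted))
    print(normal_interval(100.0, 10.0))
```

Lower MSE, RMSE, and MAE and an R2 closer to 1 indicate predictions that track the observed series more closely, which is the sense in which the abstract ranks ES-ARMA against ARIMA, LSTM, and XGBoost.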
Cite this article: 罗廷金, 燕俊名, 王铭悦, 梁天乐, 吴桂林. A User Count Prediction Method Based on Statistical Time Series Analysis [J]. Modeling and Simulation (建模与仿真), 2025, 14(3): 622-635. https://doi.org/10.12677/mos.2025.143252
