Stock Price Prediction Based on Multi-Feature Analysis Using the XGBoost-EMD-LSTM Model
Abstract: Efficient and accurate stock price prediction has long been a goal for researchers. Because stock prices are markedly nonlinear, non-stationary, and time-dependent, improving prediction accuracy has become a central research problem. This paper proposes a combined method for forecasting stock closing prices that couples an eXtreme Gradient Boosting (XGBoost) model built on multi-feature analysis with a hybrid of Empirical Mode Decomposition (EMD) and Long Short-Term Memory (LSTM). First, an XGBoost model is built to predict the closing price, and its predictions are appended to the original features as a new variable, yielding a new multivariate time series. Next, the EMD algorithm decomposes this new data into a finite number of intrinsic mode functions (IMFs). Finally, an LSTM model is built and trained on the decomposed components to produce the final prediction. Simulation results show that the proposed model achieves higher prediction accuracy and better overall performance than an RNN model, a single-feature LSTM model, a multi-feature LSTM model, an XGBoost model, an EMD-LSTM model, and an EMD-RNN model. This indicates that the proposed method can effectively improve prediction accuracy, has reference value, and can offer useful guidance for investors' decision-making.
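The three steps described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The XGBoost predictions are replaced by a placeholder array (a fitted gradient-boosting regressor such as the `xgboost` package's `XGBRegressor` could supply them). The EMD is a simplified variant that uses linear envelopes and a fixed number of sifting passes instead of the cubic-spline envelopes and stopping criterion of Huang et al. The LSTM itself is not trained: the code stops at building the windowed `(samples, lookback, features)` tensor an LSTM layer would consume. The helper names `add_prediction_feature`, `emd`, and `make_windows`, the choice to decompose the close-price column, and the 20-step lookback are all hypothetical.

```python
import numpy as np


def add_prediction_feature(features, xgb_pred):
    """Step 1: append a model's close-price prediction as an extra feature column."""
    features = np.asarray(features, dtype=float)
    xgb_pred = np.asarray(xgb_pred, dtype=float).reshape(-1, 1)
    if features.shape[0] != xgb_pred.shape[0]:
        raise ValueError("features and predictions must have the same number of rows")
    return np.hstack([features, xgb_pred])


def _envelope_mean(x):
    """Mean of upper/lower envelopes, or None if x has too few extrema to sift."""
    inner = x[1:-1]
    maxima = np.where((inner > x[:-2]) & (inner >= x[2:]))[0] + 1
    minima = np.where((inner < x[:-2]) & (inner <= x[2:]))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    # Simplification: linear envelopes (Huang et al. use cubic splines);
    # np.interp holds the edge values constant beyond the first/last extremum.
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return (upper + lower) / 2.0


def emd(signal, max_imfs=6, n_sift=10):
    """Step 2: simplified EMD returning (imfs, residue) with sum(imfs) + residue == signal."""
    residue = np.asarray(signal, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residue) is None:
            break  # residue is close to monotonic: stop extracting IMFs
        h = residue.copy()
        # Simplification: a fixed number of sifting passes, not an SD-based stop rule.
        for _ in range(n_sift):
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return np.array(imfs).reshape(len(imfs), len(residue)), residue


def make_windows(data, target, lookback):
    """Step 3 input: (samples, lookback, features) windows, each paired with the next target."""
    data = np.asarray(data, dtype=float)
    target = np.asarray(target, dtype=float)
    if len(data) <= lookback:
        raise ValueError("series must be longer than the lookback window")
    X = np.stack([data[i:i + lookback] for i in range(len(data) - lookback)])
    y = target[lookback:]
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    t = np.arange(n)
    close = 10 + 0.02 * t + np.sin(t / 5) + 0.3 * rng.standard_normal(n)
    volume = rng.uniform(1e5, 2e5, n)
    # Hypothetical placeholder standing in for out-of-sample XGBoost predictions.
    xgb_pred = close + 0.1 * rng.standard_normal(n)
    feats = add_prediction_feature(np.column_stack([close, volume]), xgb_pred)
    imfs, res = emd(feats[:, 0])  # decompose the close-price column (an assumption)
    lstm_input = np.column_stack([feats, imfs.T, res])
    X, y = make_windows(lstm_input, close, lookback=20)
    print(X.shape, y.shape)
```

Because EMD is applied here to the whole series before windowing, future values influence every IMF; a leakage-free evaluation would decompose only the data available at each forecast time. For the same reason, the appended XGBoost column should hold out-of-sample predictions, so the LSTM does not see fitted values that already encode the target.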
Citation: Zheng Jie, Yan Qisheng. Stock Price Prediction Based on Multi-Feature Analysis Using the XGBoost-EMD-LSTM Model[J]. Advances in Applied Mathematics, 2025, 14(4): 558-571. https://doi.org/10.12677/aam.2025.144186
