Online Learning Algorithm Based on Streaming Data
DOI: 10.12677/aam.2025.142085
Author: Lili Zhang (张丽丽), School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong
Keywords: Streaming Data; Concept Drift; Online Learning
Abstract: This paper proposes an online learning algorithm for streaming data subject to concept drift. To improve prediction speed and accuracy, it introduces a multi-step prediction regression ensemble model and describes in detail a sample resampling procedure that incorporates a clustering algorithm to cope with the high dimensionality and large scale of streaming data. The resampled samples are fed into an online adaptive framework based on sliding windows, which is combined with the multi-step prediction regression model to form the proposed online learning algorithm; this algorithm can identify and handle concept drift in a timely manner. The paper also provides a statistical theoretical basis for concept drift to ensure the accuracy of the algorithm. For intersection traffic flow data and website pageview data, the paper identifies the types of concept drift present and introduces a Boolean factor for sudden drift, which effectively reduces its adverse effects. In the case-study evaluation, the proposed method performs well in both accuracy and stability.
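As a rough illustration of the kind of pipeline the abstract outlines (a sliding window, a drift check, and a direct multi-step forecaster), the following is a minimal sketch and not the paper's algorithm. Every name and parameter here (`fit_direct`, `drift_detected`, `run_online`, `window`, `recent`, `z_threshold`) is a hypothetical choice made for illustration. A one-feature least-squares line stands in for the regression ensemble, a simple z-score on recent forecast errors stands in for the paper's statistical drift criterion, and the clustering-based resampling and the Boolean factor are omitted.

```python
from collections import deque
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def fit_direct(series, horizon):
    """Direct multi-step strategy: one model per step h, mapping y[t] to y[t+h]."""
    return [fit_line(series[:-h], series[h:]) for h in range(1, horizon + 1)]


def forecast(models, last_value):
    """Return the forecasts for steps 1..horizon from the latest observation."""
    return [a + b * last_value for a, b in models]


def drift_detected(errors, recent=10, z_threshold=3.0):
    """Assumed drift rule: recent mean error deviates strongly from earlier errors."""
    ref, new = errors[:-recent], errors[-recent:]
    if len(ref) < recent:
        return False  # not enough history to compare against
    spread = stdev(ref) or 1e-9  # guard against a zero spread
    return abs(mean(new) - mean(ref)) / (spread / recent ** 0.5) > z_threshold


def run_online(stream, window=50, horizon=3):
    """Consume the stream once; refit on the sliding window when drift is flagged."""
    buf = deque(maxlen=window)
    errors, models, refits = [], None, 0
    for y in stream:
        if models is not None:
            # Score the one-step-ahead forecast made from the previous observation.
            errors.append(abs(forecast(models, buf[-1])[0] - y))
        buf.append(y)
        if len(buf) == window and (models is None or drift_detected(errors)):
            models = fit_direct(list(buf), horizon)
            errors.clear()
            refits += 1
    return models, refits
```

In this sketch the refit uses whatever the window holds at the moment drift is flagged, so the window still contains mostly pre-drift samples right after a sudden change; a fuller implementation would need some way to discount or discard them.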
Cite this article: Zhang, L. (2025) Online Learning Algorithm Based on Streaming Data. Advances in Applied Mathematics (应用数学进展), 14(2), 463-473. https://doi.org/10.12677/aam.2025.142085
