Research on Reconstruction Anomaly Detection Algorithm of Industrial Time Series Data Based on XGBoost Feature Selection
DOI: 10.12677/CSA.2022.123060
Authors: Xurong Zhou (周旭荣), Jianli Zheng (郑建立): College of Information Science and Technology, Donghua University, Shanghai
Keywords: Time-Series Data, XGBoost, Convolution, Encoder, Decoder
Abstract: Industrial production generates large volumes of time-series data, which raises three related questions: how to remove useless data effectively, how to judge whether the data collected by sensors is correct, and how to detect anomalies in the time series effectively. Many anomaly detection algorithms have been proposed, but most consider only the temporal characteristics of the data and ignore the correlations between sensors. This paper proposes a Multi-Dimensional Self-Attention Convolutional Gated Recurrent Encoder-Decoder (MDACGA) based on XGBoost feature selection. XGBoost scores the variables of the original data set; irrelevant variables are eliminated according to their scores and the effective variables are retained. The retained variables are then used to construct feature matrices. A fully convolutional encoder encodes the feature matrices and extracts the correlation features between different time series, and an attention-based ConvGRU extracts their temporal features. A convolutional decoder then jointly decodes the feature matrices obtained in the previous step to produce the reconstructed feature matrices, and the reconstruction error is minimized with the Adam optimizer and mini-batch stochastic gradient descent. Finally, anomaly detection is performed on the residual feature matrix. Experimental results show that the algorithm reaches an accuracy of 0.989 and a recall of 0.996, which indicates that it is effective and that it outperforms the general benchmark algorithms.
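The first and last steps described in the abstract, score-based variable filtering and detection on the residual feature matrix, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions and not the authors' implementation: the importance scores are hypothetical stand-ins for values an XGBoost model might produce, the reconstructed matrices are hand-written rather than output by the encoder-decoder, and both thresholds as well as the rule "count the residual entries above a threshold" are assumptions that the abstract does not specify.

```python
from typing import Dict, List

Matrix = List[List[float]]


def select_features(scores: Dict[str, float], min_score: float) -> List[str]:
    """Keep the variables whose importance score is at least min_score."""
    return [name for name, score in scores.items() if score >= min_score]


def residual_matrix(original: Matrix, reconstructed: Matrix) -> Matrix:
    """Element-wise difference between the input and reconstructed matrices."""
    return [
        [o - r for o, r in zip(row_o, row_r)]
        for row_o, row_r in zip(original, reconstructed)
    ]


def anomaly_score(residual: Matrix, element_threshold: float) -> int:
    """Count residual entries whose magnitude exceeds element_threshold."""
    return sum(1 for row in residual for v in row if abs(v) > element_threshold)


def is_anomalous(original: Matrix, reconstructed: Matrix,
                 element_threshold: float, count_threshold: int) -> bool:
    """Flag a window when too many residual entries are large (assumed rule)."""
    residual = residual_matrix(original, reconstructed)
    return anomaly_score(residual, element_threshold) > count_threshold


# Hypothetical importance scores; in practice they would come from a trained model.
scores = {"temperature": 0.42, "pressure": 0.31, "fan_id": 0.01}
kept = select_features(scores, min_score=0.05)

# Hypothetical 2x2 feature matrix with a close and a poor reconstruction.
original = [[1.0, 0.5], [0.5, 1.0]]
close_reconstruction = [[0.98, 0.52], [0.49, 1.01]]
poor_reconstruction = [[0.2, 0.9], [0.9, 0.3]]

normal_flag = is_anomalous(original, close_reconstruction, 0.1, 1)
anomaly_flag = is_anomalous(original, poor_reconstruction, 0.1, 1)
```

In this toy setup the low-scoring `fan_id` variable is dropped, the close reconstruction leaves no large residuals, and the poor reconstruction is flagged. In a real pipeline the thresholds would typically be chosen from residuals observed on normal training data.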
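Cite this article: Zhou, X. and Zheng, J. (2022) Research on Reconstruction Anomaly Detection Algorithm of Industrial Time Series Data Based on XGBoost Feature Selection. Computer Science and Application, 12(3), 590-601. https://doi.org/10.12677/CSA.2022.123060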
