Anomaly Detection of Large Scale Microservice Architecture Software System Based on Log Parsing
DOI: 10.12677/CSA.2019.912252. Supported by the National Natural Science Foundation of China.
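Authors: Liyuan Tai*, Chunqi Tian: Department of Computer Science and Technology, Tongji University, Shanghai; Wei Wang: School of Data Science, East China Normal University, Shanghai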
Keywords: Log Parsing; Anomaly Detection; Microservice; Abstract Syntax Tree; Long Short-Term Memory Network
Abstract: With the rise of microservice architectures in recent years, software systems have grown steadily larger. Locating problems and anomalies manually is inefficient and consumes considerable time and effort, so automated anomaly detection has attracted wide attention from researchers, and log-based detection is one effective approach. Software systems built on microservice architectures have complex business logic and generate huge volumes of log data. These logs are unstructured, come from different cluster nodes and different user requests, and vary widely in type and format, which makes it difficult to extract useful information for anomaly detection. This paper proposes a method that parses the logging statements in source code through an abstract syntax tree to convert unstructured log data into structured data, groups the structured logs by time window and event identifier, and then builds a long short-term memory (LSTM) network model to detect execution path anomalies in the system. Experiments show that the method effectively detects anomalies in microservice architecture software systems, with accuracy about 10% higher than that of traditional models based on statistical methods. The paper also studies how the log key sequence length and the training data set size affect the anomaly detection model.
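The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the sample data, and the function names `group_log_keys` and `sliding_pairs` are all hypothetical, and grouping jointly by (time window, event identifier) is only one possible reading of the abstract. The sketch turns already-structured log records into per-group log key sequences, then cuts each sequence into (history, next key) pairs of the kind a next-key sequence model such as an LSTM could be trained on.

```python
from collections import defaultdict

# Hypothetical structured records: (timestamp in seconds, event identifier, log key).
# In the paper these would come from the log parsing step; here they are made up.
records = [
    (0, "req-1", "K1"),
    (1, "req-2", "K1"),
    (2, "req-1", "K2"),
    (3, "req-1", "K3"),
    (65, "req-2", "K2"),
    (66, "req-2", "K4"),
]


def group_log_keys(records, window_seconds):
    """Group log keys by (time window index, event identifier) in time order."""
    groups = defaultdict(list)
    for timestamp, identifier, key in sorted(records):
        window_index = timestamp // window_seconds
        groups[(window_index, identifier)].append(key)
    return dict(groups)


def sliding_pairs(keys, history_length):
    """Build (history, next_key) pairs from one log key sequence."""
    return [
        (tuple(keys[i:i + history_length]), keys[i + history_length])
        for i in range(len(keys) - history_length)
    ]


groups = group_log_keys(records, window_seconds=60)
pairs = sliding_pairs(groups[(0, "req-1")], history_length=2)
```

Under this reading, a trained model would flag an execution path as anomalous when the observed next key is not among the keys it predicts for the preceding history. The history length here corresponds to the log key sequence length whose effect the paper studies.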
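Cite this article: Tai, L., Tian, C. and Wang, W. (2019) Anomaly Detection of Large Scale Microservice Architecture Software System Based on Log Parsing. Computer Science and Application, 9(12), 2266-2276. https://doi.org/10.12677/CSA.2019.912252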
