Deep Autoencoding Kernel Density Estimation Model for Anomaly Detection
DOI: 10.12677/CSA.2021.113070. Supported by research project funding.
Author: 吕鹏 (Lü Peng), School of Computer and Control Engineering, Yantai University, Yantai, Shandong
Keywords: Anomaly Detection; Deep Autoencoder; Kernel Density Estimation; Deep Learning
Abstract: Unsupervised techniques typically rely on the probability density distribution of the data to detect anomalies: objects with low probability density are considered anomalous. However, modeling the density distribution of high-dimensional data is hard, which makes detecting anomalies in high-dimensional data challenging. State-of-the-art methods follow a "two-step" framework: they first apply a dimensionality reduction technique to the data and then detect anomalies in the low-dimensional space. Unfortunately, the low-dimensional space does not necessarily preserve the density distribution of the original high-dimensional data, which undermines the effectiveness of anomaly detection. In this work, we propose a novel anomaly detection method for high-dimensional data called AEDE (AutoEncoding kernel Density Estimation model). The key idea is to unify the representation learning capacity of a deep autoencoder with the density estimation power of kernel density estimation (KDE), so that the model learns a probability density distribution that effectively separates out the anomalies. By using a probability density-aware strategy in the training process of the autoencoder, AEDE consolidates the merits of both components, namely the deep autoencoder and KDE. Experiments on four public benchmark datasets show that AEDE significantly outperforms state-of-the-art methods in detecting anomalies, with up to a 30% improvement in F1 score.
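The density-based premise stated in the abstract (objects with low probability density are treated as anomalies) can be illustrated with a minimal sketch. This is not the AEDE model: it applies a plain Gaussian KDE directly in the input space, with no autoencoder and no joint training. The function names, the leave-one-out scoring, the bandwidth of 0.5, and the 5% contamination rate are all assumptions chosen for illustration, and the data are synthetic.

```python
import math
import random


def gaussian_kde_log_density(query, samples, bandwidth):
    """Log of an isotropic Gaussian KDE at `query`, estimated from `samples`."""
    d = len(query)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2)
    # Log kernel value between the query and every sample.
    log_kernels = [
        log_norm - sum((q - s) ** 2 for q, s in zip(query, x)) / (2.0 * bandwidth ** 2)
        for x in samples
    ]
    # Log-mean-exp keeps the computation stable for very small densities.
    m = max(log_kernels)
    return m + math.log(sum(math.exp(v - m) for v in log_kernels) / len(log_kernels))


def flag_anomalies(points, bandwidth=0.5, contamination=0.05):
    """Return (indices of the lowest-density points, all log-density scores)."""
    scores = []
    for i, p in enumerate(points):
        # Leave-one-out: a point must not vouch for its own density.
        others = points[:i] + points[i + 1:]
        scores.append(gaussian_kde_log_density(p, others, bandwidth))
    n_flag = max(1, int(round(contamination * len(points))))
    ranked = sorted(range(len(points)), key=lambda i: scores[i])
    return sorted(ranked[:n_flag]), scores


# Synthetic data (assumption): 95 points from a standard 2-D Gaussian
# plus 5 isolated points placed far from the bulk and from each other.
rng = random.Random(0)
normal = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(95)]
outliers = [[10.0, 10.0], [-10.0, 10.0], [10.0, -10.0], [-10.0, -10.0], [0.0, 15.0]]
data = normal + outliers

flagged, log_density = flag_anomalies(data, bandwidth=0.5, contamination=0.05)
print("flagged indices:", flagged)
```

In two dimensions this direct approach works because the density is easy to estimate. The abstract's point is that the same estimate becomes unreliable in high dimensions, which is why AEDE learns the representation and the density together rather than scoring raw inputs.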
Cite this article: 吕鹏. Deep Autoencoding Kernel Density Estimation Model for Anomaly Detection [J]. Computer Science and Application, 2021, 11(3): 682-689. https://doi.org/10.12677/CSA.2021.113070
