A Survey of Host-Based Advanced Persistent Threat Detection Technology
DOI: 10.12677/CSA.2022.121024. Supported by national science and technology funding.
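Authors: Zhiqiang Xu: School of Cyber Security, University of Chinese Academy of Sciences, Beijing; Institute of Information Engineering, Chinese Academy of Sciences, Beijing. Yu Wen: Institute of Information Engineering, Chinese Academy of Sciences, Beijing.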
Keywords: Advanced Persistent Threat; Host Entity; Threat Detection; Cyber Security
Abstract: In recent years, Advanced Persistent Threats (APTs) have become a serious cyberspace security hazard that threatens national security, organizational interests, and personal privacy. APTs are extremely difficult to detect and defend against because their attack processes are complex, highly concealed, and highly destructive. Host systems are usually the primary targets of APT activity, so it is important to track the research progress and future trends of host-based APT detection. This paper first summarizes the APT life cycle, the characteristics of each attack stage, and the associated host security issues. It then introduces the types of host entities and the types of their behavior data, and systematically summarizes APT detection techniques based on host entity behavior. Next, it reviews the datasets and metrics used to evaluate threat detection. Finally, it summarizes current technical challenges and outlines future research directions.
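Cite this article: Xu, Z.Q. and Wen, Y. (2022) A Survey of Host-Based Advanced Persistent Threat Detection Technology. Computer Science and Application, 12(1), 233-251. https://doi.org/10.12677/CSA.2022.121024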
