Encrypted Traffic Classification Based on Graph Embedding and Multimodal Deep Learning
DOI: 10.12677/CSA.2022.125142
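Authors: Ruipeng Yang (Institute of Information Engineering, Chinese Academy of Sciences, Beijing; School of Cyber Security, University of Chinese Academy of Sciences, Beijing); Aimin Yu, Lijun Cai, Dan Meng (Institute of Information Engineering, Chinese Academy of Sciences, Beijing)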
Keywords: Graph Embedding, Encrypted Traffic Classification, Deep Learning
Abstract: As the Internet develops, new applications keep emerging and encrypted traffic accounts for a growing share of all traffic. At the same time, most malicious network attacks also propagate in encrypted form. Fine-grained classification of network traffic therefore helps improve network management and reduce network security risks. Traditional port-based and payload-based classification methods no longer work for the huge number of applications and for encrypted traffic, and the performance of machine-learning-based encrypted traffic classification is limited by the statistical features of the traffic. In this paper, we propose an encrypted traffic classification method based on graph embedding and multimodal deep learning. The method uses a multimodal deep learning model to combine two types of features: flow sequence features and graph embedding features. Experimental results show that the proposed method significantly outperforms existing state-of-the-art methods and is robust to interference.
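Cite this article: Ruipeng Yang, Aimin Yu, Lijun Cai, Dan Meng. Encrypted Traffic Classification Based on Graph Embedding and Multimodal Deep Learning [J]. Computer Science and Application, 2022, 12(5): 1425-1435. https://doi.org/10.12677/CSA.2022.125142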
