Research on Class Imbalance in Encrypted Traffic Datasets
Abstract: Deep learning has advanced rapidly in recent years, and network security researchers have begun applying it to encrypted traffic classification. However, publicly available encrypted traffic datasets suffer from severe class imbalance, which degrades the performance of deep learning classifiers, and building a complete encrypted traffic dataset from scratch is time-consuming and expensive. To address this problem, this paper proposes an encrypted traffic generation model based on an improved generative adversarial network (GAN). The model adds the statistical features of packets and the class labels of network flows to the GAN as conditional constraints, so that it generates realistic traffic data with which to expand the dataset. Experiments show that, when the dataset is augmented with the proposed method, a deep learning-based encrypted traffic classifier performs better than when the dataset is augmented with random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), or a conventional GAN.
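For orientation, the simplest of the baselines named in the abstract, random oversampling (ROS), can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `random_oversample`, the toy two-feature flows, and the class names "chat" and "voip" are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the existing ones.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 6 flows of class "chat" and 2 of class "voip".
flows = [[0.1 * i, 1.0] for i in range(6)] + [[5.0, 2.0], [6.0, 3.0]]
classes = ["chat"] * 6 + ["voip"] * 2

balanced_flows, balanced_classes = random_oversample(flows, classes)
print(Counter(balanced_classes))
```

After the call both classes contain six flows, but every added "voip" flow is an exact copy of an existing one. ROS therefore balances the class counts without adding any new information, which is the limitation that generative approaches such as the one proposed here aim to overcome.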
Cite this article: Wang Xiao. Research on Class Imbalance in Encrypted Traffic Datasets [J]. Pure Mathematics, 2024, 14(1): 23-33. https://doi.org/10.12677/PM.2024.141004
