Generation and Detection of Malicious URLs Based on Generative Adversarial Networks
DOI: 10.12677/CSA.2020.105096
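Authors: Zheng Yang (郑阳)*: School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang; Nurbol (努尔布力): Network Center, Xinjiang University, Urumqi, Xinjiang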
Keywords: Malicious Web Page Detection, Machine Learning, Generative Adversarial Network, Multiple Discriminators, Classification
Abstract: Machine-learning-based malicious web page detection is sensitive to how the dataset is collected and labeled. To address this, this paper proposes a method for generating and detecting malicious web pages based on generative adversarial networks (GANs), and designs an encoder that encodes malicious URLs at the character level. The model is trained on a small number of samples, and the GAN's ability to fit the distribution of real samples is used to generate malicious web page samples. On top of the conventional GAN, the paper adds a second discriminator that distinguishes benign from malicious web pages, so that the model also serves as a detector. Finally, horizontal and vertical comparison experiments verify, respectively, that the generated data are usable and that the discriminative model performs comparably to current supervised classifiers.
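The character-level URL encoding mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the alphabet, the sequence length, or the padding scheme, so `ALPHABET`, `MAX_LEN`, `encode_url`, and `one_hot` below are hypothetical names and values chosen for illustration only, not the authors' encoder.

```python
import string

# Assumed alphabet: lowercase letters, digits, and punctuation common in URLs.
# Index 0 is reserved for padding and for characters outside the alphabet.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 64  # assumed fixed input length; not taken from the paper


def encode_url(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Map a URL to a fixed-length list of character indices."""
    # Lowercase, truncate to max_len, and look up each character.
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in url.lower()[:max_len]]
    # Right-pad with 0 so every URL has the same length.
    return indices + [0] * (max_len - len(indices))


def one_hot(indices: list[int], vocab_size: int = len(ALPHABET) + 1) -> list[list[int]]:
    """Expand each index into a one-hot row of width vocab_size."""
    return [[1 if j == idx else 0 for j in range(vocab_size)] for idx in indices]


encoded = encode_url("http://example.com/login?id=1")
matrix = one_hot(encoded)  # MAX_LEN rows, one per character position
```

A fixed-length numeric matrix of this kind is one plausible input format for both a generator and a discriminator; whether the paper uses one-hot rows or another embedding is not stated in the abstract.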
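Citation: Zheng Yang, Nurbol. Generation and Detection of Malicious URLs Based on Generative Adversarial Networks [J]. Computer Science and Application, 2020, 10(5): 935-943. https://doi.org/10.12677/CSA.2020.105096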
