Confidential Text Recognition Based on TextCNN
DOI: 10.12677/AAM.2022.1111813
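Authors: Ke Zhang*: School of Cybersecurity, Chengdu University of Information Technology, Chengdu, Sichuan; Hongjin Chen: School of Spoken Language Arts, Sichuan University of Media and Communications, Chengdu, Sichuan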
Keywords: Classified Texts, word2vec, TextCNN
Abstract: Confidentiality work bears directly on social stability, economic growth, and national security. With the rapid spread of information technology and networks, digital office work has become mainstream; it makes office work more convenient but has also led to leaks of classified information. Manually screening classified texts is time-consuming and prone to human error. This paper uses web crawling to build a dataset of classified texts and trains a model that combines word2vec and TextCNN on this self-built dataset, with the aim of accurately identifying texts that contain classified information. In comparative experiments, TextCNN combined with word2vec achieved better results on the self-built dataset than a traditional convolutional neural network.
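The abstract does not give the network configuration, so the following is only a minimal sketch of the general TextCNN idea (parallel convolution filters of several heights over word embeddings, max-over-time pooling, then a softmax classifier). All sizes, the helper names `conv1d_max_pool` and `textcnn_forward`, and the random vectors standing in for pretrained word2vec embeddings are assumptions made for illustration; no training is shown, and this is not the authors' implementation.

```python
import math
import random


def conv1d_max_pool(embedded, kernel, bias):
    """Slide one filter over the token sequence, apply ReLU, then max-pool over time."""
    height = len(kernel)  # number of consecutive tokens the filter covers
    best = 0.0  # starting at 0 folds the ReLU into the running maximum
    for start in range(len(embedded) - height + 1):
        total = bias
        for offset in range(height):
            token_vec = embedded[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], token_vec))
        best = max(best, total)
    return best


def textcnn_forward(token_ids, embeddings, filters, fc_weights, fc_bias):
    """Return class probabilities for one tokenized text."""
    embedded = [embeddings[t] for t in token_ids]  # embedding lookup
    features = [conv1d_max_pool(embedded, k, b) for k, b in filters]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]
    peak = max(logits)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


# Hypothetical sizes: 10-word vocabulary, 4-dimensional embeddings, 2 classes.
rng = random.Random(0)
VOCAB, DIM, CLASSES = 10, 4, 2


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Random stand-in for a word2vec embedding table (assumption).
embeddings = rand_matrix(VOCAB, DIM)
# Two filters for each of the heights 2, 3, and 4, each paired with a zero bias.
filters = [(rand_matrix(h, DIM), 0.0) for h in (2, 3, 4) for _ in range(2)]
fc_weights = rand_matrix(CLASSES, len(filters))
fc_bias = [0.0] * CLASSES

probs = textcnn_forward([1, 5, 3, 7, 2], embeddings, filters, fc_weights, fc_bias)
print(probs)  # two probabilities, e.g. for "classified" and "not classified"
```

In a real pipeline the embedding table would be loaded from a trained word2vec model and the filter and classifier weights would be learned by backpropagation; a deep learning framework would normally replace the hand-written loops.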
Citation: Zhang, K. and Chen, H.J. (2022) Confidential Text Recognition Based on TextCNN. Advances in Applied Mathematics, 11(11), 7681-7687. https://doi.org/10.12677/AAM.2022.1111813
