BAG: Text Classification Based on Attention Mechanism Combining BERT and GCN
DOI: 10.12677/SEA.2023.122023. Supported by the National Natural Science Foundation of China.
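Authors: Xiang Li, Wei Wang: Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai; School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai; Zhiyuan Ma*: Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai; Shiyang Han: Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai; School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai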
Keywords: Deep Learning, Text Classification, Graph Convolution Network, Attention Mechanism
Abstract: Modeling text classification with graphs has been an active research topic in recent years. Existing methods based on graph neural networks have improved performance, but they do not make effective use of both the text semantics obtained from pretrained language models and the structural semantics of the graph. Their graphs are also relatively large, and the resulting training cost makes these methods hard to use on platforms with limited computing power. To address these problems, this work builds a transductive text classification model with a graph neural network and uses an attention mechanism to fuse the structural semantics of the heterogeneous graph with the token-level semantics provided by the pretrained language model. With only part of the model parameters kept for training, an improved text classification model, BAG, is proposed. Experimental results show that BAG can be trained on machines with less GPU memory and that its accuracy on four datasets is higher than that of other text classification models. Compared with the graph-based TextGCN and BertGCN, its accuracy is higher by up to 10.82% and 3.14%, respectively.
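The abstract does not give the fusion formula, so the following is only a minimal sketch of how an attention mechanism could weigh a language-model embedding against a graph embedding of the same document. The function `attention_fuse`, the vector `query` (standing in for a learned parameter), and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_fuse(bert_vec, gcn_vec, query):
    """Fuse two same-sized document vectors with scaled dot-product attention.

    Assumption: `query` plays the role of a learned vector; here it is
    simply passed in. Returns the fused vector and the two attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, bert_vec) / scale, dot(query, gcn_vec) / scale]
    weights = softmax(scores)
    fused = [weights[0] * b + weights[1] * g for b, g in zip(bert_vec, gcn_vec)]
    return fused, weights


# Toy inputs (illustrative values only).
bert_vec = [0.2, -0.1, 0.7, 0.4]   # e.g. a language-model document embedding
gcn_vec = [0.5, 0.3, -0.2, 0.1]    # e.g. a graph node embedding of the same document
query = [1.0, 0.0, 1.0, 0.0]

fused, weights = attention_fuse(bert_vec, gcn_vec, query)
print(weights, fused)
```

Because the weights come from a softmax, they sum to one and each component of the fused vector lies between the corresponding components of the two inputs. A trained model would learn `query` (or a richer scoring function) rather than fix it by hand.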
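Cite this article: Li, X., Ma, Z.Y., Wang, W. and Han, S.Y. (2023) BAG: Text Classification Based on Attention Mechanism Combining BERT and GCN. Software Engineering and Applications, 12(2), 230-241. https://doi.org/10.12677/SEA.2023.122023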
