Research on Text Classification Methods by Fusing DeBERTa Model with Graph Convolutional Networks
DOI: 10.12677/airr.2024.134072. Supported by research project funding.
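Authors: Liu Qi (刘琪), Xiao Kejing (肖克晶)*, Cao Shaozhong (曹少中), Zhang Han (张寒), Jiang Dan (姜丹): School of Information Engineering, Beijing Institute of Graphic Communication, Beijing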
Keywords: Text Classification, Pre-Trained Models, Graph Neural Network
Abstract: Text classification, a core task in natural language processing, aims to automatically assign text to predefined categories. The BertGCN model combines the strengths of BERT and GCN, allowing it to handle both text and graph-structured data effectively. However, it still has limitations on complex text classification tasks. BERT represents each word's position in the sequence with absolute position encoding, which does not capture the relative relationships between words in a sentence well. In addition, BERT combines the content information and position information of words before processing them, which may make it difficult for the model to distinguish the two. To overcome these limitations, we propose the DeGraph-Net model, which improves text classification by introducing the DeBERTa model. DeBERTa uses relative position encoding, which better represents the relative positions of words. DeBERTa also processes content information and position information separately, avoiding confusion between the two and improving classification accuracy. Experimental results show that DeGraph-Net achieves significant performance improvements on three benchmark text classification datasets, validating its effectiveness on complex text classification tasks.
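The separation of content and position information mentioned in the abstract can be illustrated with a small sketch of DeBERTa-style disentangled attention scores. This is not the DeGraph-Net implementation. It is a toy, pure-Python illustration that assumes the standard formulation from the DeBERTa paper: a content-to-content term, a content-to-position term, and a position-to-content term, with relative distances clipped to a window of size k. The function names (`relative_index`, `disentangled_scores`) and all dimensions are hypothetical and chosen only for this example.

```python
import math
import random


def relative_index(i, j, k):
    """Map the signed distance i - j to a bucket in [0, 2k).

    Distances at or beyond the window are clipped, so far-apart
    tokens share the same relative-position embedding.
    """
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def disentangled_scores(qc, kc, qr, kr, k):
    """Unnormalized attention scores from separate content/position vectors.

    qc, kc: per-token content query/key vectors, shape n x d.
    qr, kr: per-bucket relative-position query/key vectors, shape 2k x d.
    (In a real model these come from learned projections; here they
    are plain lists supplied by the caller.)
    """
    n, d = len(qc), len(qc[0])
    scale = 1.0 / math.sqrt(3 * d)  # three summed terms, hence 3d
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c2c = dot(qc[i], kc[j])                        # content -> content
            c2p = dot(qc[i], kr[relative_index(i, j, k)])  # content -> position
            p2c = dot(kc[j], qr[relative_index(j, i, k)])  # position -> content
            scores[i][j] = (c2c + c2p + p2c) * scale
    return scores


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    random.seed(0)
    n, d, k = 4, 8, 2  # toy sizes, assumed for illustration

    def rand_matrix(rows):
        return [[random.gauss(0, 1) for _ in range(d)] for _ in range(rows)]

    scores = disentangled_scores(
        rand_matrix(n), rand_matrix(n), rand_matrix(2 * k), rand_matrix(2 * k), k
    )
    for row in scores:
        print([round(p, 3) for p in softmax(row)])
```

Because the position vectors are indexed only by the clipped distance between two tokens, the same word pair receives the same positional contribution wherever it appears in the sequence, which is the property that absolute position encoding lacks.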
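Cite this article: Liu, Q., Xiao, K., Cao, S., Zhang, H. and Jiang, D. (2024) Research on Text Classification Methods by Fusing DeBERTa Model with Graph Convolutional Networks. Artificial Intelligence and Robotics Research, 13(4), 715-725. https://doi.org/10.12677/airr.2024.134072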
