基于BERT知识蒸馏的情感分析模型
BERT-Based Knowledge Distillation for Sentiment Analysis Model
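Abstract (Chinese version, translated): The BERT pretrained language model is now widely used in text analysis and achieves state-of-the-art performance on sentiment analysis, but deploying BERT on edge devices remains challenging. Traditional machine learning models commonly used for sentiment analysis (SVM, NB, and LR) are easier to deploy but less accurate than BERT. This paper aims to combine the strengths of the two approaches and train a model that is both accurate and easy to deploy for sentiment analysis. Most previous work distills BERT into a shallow neural network, which reduces the parameter count but still leaves millions of parameters, making deployment on edge devices difficult. We propose using an already trained BERT model as the teacher and traditional machine learning models (SVM, NB, and LR) as the students, with knowledge transferred at the output level. The student models are trained on the soft labels and logits output by the teacher. We show that training on soft labels simplifies the student's learning process, emphasize that the relationships among all text features obtained from the teacher are treated as equal, and validate the distilled student models on the IMDB dataset. Experimental results show that traditional machine learning models distilled from the BERT pretrained language model perform markedly better on test data: accuracy improves by more than 1% over each baseline model, while the parameter count stays at about one tenth of the BERT model's.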
Abstract: In recent years, pretrained models have achieved remarkable performance in various fields, including computer vision, natural language processing, and multimodal tasks. However, while pretrained language models excel in accuracy across many tasks, they come with high computational costs and long inference times. Traditional machine learning models, by contrast, are easier to deploy but often lag behind pretrained language models in accuracy. This paper aims to combine the strengths of both approaches to create a highly accurate and deployable model. Inspired by knowledge distillation (the teacher-student framework), this study sets a pretrained BERT language model as the teacher model and traditional machine learning models as the student models. It extracts additional soft-label knowledge from the large, parameter-heavy teacher model to train lightweight student models. The goal is to use the distilled student models for sentiment analysis of text, with a focus on the benchmark movie review dataset. The main steps are data preprocessing, feature extraction, and knowledge distillation. The results demonstrate that traditional machine learning models distilled with knowledge from a large pretrained language model perform significantly better on test data, with accuracy gains of more than 1% over the baseline models, while the number of parameters stays at about one tenth of the BERT model's.
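The output-level transfer described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the teacher logits are synthetic stand-ins for BERT outputs, the features are random vectors standing in for extracted text features, the temperature value is an assumption, and the names `soften` and `train_soft_lr` are hypothetical. The student here is a logistic regression fitted by plain gradient descent on the teacher's soft labels rather than on hard 0/1 labels.

```python
import numpy as np

def soften(logits, temperature=2.0):
    """Temperature-scaled softmax: turns teacher logits into soft labels."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_soft_lr(X, soft_pos, lr=0.5, epochs=500):
    """Fit a logistic regression student to soft targets in [0, 1]
    by minimizing cross-entropy with batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        grad = p - soft_pos          # cross-entropy gradient w.r.t. the student logit
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

# Synthetic data (assumption): 200 samples, 5 features, a linear "teacher".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
# Two-class logits in the order [negative, positive].
teacher_logits = np.column_stack([np.zeros(len(X)), X @ true_w])

soft = soften(teacher_logits, temperature=2.0)   # teacher soft labels
w, b = train_soft_lr(X, soft[:, 1])              # student learns P(positive)

student_pred = (sigmoid(X @ w + b) >= 0.5).astype(int)
teacher_pred = teacher_logits.argmax(axis=1)
agreement = (student_pred == teacher_pred).mean()
print(f"student/teacher agreement: {agreement:.3f}")
```

In this toy setup the soft targets carry the teacher's confidence for every sample, so the student recovers the teacher's decision function up to the temperature scaling. In a real pipeline the soft labels would come from a fine-tuned BERT model, and the student could equally be an SVM or naive Bayes model, as in the paper.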
Cite this article: 孙杨杰, 常青玲. 基于BERT知识蒸馏的情感分析模型[J]. 计算机科学与应用, 2023, 13(10): 1938-1947. https://doi.org/10.12677/CSA.2023.1310192
