Emotion Prediction Based on Machine Learning and Advanced Transformer Model
DOI: 10.12677/AAM.2023.123111
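Authors: Rui Sun (孙睿): School of Science, Tianjin University of Commerce, Tianjin; Yancong Zhou (周艳聪)*: School of Information Engineering, Tianjin University of Commerce, Tianjin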
Keywords: Naive Bayes, Logistic Regression, BERT, DistilBERT, RoBERTa
Abstract: This paper studies sentiment analysis of text, using the Yelp dataset for evaluation. Rating prediction for Yelp reviews can be framed in several ways, such as sentiment analysis or five-star rating classification. Here we predict restaurant ratings from the review text. After analyzing the distribution of the original data, we first create a balanced training subset, then split the dataset and extract features. We apply two machine learning methods, naive Bayes and logistic regression, and three transformer-based deep learning models, BERT, DistilBERT, and RoBERTa, and compare them. Results are reported in terms of both training time and training performance, giving readers a practical basis for choosing among the methods.
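The data preparation steps named in the abstract (building a balanced training subset, then splitting it) can be sketched as follows. This is a minimal illustration only: the abstract does not say how the balancing was done, so downsampling every star rating to the size of the rarest one is an assumption, and the function names `balanced_subset` and `split_rows` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_subset(reviews, seed=0):
    """Downsample so that every star rating keeps the same number of reviews.

    `reviews` is a list of (text, stars) pairs. Downsampling to the rarest
    class is an assumption; the paper may have balanced the data differently.
    """
    rng = random.Random(seed)
    by_star = defaultdict(list)
    for text, stars in reviews:
        by_star[stars].append((text, stars))
    per_class = min(len(rows) for rows in by_star.values())
    subset = []
    for stars in sorted(by_star):
        subset.extend(rng.sample(by_star[stars], per_class))
    rng.shuffle(subset)
    return subset


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    rng = random.Random(seed)
    rows = list(rows)
    rng.shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


# Toy, deliberately imbalanced data: many more 5-star than 2-star reviews.
counts = {1: 5, 2: 3, 3: 4, 4: 8, 5: 20}
toy_reviews = [
    ("review %d" % i, stars)
    for stars, count in counts.items()
    for i in range(count)
]

balanced = balanced_subset(toy_reviews)
train, test = split_rows(balanced)
star_counts = Counter(stars for _, stars in balanced)
```

In this toy run each rating keeps as many reviews as the rarest class, so a classifier trained on `train` does not see the skew of the raw distribution. Feature extraction and model fitting would follow on the resulting split.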
Cite this article: Sun, R. and Zhou, Y.C. (2023) Emotion Prediction Based on Machine Learning and Advanced Transformer Model. Advances in Applied Mathematics, 12(3), 1090-1099. https://doi.org/10.12677/AAM.2023.123111
