Image-Text Sentiment Analysis Method in Social Media Domain Based on ViLT
DOI: 10.12677/ORF.2023.136722
Author: Yang Jing (杨靖): School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai
Keywords: Cross-Modal, Sentiment Analysis, Attention Mechanism, Feature Fusion
Abstract: Existing image-text sentiment analysis methods concentrate on extracting features from each modality and pay less attention to aligning features across modalities. To address this, this paper proposes an image-text sentiment analysis method for the social media domain based on ViLT (Vision-and-Language Transformer). Because social media text is short and often grammatically irregular, BERTweet is chosen as the text encoder, while image features are extracted with ViLT's approach of slicing the image into patches and projecting them. The text and image features are concatenated and fed into a single Transformer module, which yields a sentiment result based on joint image-text analysis. In addition, the text and image features are each exploited on their own to produce two unimodal sentiment results. A weighted fusion strategy over the three results then determines the final sentiment polarity. Experiments on a public dataset verify the effectiveness of the proposed sentiment classification method.
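The final step described in the abstract, weighted fusion of the text, image, and multimodal results, can be illustrated with a minimal sketch. The abstract does not state the label set, the weights, or the exact fusion formula, so everything below is an assumption made for illustration: a three-class label set, hypothetical branch names, made-up probabilities and weights, and a simple weight-normalised average of class probabilities followed by an argmax.

```python
from typing import Dict, List, Sequence

# Assumed label set; the abstract does not say how many classes are used.
LABELS = ("negative", "neutral", "positive")


def weighted_fusion(branch_probs: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Return the weight-normalised average of per-branch class probabilities."""
    total = sum(weights[name] for name in branch_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * len(LABELS)
    for name, probs in branch_probs.items():
        if len(probs) != len(LABELS):
            raise ValueError(f"branch {name!r} has the wrong number of classes")
        w = weights[name] / total  # normalise so the fused scores sum to 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


def predict_polarity(branch_probs: Dict[str, Sequence[float]],
                     weights: Dict[str, float]) -> str:
    """Pick the label with the highest fused score."""
    fused = weighted_fusion(branch_probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]


# Hypothetical outputs of the three branches for one post (illustrative only).
branch_probs = {
    "text": [0.6, 0.3, 0.1],
    "image": [0.2, 0.3, 0.5],
    "multimodal": [0.1, 0.2, 0.7],
}
# Hypothetical weights; in practice they would be tuned on validation data.
weights = {"text": 0.25, "image": 0.25, "multimodal": 0.5}

fused = weighted_fusion(branch_probs, weights)
polarity = predict_polarity(branch_probs, weights)
```

In this made-up example the text branch alone leans negative, but the image and multimodal branches outweigh it, so the fused decision differs from the text-only one. That is the kind of case a late-fusion step is meant to handle.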
Cite this article: Yang, J. (2023) Image-Text Sentiment Analysis Method in Social Media Domain Based on ViLT. Operations Research and Fuzziology, 13(6), 7346-7358. https://doi.org/10.12677/ORF.2023.136722
