[1] Gandhi, A., Adhvaryu, K., Poria, S., Cambria, E. and Hussain, A. (2023) Multimodal Sentiment Analysis: A Systematic Review of History, Datasets, Multimodal Fusion Methods, Applications, Challenges and Future Directions. Information Fusion, 91, 424-444.
[2] Li, J., Wang, X., Liu, Y. and Zeng, Z. (2024) CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition. IEEE Transactions on Affective Computing, 15, 1919-1933.
[3] Pan, B., Hirota, K., Jia, Z. and Dai, Y. (2023) A Review of Multimodal Emotion Recognition from Datasets, Preprocessing, Features, and Fusion Methods. Neurocomputing, 561, Article 126866.
[4] Yuan, Y., Li, Z. and Zhao, B. (2025) A Survey of Multimodal Learning: Methods, Applications, and Future. ACM Computing Surveys, 57, 1-34.
[5] Yang, J., Yu, Y., Niu, D., Guo, W. and Xu, Y. (2023) ConFEDE: Contrastive Feature Decomposition for Multimodal Sentiment Analysis. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, 9-14 July 2023, 7617-7630.
[6] Hu, Z., Wang, L., Lan, Y., Xu, W., Lim, E., Bing, L., et al. (2023) LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6-10 December 2023, 5254-5276.
[7] Hyeon, J., Oh, Y., Lee, Y. and Choi, H. (2025) Enhancing Speech Emotion Recognition through Segmental Average Pooling of Self-Supervised Learning Features. 2025 IEEE International Conference on Big Data and Smart Computing (BigComp), Kota Kinabalu, 9-12 February 2025, 191-198.
[8] Lian, Z., Sun, H., Sun, L., Wen, Z., Zhang, S., Chen, S., et al. (2024) MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition. Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, Melbourne, 28 October-1 November 2024, 41-48.
[9] Ma, H., Wang, J., Lin, H., Zhang, B., Zhang, Y. and Xu, B. (2024) A Transformer-Based Model with Self-Distillation for Multimodal Emotion Recognition in Conversations. IEEE Transactions on Multimedia, 26, 776-788.
[10] Zhao, H., Ju, Y. and Gao, Y. (2024) Bilevel Relational Graph Representation Learning-Based Multimodal Emotion Recognition in Conversation. 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, 15-19 July 2024, 1-6.
[11] Lian, Z., Chen, L., Sun, L., Liu, B. and Tao, J. (2023) GCNet: Graph Completion Network for Incomplete Multimodal Learning in Conversation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, 8419-8432.
[12] Zhang, D., Chen, F. and Chen, X. (2023) DualGATs: Dual Graph Attention Networks for Emotion Recognition in Conversations. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, 9-14 July 2023, 7395-7408.
[13] Zou, H., Lv, F., Zheng, D., Chng, E.S. and Rajan, D. (2025) Large Language Models Meet Contrastive Learning: Zero-Shot Emotion Recognition across Languages. 2025 IEEE International Conference on Multimedia and Expo (ICME), Nantes, 30 June-4 July 2025, 1-6.
[14] Wang, L., Yang, J., Wang, Y., Qi, Y., Wang, S. and Li, J. (2024) Integrating Large Language Models (LLMs) and Deep Representations of Emotional Features for the Recognition and Evaluation of Emotions in Spoken English. Applied Sciences, 14, Article 3543.
[15] Kadiyala, R.M.R. (2024) Cross-Lingual Emotion Detection through Large Language Models. Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Bangkok, 15 August 2024, 464-469.
[16] Devlin, J., Chang, M.W., Lee, K., et al. (2019) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, 2-7 June 2019, 4171-4186.
[17] Touvron, H., Lavril, T., Izacard, G., et al. (2023) LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971.
[18] Zhang, Y., Wang, M., Tiwari, P., Li, Q., Wang, B. and Qin, J. (2023) DialogueLLM: Context and Emotion Knowledge-Tuned LLaMA Models for Emotion Recognition in Conversations. arXiv:2310.11374.
[19] Cheng, Z., Cheng, Z., Hauptmann, A., He, J., Lian, Z., Lin, Y., et al. (2024) Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning. Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, 10-15 December 2024, 110805-110853.
[20] Aruna Gladys, A., Vetriselvi, V. and Rajasekar, S.K. (2024) Multimodal Emotion Cause Pair Extraction in Conversations Using Knowledge Distillation and Large Language Models. 2024 International Conference on Computational Intelligence and Network Systems (CINS), Dubai, 28-29 November 2024, 1-8.
[21] Georgiou, E., Katsouros, V., Avrithis, Y. and Potamianos, A. (2025) DeepMLF: Multimodal Language Model with Learnable Tokens for Deep Fusion in Sentiment Analysis. arXiv:2504.11082.
[22] Dutta, S. and Ganapathy, S. (2025) LLM Supervised Pre-Training for Multimodal Emotion Recognition in Conversations. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, 6-11 April 2025, 1-5.
[23] Hochreiter, S. and Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9, 1735-1780.
[24] Liu, Z., Shen, Y., Lakshminarasimhan, V.B., Liang, P.P., Bagher Zadeh, A. and Morency, L. (2018) Efficient Low-Rank Multimodal Fusion with Modality-Specific Factors. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, 15-20 July 2018, 2247-2256.
[25] Yu, W., Xu, H., Yuan, Z. and Wu, J. (2021) Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 35, 10790-10797.
[26] Rahman, W., Hasan, M.K., Lee, S., Bagher Zadeh, A., Mao, C., Morency, L., et al. (2020) Integrating Multimodal Information in Large Pretrained Transformers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5-10 July 2020, 2359-2371.
[27] Yang, Y., Dong, X. and Qiang, Y. (2025) MSE-Adapter: A Lightweight Plugin Endowing LLMs with the Capability to Perform Multimodal Sentiment Analysis and Emotion Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 39, 25642-25650.
[28] Zadeh, A., Chen, M., Poria, S., Cambria, E. and Morency, L. (2017) Tensor Fusion Network for Multimodal Sentiment Analysis. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, 7-11 September 2017, 1103-1114.
[29] Hu, J., Liu, Y., Zhao, J. and Jin, Q. (2021) MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1-6 August 2021, 5666-5675.
[30] Li, J., Wang, X., Lv, G. and Zeng, Z. (2024) GA2MIF: Graph and Attention Based Two-Stage Multi-Source Information Fusion for Conversational Emotion Detection. IEEE Transactions on Affective Computing, 15, 130-143.