Multimodal Emotion Recognition Based on Multi-Scale Structures and Channel Attention Mechanism
Abstract: Existing emotion recognition studies based on physiological signals suffer from insufficient feature representation, severe interference between modalities, and low classification accuracy. To address these problems, this paper proposes a multimodal emotion recognition method that fuses electroencephalogram (EEG), electrocardiogram (ECG), and electrodermal activity (EDA) signals. To strengthen feature representation, a Multi-Scale Inception Block (MSI-Block) is designed that combines standard convolution with a 1D-Inception structure, extracting rich features while keeping parameter complexity under control. A channel interaction attention mechanism is introduced to strengthen the response of key modalities and suppress redundant interference, and a bidirectional long short-term memory (BiLSTM) network models the temporal information of the fused features. Experiments on the DEAP dataset yield accuracies of 90.72%, 89.48%, and 83.62% on the valence, arousal, and four-class valence-arousal tasks, respectively. These results are significantly better than those of traditional unimodal and bimodal methods, indicating that the proposed method offers good emotion recognition performance and stability.
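The two front-end ideas named in the abstract, parallel multi-scale convolution branches and a channel-level attention gate, can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption rather than the paper's design: the kernel sizes (1, 3, 5), the fixed averaging weights standing in for learned filters, and the ECA-style gate (pool each channel, mix neighbouring channel descriptors with a small 1D kernel, apply a sigmoid). The abstract does not state the actual MSI-Block layout or the exact attention formulation, and the BiLSTM stage is omitted.

```python
import math


def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_scale_features(signal, kernels):
    """Run one branch per kernel size and stack the outputs as channels."""
    return [conv1d_same(signal, kernel) for kernel in kernels]


def channel_attention(channels, mix_kernel):
    """ECA-style gate (an assumed design, not confirmed by the abstract).

    1. Global average pooling gives one descriptor per channel.
    2. A small 1D kernel mixes descriptors of neighbouring channels.
    3. A sigmoid turns the mixed values into weights in (0, 1).
    4. Each channel is rescaled by its weight.
    """
    pooled = [sum(ch) / len(ch) for ch in channels]
    mixed = conv1d_same(pooled, mix_kernel)
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    scaled = [[w * x for x in ch] for w, ch in zip(weights, channels)]
    return scaled, weights


# Hypothetical, untrained placeholder weights: moving averages of width 1, 3, 5.
KERNELS = [[1.0], [1 / 3] * 3, [1 / 5] * 5]
MIX_KERNEL = [0.25, 0.5, 0.25]

# Toy single-channel input standing in for one physiological signal segment.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
features = multi_scale_features(signal, KERNELS)
attended, weights = channel_attention(features, MIX_KERNEL)
```

In a trained model the branch kernels and the mixing kernel would be learned, and the attended channels would be passed on to the temporal model; the sketch only shows how the data flows between the two stages.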
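Citation: Xu Xiaojing (徐晓婧). Multimodal Emotion Recognition Based on Multi-Scale Structures and Channel Attention Mechanism [J]. Computer Science and Application, 2025, 15(8): 21-33. https://doi.org/10.12677/csa.2025.158194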
