Multimodal Depression Recognition Based on Cross-Attention Fusion and Fuzzy C-Means Clustering
Abstract: Early identification of depression is crucial for effective intervention. The main current challenges are extracting deep audio features and text features with long-range dependencies, and fusing them in a way that strengthens coupling between the modalities. This paper builds an end-to-end depression recognition model based on deep learning and multimodal data: a VGGish-NetVLAD-GRU model extracts deep temporal audio features; a RoBERTa-BiLSTM model captures long-range semantic dependencies in text; parametric cross-attention fusion performs dynamic weight allocation and cross-modal semantic alignment of speech and text features; and the fuzzy C-means (FCM) clustering algorithm softly partitions samples with similar emotional characteristics according to probabilistic membership, yielding the depression classification. On the EATD-Corpus and CMDC Chinese datasets, the model reaches accuracies of 97.0% and 94.0%, respectively, with corresponding F1 scores of 97.0% and 94.0%. In ablation experiments on the EATD-Corpus dataset, removing cross-attention lowers accuracy by 13%, and removing FCM lowers it by 7%.
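The soft partitioning attributed to FCM can be illustrated with a minimal sketch. This is the textbook fuzzy C-means iteration in plain Python, not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the stopping tolerance, and the toy two-dimensional "fused features" are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook FCM on a list of equal-length feature vectors.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which sample i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, normalized per sample.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the samples weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_row = [v / total for v in inv]
            change = max(abs(a - b) for a, b in zip(new_row, u[i]))
            max_change = max(max_change, change)
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


# Made-up 2-D vectors standing in for fused audio-text features.
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, memberships = fuzzy_c_means(toy)
# A hard label, if needed, is the cluster with the largest membership.
labels = [row.index(max(row)) for row in memberships]
```

Unlike hard k-means, every sample keeps a graded membership in each cluster, so borderline samples are not forced into one class before the final decision. How the paper maps clusters to the depressed and non-depressed classes is not stated in the abstract, so that step is left out here.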
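Cite this article: Wang Yateng, Zhang Jinzhu, Wang Shutong. Multimodal Depression Recognition Based on Cross-Attention Fusion and Fuzzy C-Means Clustering[J]. Computer Science and Application, 2026, 16(2): 366-380. https://doi.org/10.12677/csa.2026.162066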
