Design and Implementation of a Multimodal Emotion Recognition System Specialized for Driving Scenarios
DOI: 10.12677/sea.2026.153040
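Authors: Feng Jianbin, Wei Qiannan: School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, Liaoning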
Keywords: Emotion Recognition; Multimodal Fusion; Driving Safety; ResNet-18; YOLO
Abstract: In real driving environments, in-vehicle lighting, road noise, and changes in the driver's head pose can all interfere with emotion recognition, and a model that relies on a single data source tends to produce unstable predictions. To address this, this paper designs and implements a multimodal emotion recognition system for driving scenarios that brings video-based facial expressions, static images, and speech-text information into a single recognition pipeline. The system consists of three recognition modules. The video-stream module is built on ResNet-18 and adds an SE (Squeeze-and-Excitation) attention mechanism to its residual blocks to strengthen the response to expression-relevant facial regions such as the eyes and mouth. The image module adopts MobileNetV3 together with data augmentation (rotation, cropping, and brightness perturbation) to improve robustness to different capture conditions. The speech-text module uses Wav2Vec 2.0 and BERT to extract acoustic and semantic features, respectively. In experiments on the DMED dataset, the fusion model reaches an accuracy of 94.2%, which is 4.1 percentage points higher than the best single-modal model. In addition, a PyQt5 visualization interface supports data input, preprocessing, model inference, and result display. Overall, the system meets the basic real-time detection requirements of in-vehicle scenarios.
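A minimal pure-Python sketch of how an SE block recalibrates a residual branch is given below. It assumes the placement used in SE-ResNet, where per-channel weights scale the residual branch before the identity shortcut is added; the abstract does not state the exact placement, the reduction ratio, or whether biases are used, so those choices, the nested-list tensor layout, and all function names are illustrative assumptions rather than the authors' code.

```python
import math
from typing import List

# Assumed toy layout: a feature map is C channels, each an H x W grid of floats.
FeatureMap = List[List[List[float]]]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze(fmap: FeatureMap) -> List[float]:
    """Squeeze step: global average pooling, one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def excitation(z: List[float], w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    """Excitation step: FC (C -> C/r) + ReLU, then FC (C/r -> C) + sigmoid.

    w1 has shape (C/r, C) and w2 has shape (C, C/r); biases are omitted (assumption).
    """
    hidden = [relu(sum(w * v for w, v in zip(row, z))) for row in w1]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]


def se_residual(x: FeatureMap, residual: FeatureMap,
                w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Hypothetical SE residual unit: out = ReLU(x + s * residual).

    `residual` stands in for the output of the block's convolution branch
    (not implemented here); `x` is the identity shortcut; `s` holds the
    per-channel SE weights, each in (0, 1).
    """
    s = excitation(squeeze(residual), w1, w2)
    return [
        [[relu(xv + s[c] * rv) for xv, rv in zip(xrow, rrow)]
         for xrow, rrow in zip(x[c], residual[c])]
        for c in range(len(x))
    ]
```

In an actual ResNet-18 the residual branch would be two 3x3 convolutions with batch normalisation, and r = 16 is the reduction ratio proposed in the original SE-Net work. Because each weight lies in (0, 1), channels that the excitation step scores low are attenuated before the shortcut is added, which is how the attention can emphasise channels that respond to regions such as the eyes and mouth.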
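The three augmentations named for the MobileNetV3 image module (rotation, cropping, brightness perturbation) can be sketched on a grayscale image stored as nested lists. The rotation range (±15°), the crop size, the brightness range (±20%), the order of operations, and every function name here are assumptions for illustration; the abstract gives no parameters.

```python
import math
import random
from typing import List

Image = List[List[float]]  # grayscale, values assumed to lie in [0, 255]


def rotate(img: Image, degrees: float) -> Image:
    """Rotate about the image centre with nearest-neighbour sampling; uncovered pixels become 0."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel that lands on (y, x).
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    """Cut a random size x size window; size must not exceed either image side."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale intensities by `factor` and clip the result to [0, 255]."""
    return [[min(255.0, max(0.0, v * factor)) for v in row] for row in img]


def augment(img: Image, rng: random.Random, crop: int,
            max_angle: float = 15.0, brightness: float = 0.2) -> Image:
    """Hypothetical pipeline: random rotation, then random crop, then brightness jitter."""
    img = rotate(img, rng.uniform(-max_angle, max_angle))
    img = random_crop(img, crop, rng)
    return adjust_brightness(img, rng.uniform(1.0 - brightness, 1.0 + brightness))
```

In a PyTorch training setup these steps would normally come from standard transforms such as torchvision's `RandomRotation`, `RandomCrop`, and `ColorJitter(brightness=...)`; the hand-written version only makes each operation explicit. Passing a seeded `random.Random` keeps runs reproducible.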
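The abstract says only that Wav2Vec 2.0 and BERT extract acoustic and semantic features. One simple way to turn their outputs into a single speech-text vector is sketched below, operating on hidden states assumed to have been extracted already: acoustic frames are mean-pooled over time, the text is represented by its first ([CLS]) token, and the two L2-normalised vectors are concatenated. The pooling, normalisation, concatenation, and function names are assumptions, not the paper's documented design.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame-level acoustic features over time into one utterance vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)


def speech_text_feature(acoustic_frames: Sequence[Sequence[float]],
                        token_states: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical fusion of one utterance's acoustic and text representations.

    acoustic_frames: T x D_a hidden states (assumed to come from a Wav2Vec 2.0 encoder)
    token_states:    N x D_t hidden states (assumed BERT output; index 0 is [CLS])
    Each part is L2-normalised so neither modality dominates by scale alone.
    """
    audio_vec = l2_normalize(mean_pool(acoustic_frames))
    text_vec = l2_normalize(token_states[0])
    return audio_vec + text_vec  # list concatenation: D_a + D_t values
```

With base-size checkpoints of both models the hidden size is 768, so the concatenated vector would have 1536 values and could feed a small classifier head.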
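The abstract does not specify how the outputs of the video, image, and speech-text modules are combined. A minimal sketch of one common option, weighted decision-level (late) fusion, is shown below; it assumes each module outputs logits over the same label set, and the label names, weights, and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(logits_by_modality: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-modality class probabilities (hypothetical scheme).

    Weights are renormalised over the modalities actually present, so a missing
    stream (e.g. no speech in the current window) does not bias the result.
    """
    present = [m for m in logits_by_modality if weights.get(m, 0.0) > 0]
    if not present:
        raise ValueError("no modality with positive weight")
    total_w = sum(weights[m] for m in present)
    n_classes = len(next(iter(logits_by_modality.values())))
    fused = [0.0] * n_classes
    for m in present:
        for k, p in enumerate(softmax(logits_by_modality[m])):
            fused[k] += weights[m] / total_w * p
    return fused


# Usage with a hypothetical four-class label set and illustrative weights.
CLASSES = ["neutral", "happy", "angry", "sad"]
fused = late_fusion(
    {"video": [2.0, 0.1, 0.3, 0.0],
     "image": [1.5, 0.2, 0.9, 0.1],
     "speech_text": [0.2, 0.1, 1.8, 0.3]},
    {"video": 0.4, "image": 0.3, "speech_text": 0.3},
)
print(CLASSES[max(range(len(fused)), key=fused.__getitem__)])
```

Averaging probabilities lets a confident module outvote a noisy one, which is one way to reduce the single-source instability described above; feature-level fusion with a jointly trained classifier is the usual alternative.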
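Because the improvement is stated in percentage points, the accuracy implied for the best single-modal model and the corresponding relative gain follow directly from the two reported figures:

```latex
\[
\mathrm{Acc}_{\text{single}} = 94.2\% - 4.1\ \text{pp} = 90.1\%,
\qquad
\frac{94.2 - 90.1}{90.1} \approx 4.55\%\ \text{(relative gain)}.
\]
```

The 4.1-point absolute gain and the roughly 4.55% relative gain describe the same result and should not be used interchangeably.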
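The interface's input, preprocessing, inference, and display flow can be sketched as a chain of timed stages, which also gives a direct way to check a real-time target. PyQt5 is not used here; the stage functions, the 25 fps target, and the names `run_pipeline` and `meets_budget` are hypothetical, and a real GUI would typically invoke such a chain from its own event loop or timer.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(frame: Any, stages: List[Stage]) -> Tuple[Any, Dict[str, float]]:
    """Pass one input through the stages in order, timing each in milliseconds."""
    timings: Dict[str, float] = {}
    data = frame
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return data, timings


def meets_budget(timings: Dict[str, float], fps: float) -> bool:
    """True if the summed stage latency fits in one frame interval at `fps`."""
    return sum(timings.values()) <= 1000.0 / fps


# Placeholder stages; a real system would run the trained models here.
stages: List[Stage] = [
    ("preprocess", lambda img: [[v / 255.0 for v in row] for row in img]),
    ("inference", lambda x: {"label": "neutral", "score": 0.9}),  # dummy model output
    ("display", lambda result: result),  # a GUI would update its widgets here
]
result, timings = run_pipeline([[0, 128], [255, 64]], stages)
print(result, meets_budget(timings, fps=25))
```

At 25 fps the budget is 1000 / 25 = 40 ms per frame for all stages combined; per-stage timings show which one to optimise if the total exceeds it.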
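Citation: Feng Jianbin, Wei Qiannan. Design and Implementation of a Multimodal Emotion Recognition System Specialized for Driving Scenarios[J]. Software Engineering and Applications, 2026, 15(3): 423-432. https://doi.org/10.12677/sea.2026.153040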
