Research on Array Signal Direction Finding Based on Voice Activity Detection

doi:10.12677/csa.2025.154112

Research on Direction of Arrival Based on Voice Activity Detection

Funding: supported by research project funding
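Authors: Li Jiyuan (李纪元), Zhao Qianyao (赵乾曜): School of Information Engineering, Beijing Institute of Graphic Communication, Beijing; Tian Yimin (田益民)*, Sun Zhaoyong (孙兆永): Department of Basic Courses, Beijing Institute of Graphic Communication, Beijing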
Keywords: Voice Activity Detection (VAD); Array Signal Direction Finding; Direction of Arrival (DOA); MUSIC Algorithm; MFCC

Abstract: To address the problem of detecting the direction of sound sources with specific features in long-duration signals, this work proposes a multi-feature adaptive voice activity detection (VAD) method for long-duration array signals and combines it with the Multiple Signal Classification (MUSIC) algorithm. The aim is to improve detection and localization accuracy for signal sources that have distinctive features and specific target attributes. Adaptive thresholds are set from the MFCC features and the energy features of the speech signal, and VAD is performed against the features of the specific sound source, which improves the accuracy of voice activity detection. Direction finding is then carried out on the detected active segments: the MUSIC algorithm estimates the direction of arrival in each segment, so that specific sound sources arriving from different directions at different times in a long-duration signal can be detected.
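The two-stage pipeline described in the abstract (detect active segments first, then run MUSIC only on those segments) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a uniform linear array with half-wavelength spacing, narrowband complex baseband data, and one source per segment, and it uses an energy-only adaptive threshold in place of the combined MFCC and energy features. All function names and parameter values (`adaptive_energy_vad`, `k`, `alpha`, and so on) are hypothetical.

```python
import numpy as np

def steering(theta_deg, n_sensors, d_over_lambda=0.5):
    """Steering vectors of a uniform linear array, shape (n_sensors, n_angles)."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    m = np.arange(n_sensors)[:, None]
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta)[None, :])

def frame_energy(x, frame_len, hop):
    """Short-time energy of a 1-D (possibly complex) signal, one value per frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(np.abs(x[i * hop:i * hop + frame_len]) ** 2)
                     for i in range(n_frames)])

def adaptive_energy_vad(energy, k=3.0, alpha=0.95):
    """Mark frames whose energy exceeds k times a running noise-floor estimate.
    Assumed rule: the floor starts at the first frame (taken to be noise only)
    and is updated only on frames judged inactive."""
    floor = energy[0]
    active = np.zeros(len(energy), dtype=bool)
    for i, e in enumerate(energy):
        if e > k * floor:
            active[i] = True
        else:
            floor = alpha * floor + (1 - alpha) * e
    return active

def active_segments(active):
    """(start, end) frame index pairs, end exclusive, for each run of active frames."""
    padded = np.concatenate(([False], active, [False])).astype(int)
    edges = np.diff(padded)
    return list(zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)))

def music_doa(X, n_sources, d_over_lambda=0.5, grid=None):
    """Narrowband MUSIC pseudo-spectrum; X has shape (n_sensors, n_snapshots)."""
    if grid is None:
        grid = np.arange(-90.0, 90.5, 0.5)
    n_sensors = X.shape[0]
    R = X @ X.conj().T / X.shape[1]              # sample covariance matrix
    _, eigvecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    En = eigvecs[:, :n_sensors - n_sources]      # noise subspace
    A = steering(grid, n_sensors, d_over_lambda)
    spectrum = 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return grid, spectrum

def run_demo(seed=0):
    """Simulate a long recording with two bursts from different directions."""
    rng = np.random.default_rng(seed)
    n_sensors, n_samples, frame_len = 8, 6000, 200
    noise = rng.standard_normal((n_sensors, n_samples)) \
        + 1j * rng.standard_normal((n_sensors, n_samples))
    X = 0.1 * noise
    for (lo, hi), angle in (((1000, 2000), 20.0), ((4000, 5000), -40.0)):
        s = rng.standard_normal(hi - lo) + 1j * rng.standard_normal(hi - lo)
        X[:, lo:hi] += steering(angle, n_sensors) * s[None, :]
    # Stage 1: VAD on the reference sensor (non-overlapping frames).
    energy = frame_energy(X[0], frame_len, frame_len)
    segments = active_segments(adaptive_energy_vad(energy))
    # Stage 2: MUSIC on each active segment only.
    results = []
    for start, end in segments:
        a, b = start * frame_len, end * frame_len
        grid, spectrum = music_doa(X[:, a:b], n_sources=1)
        results.append((a, b, float(grid[np.argmax(spectrum)])))
    return results

estimates = run_demo()
for a, b, angle in estimates:
    print(f"samples {a}-{b}: estimated DOA {angle:.1f} deg")
```

Restricting the covariance estimate to active segments keeps noise-only snapshots and snapshots from other time periods out of the subspace decomposition, which is the motivation the abstract gives for placing VAD ahead of direction finding.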

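Cite this article: Li Jiyuan, Tian Yimin, Zhao Qianyao, Sun Zhaoyong. Research on Direction of Arrival Based on Voice Activity Detection [J]. Computer Science and Application, 2025, 15(4): 394-405. https://doi.org/10.12677/csa.2025.154112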

