A New Single Channel Speech Enhancement Algorithm Based on Improved A-Priori SNR
Abstract: To address the residual "musical noise" left behind by most speech enhancement algorithms, this paper proposes a new a priori SNR estimation algorithm. The accuracy of the a priori SNR estimate determines the overall performance of a speech enhancement system, and the Convex-Combination (CC) algorithm is the most widely used a priori SNR estimator. Although the CC algorithm runs in real time and introduces little distortion, it suppresses musical noise poorly. To remedy this defect, the paper modifies the maximum likelihood component of the a priori SNR estimate: the a posteriori SNR is recursively smoothed by means of a smoothing parameter, and the smoothed value replaces the a posteriori SNR in the maximum likelihood estimate. Simulation results show that the proposed algorithm suppresses musical noise better than the CC algorithm.
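The idea of smoothing the a posteriori SNR before it enters the maximum likelihood term can be illustrated with a minimal sketch. The abstract does not give the update equations, so everything below is an assumption made for illustration: the standard decision-directed form stands in for the CC estimator, a Wiener gain is used, and the function name `estimate_a_priori_snr` and the parameters `alpha`, `beta`, and `xi_min` (with their default values) are hypothetical rather than taken from the paper.

```python
def estimate_a_priori_snr(gamma, alpha=0.98, beta=0.7, xi_min=1e-3):
    """Track the a priori SNR of one frequency bin across frames.

    gamma  : list of a posteriori SNR values (|Y|^2 / noise PSD), one per frame.
    alpha  : decision-directed weighting factor (assumed value).
    beta   : smoothing parameter for the a posteriori SNR (assumed value);
             beta = 0 falls back to the unsmoothed maximum likelihood term.
    xi_min : lower bound on the estimate (assumed value).
    Returns a list of a priori SNR estimates, one per frame.
    """
    if not gamma:
        return []

    xi = []
    gamma_smooth = gamma[0]   # start the recursion from the first frame
    prev_clean_snr = 0.0      # |X_hat|^2 / noise PSD of the previous frame

    for g in gamma:
        # Recursively smooth the a posteriori SNR.
        gamma_smooth = beta * gamma_smooth + (1.0 - beta) * g

        # Maximum likelihood term, using the smoothed value in place of g.
        ml_term = max(gamma_smooth - 1.0, 0.0)

        # Decision-directed combination of the previous-frame estimate
        # and the maximum likelihood term, floored at xi_min.
        xi_l = max(alpha * prev_clean_snr + (1.0 - alpha) * ml_term, xi_min)

        # Wiener gain (an assumption; other gain functions are possible).
        gain = xi_l / (1.0 + xi_l)
        prev_clean_snr = gain * gain * g

        xi.append(xi_l)
    return xi


# An isolated noise spike in frame 3 of an otherwise noise-only bin.
spiky_gamma = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0]
unsmoothed = estimate_a_priori_snr(spiky_gamma, beta=0.0)
smoothed = estimate_a_priori_snr(spiky_gamma, beta=0.7)
print(unsmoothed[3], smoothed[3])
```

In this sketch the smoothed estimate reacts less strongly to the isolated spike than the unsmoothed one. Isolated spectral peaks of this kind are commonly associated with musical noise. The trade-off is that the smoothed value decays over the following frames instead of dropping back immediately.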
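Cite this article: Chen, C., Gao, Y., Zhang, S., Han, R.R. and Zhang, S. A New Single Channel Speech Enhancement Algorithm Based on Improved A-Priori SNR [J]. Open Journal of Circuits and Systems (电路与系统), 2018, 7(3): 75-83. https://doi.org/10.12677/OJCS.2018.73010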
