Single-Channel Speech Enhancement Based on Sparse Regressive Deep Neural Network
DOI: 10.12677/SEA.2017.61002. Supported by the National Natural Science Foundation of China.
Authors: Haixia Sun, Sikun Li (National University of Defense Technology, Changsha, Hunan)
Keywords: Speech Enhancement, Deep Neural Network (DNN), Regularization Technique, Network Compression, Spectral Subtraction
Abstract: Speech enhancement improves speech quality by suppressing noise and raising the signal-to-noise ratio (SNR), and is widely used in mobile phones and other voice-communication devices. In recent years, speech enhancement based on deep neural networks (DNNs) has become a research hotspot, because it largely avoids the local optima that trap traditional neural-network denoising algorithms and achieves better denoising results. To address the weak generalization ability and large storage cost of existing DNN models, this paper proposes a speech enhancement algorithm based on a sparse regressive deep neural network. The algorithm introduces Dropout and a sparsity-constraint regularizer in the pre-training stage, which keeps the model structure consistent between pre-training and fine-tuning and improves generalization. Network compression through weight sharing and weight quantization reduces storage cost, and spectral subtraction is applied as post-processing to remove stationary noise and further improve speech quality. Simulation results show that the improved algorithm achieves high objective speech-quality scores and good enhancement performance, meeting the requirements of speech enhancement processing.
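To make the regularization step concrete, below is a minimal sketch of the idea described in the abstract: a regression DNN that maps noisy spectral features to clean ones, trained with Dropout on every hidden layer plus a sparsity penalty on the hidden activations. This is not the authors' implementation; the PyTorch framing, the L1 penalty form, the layer sizes, and the dropout rate are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseRegressionDNN(nn.Module):
    """Regression DNN for spectral mapping, regularized with Dropout."""
    def __init__(self, dim=257, hidden=2048, p_drop=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, hidden), nn.Sigmoid(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.Sigmoid(), nn.Dropout(p_drop),
        )
        self.head = nn.Linear(hidden, dim)  # linear output layer for regression

    def forward(self, x):
        h = self.body(x)          # hidden activations (target of the sparsity constraint)
        return self.head(h), h

def loss_fn(model, noisy, clean, sparsity_weight=1e-4):
    """MSE regression loss plus an assumed L1 sparsity penalty on hidden activations."""
    pred, h = model(noisy)
    mse = nn.functional.mse_loss(pred, clean)
    return mse + sparsity_weight * h.abs().mean()
```

A training loop would simply minimize loss_fn over mini-batches of paired noisy/clean feature frames; because Dropout is part of the model from the start, the same structure carries through pre-training and fine-tuning.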
Article citation: Sun, H.X. and Li, S.K. (2017) Single-Channel Speech Enhancement Based on Sparse Regressive Deep Neural Network. Software Engineering and Applications, 6(1), 8-19. https://doi.org/10.12677/SEA.2017.61002
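The network-compression step (weight sharing plus weight quantization) can likewise be sketched. The outline below assumes a deep-compression-style scheme in which weights are clustered with k-means and each weight is stored as a small codebook index; the codebook size of 16 (4-bit indices) is an illustrative assumption, not a figure from the paper.

```python
import numpy as np

def quantize_weights(w, n_clusters=16, n_iter=20):
    """Share weights via 1-D k-means: return a codebook and per-weight indices."""
    flat = w.ravel().astype(np.float64)
    # initialize centroids uniformly over the weight range
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        # assign each weight to its nearest centroid, then recenter
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = flat[idx == k].mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, idx.reshape(w.shape).astype(np.uint8)

# storage drops from 32 bits per weight to log2(n_clusters) bits plus the codebook
w = np.random.randn(128, 64).astype(np.float32)   # stand-in for a trained layer
codebook, indices = quantize_weights(w)
w_shared = codebook[indices]                      # reconstructed shared-weight matrix
```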

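Finally, the spectral-subtraction post-processing stage removes residual stationary noise. A minimal sketch follows, assuming the classic magnitude-domain form: estimate the noise spectrum from leading noise-only frames, over-subtract it from each frame, and apply a spectral floor before overlap-add resynthesis. The frame length, over-subtraction factor alpha, and floor beta are illustrative assumptions.

```python
import numpy as np

def spectral_subtraction(x, frame=256, hop=128, noise_frames=10,
                         alpha=2.0, beta=0.01):
    """Magnitude spectral subtraction with a noise estimate from leading frames."""
    win = np.hanning(frame)
    frames = np.stack([x[i:i + frame] * win
                       for i in range(0, len(x) - frame, hop)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise = mag[:noise_frames].mean(axis=0)                  # stationary noise estimate
    clean_mag = np.maximum(mag - alpha * noise, beta * mag)  # over-subtract, then floor
    y = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame, axis=1)
    out = np.zeros(len(x))
    for k, i in enumerate(range(0, len(x) - frame, hop)):    # overlap-add resynthesis
        out[i:i + frame] += y[k] * win
    return out

# enhanced = spectral_subtraction(noisy_waveform)  # noisy_waveform: 1-D float array
```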