Software Engineering and Applications (SEA), Vol. 6, No. 1 (February 2017)

    Single-Channel Speech Enhancement Based on Sparse Regressive Deep Neural Network

  • pp. 8-19   DOI: 10.12677/SEA.2017.61002
  • Supported by the National Natural Science Foundation of China


Haixia Sun, Sikun Li: National University of Defense Technology, Changsha, Hunan, China

Keywords: Speech Enhancement, DNN, Regularization Technique, Network Compression, Spectral Subtraction



Speech enhancement improves the quality and intelligibility of speech by suppressing noise while raising the signal-to-noise ratio (SNR), and it is widely used in voice communication equipment. In recent years, deep neural networks (DNNs) have become a research hotspot because they are far better than traditional neural networks at avoiding poor local optima. However, existing DNN models require a large amount of storage and generalize poorly. This paper proposes a sparse regression DNN model to address both problems. First, two regularization techniques, Dropout and a sparsity constraint, are used to strengthen the generalization ability of the model; this also keeps the model consistent between the pre-training and fine-tuning stages. Next, the network is compressed through weight sharing and quantization to reduce its storage cost. Finally, spectral subtraction is applied as a post-processing step to suppress stationary noise. The results show that the improved framework performs well and meets the requirements of speech processing.
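The regularization and compression steps described above can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: the one-hidden-layer architecture, the KL-divergence form of the sparsity constraint, inverted dropout, 1-D k-means for weight sharing, and every function name and default value (`rho`, `beta`, `p_drop`, `n_clusters`) are assumptions chosen for illustration.

```python
import numpy as np


def kl_sparsity_penalty(h, rho=0.05, eps=1e-8):
    """Sparsity constraint (assumed KL form): sum over hidden units j of
    KL(rho || rho_hat_j), where rho_hat_j is unit j's mean activation over the
    batch. Assumes sigmoid activations, i.e. values in (0, 1)."""
    rho_hat = np.clip(h.mean(axis=0), eps, 1.0 - eps)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))))


def inverted_dropout(h, p_drop, rng):
    """Dropout: zero each unit with probability p_drop and scale the survivors
    by 1 / (1 - p_drop), so no rescaling is needed at test time."""
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)


def regression_loss(x, y, W1, b1, W2, b2, rng, p_drop=0.2, beta=0.1, rho=0.05):
    """Loss of a hypothetical one-hidden-layer regression DNN: mean squared
    error to the clean target plus beta times the sparsity penalty."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))  # sigmoid hidden layer
    penalty = kl_sparsity_penalty(h, rho)      # computed before dropout
    h = inverted_dropout(h, p_drop, rng)
    y_hat = h @ W2 + b2                         # linear output layer for regression
    mse = float(np.mean(np.sum((y_hat - y) ** 2, axis=1)))
    return mse + beta * penalty


def share_weights(W, n_clusters=16, n_iter=20):
    """Weight sharing and quantization by 1-D k-means: each weight is replaced
    by the index of its nearest centroid, so only a codebook of n_clusters
    floats plus small integer indices must be stored (uint8: n_clusters <= 256)."""
    flat = W.ravel()
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)  # linear init
    for _ in range(n_iter):
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:  # leave empty clusters unchanged
                centroids[k] = members.mean()
    idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids, idx.reshape(W.shape).astype(np.uint8)


# Usage with random data standing in for noisy/clean spectral features.
rng = np.random.default_rng(0)
n_bins, n_hidden = 257, 128
x = rng.standard_normal((32, n_bins))
y = rng.standard_normal((32, n_bins))
W1 = 0.01 * rng.standard_normal((n_bins, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.01 * rng.standard_normal((n_hidden, n_bins))
b2 = np.zeros(n_bins)
loss = regression_loss(x, y, W1, b1, W2, b2, rng)
codebook, idx = share_weights(W1, n_clusters=16)
W1_quantized = codebook[idx]  # dense weights rebuilt from codebook + indices
```

With 16 clusters each index needs only 4 bits instead of a 32-bit float, which is where the storage saving of weight sharing comes from; the sketch stores indices as `uint8` for simplicity.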

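The spectral-subtraction post-processing step named in the abstract can be illustrated with a minimal NumPy sketch of classic power spectral subtraction with over-subtraction and a spectral floor. The abstract does not say how the noise spectrum is estimated or which parameters are used, so the leading-frames noise estimate, `alpha`, `floor`, `noise_frames`, and the function names are hypothetical choices for illustration only.

```python
import numpy as np


def spectral_subtraction(mag, noise_frames=6, alpha=2.0, floor=0.01):
    """Power spectral subtraction on a magnitude spectrogram (frames, bins).

    Assumption: the noise power spectrum is the average of the first
    `noise_frames` frames, taken to be speech-free (reasonable for stationary
    noise). alpha is the over-subtraction factor; floor is the spectral floor
    as a fraction of the input power, which prevents negative power and
    limits musical noise."""
    power = mag ** 2
    noise_power = power[:noise_frames].mean(axis=0, keepdims=True)
    clean_power = np.maximum(power - alpha * noise_power, floor * power)
    return np.sqrt(clean_power)


def enhance(spec, **kwargs):
    """Apply spectral subtraction to a complex STFT, keeping the input phase."""
    mag = np.abs(spec)
    return spectral_subtraction(mag, **kwargs) * np.exp(1j * np.angle(spec))


# Usage on a toy complex spectrogram; in the described pipeline the input
# would presumably be the DNN's enhanced spectrum (an assumption).
rng = np.random.default_rng(0)
spec = rng.standard_normal((50, 257)) + 1j * rng.standard_normal((50, 257))
out = enhance(spec)
```

Because the noise estimate is a fixed average, this form suits the stationary noise the abstract targets; non-stationary noise would call for a noise estimate that is updated over time.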
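Haixia Sun, Sikun Li. Single-Channel Speech Enhancement Based on Sparse Regressive Deep Neural Network[J]. Software Engineering and Applications, 2017, 6(1): 8-19.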
