[1] Fan, Y.Y. (2019) Research on Deep Learning Methods for Speaker Recognition. Master's Thesis, Nanchang Hangkong University, Nanchang. (In Chinese)
[2] Luck, J.E. (1969) Automatic Speaker Verification Using Cepstral Measurements. Journal of the Acoustical Society of America, 46, 1026-1032.
[3] Atal, B.S. (1976) Automatic Recognition of Speakers from Their Voices. Proceedings of the IEEE, 64, 460-475.
[4] Davis, S. and Mermelstein, P. (1980) Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28, 357-366.
[5] Sakoe, H. and Chiba, S. (1978) Dynamic Programming Algorithm Optimization for Spoken Word Recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26, 43-49.
[6] Matsui, T. and Furui, S. (1994) Comparison of Text-Independent Speaker Recognition Methods Using VQ-Distortion and Discrete/Continuous HMM's. IEEE Transactions on Speech and Audio Processing, 2, 456-459.
[7] Kenny, P. (2005) Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms.
[8] Lei, Y., Scheffer, N., Ferrer, L. and McLaren, M. (2014) A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, 4-9 May 2014, 1695-1699.
[9] Deng, J., Dong, W., Socher, R., et al. (2009) ImageNet: A Large-Scale Hierarchical Image Database. IEEE Conference on Computer Vision and Pattern Recognition, Miami, 20-25 June 2009, 248-255.
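[10] Liu, H.P., Li, X., Xu, B.L. and Jiang, N. (2008) Review and Prospect of Speech Signal Endpoint Detection Methods. Application Research of Computers, No. 8, 2278-2283. (In Chinese)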
[11] Hu, H. (2014) Modern Speech Signal Processing. Publishing House of Electronics Industry, Beijing, 74. (In Chinese)
[12] Hochreiter, S. and Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9, 1735-1780.
[13] Heigold, G., Moreno, I., Bengio, S. and Shazeer, N. (2016) End-to-End Text-Dependent Speaker Verification. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, 20-25 March 2016, 5115-5119.
[14] Schroff, F., Kalenichenko, D. and Philbin, J. (2015) FaceNet: A Unified Embedding for Face Recognition and Clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 7-12 June 2015, 815-823.
[15] Prabhavalkar, R., Alvarez, R., Parada, C., Nakkiran, P. and Sainath, T.N. (2015) Automatic Gain Control and Multi-Style Training for Robust Small-Footprint Keyword Spotting with Deep Neural Networks. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, South Brisbane, 19-24 April 2015, 4704-4708.
[16] Sak, H., Senior, A. and Beaufays, F. (2014) Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition.
https://arxiv.org/abs/1402.1128
[17] Pascanu, R., Mikolov, T. and Bengio, Y. (2012) On the Difficulty of Training Recurrent Neural Networks.
https://arxiv.org/abs/1211.5063