The Research of Text-Independent Feature Extraction Based on Single Training Sample
DOI: 10.12677/CSA.2016.66047
Author: Guo Jianmin, School of Physics and Information Technology, Shaanxi Normal University, Xi'an, Shaanxi
Keywords: Feature Extraction; Linear Predictive Coding Cepstral (LPCC) Coefficients; Mel-Frequency Cepstral Coefficients (MFCC); Local Normalized Cepstral Coefficients (LNCC); Wavelet Packet Transform (WPT)
Abstract: Existing speaker identification methods are based on speech features such as Linear Predictive Coding Cepstral (LPCC) coefficients, Mel-Frequency Cepstral Coefficients (MFCC), local normalized cepstral coefficients (LNCC), and the wavelet packet transform (WPT), all of which are sensitive to environmental noise. To address this problem, this paper proposes a robust, text-independent feature extraction method that uses a single training sample. The extracted features reflect a speaker's basic phonation characteristics and distinguish different speakers well. The paper reports how the four existing feature extraction methods perform at recognizing different speakers when only a single speech training sample is available, and compares them with the proposed method. Simulation experiments on English and Chinese speech databases show that the proposed method can extract features for speaker identification from a single training sample and performs comparatively well in single-sample recognition.
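Cite this article: Guo Jianmin. The Research of Text-Independent Feature Extraction Based on Single Training Sample [J]. Computer Science and Application, 2016, 6(6): 384-392. http://dx.doi.org/10.12677/CSA.2016.66047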
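
For orientation, the MFCC baseline named in the abstract can be sketched in plain NumPy as below. This is a generic textbook-style illustration, not the method proposed in the paper. The function names (`mel_filterbank`, `mfcc`) and every parameter value (16 kHz sampling, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients, 0.97 pre-emphasis) are assumptions chosen for the example, and averaging over frames is only one hypothetical way to turn a single utterance into a speaker template.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed layout)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCCs for one utterance."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies (0.97 is an assumed coefficient).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II decorrelates the log energies into cepstral coefficients.
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                    np.arange(n_filters) + 0.5) / n_filters)
    return log_energies @ basis.T

# Hypothetical usage: one second of synthetic audio stands in for a
# single training utterance; the frame average serves as a crude template.
rng = np.random.default_rng(0)
utterance = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000) \
    + 0.01 * rng.standard_normal(16000)
features = mfcc(utterance)
template = features.mean(axis=0)
```

With these assumed settings, a one-second utterance yields 98 frames of 13 coefficients each. A real single-sample system would still need silence removal and a matching rule, neither of which is specified in the abstract.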
