Gaussian Mixture Model Training Method Based on Particle Swarm Optimizer for Speaker Recognition
DOI: 10.12677/SEA.2013.21001
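Authors: Xue Liping (薛丽萍), Yao Yinglong (姚应龙), Wang Zhiqiang (王志强), Zhou Hong (周虹): College of Computer Science and Software Engineering, Shenzhen University, Shenzhen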
Keywords: Speaker Recognition; Gaussian Mixture Model (GMM); Particle Swarm Optimization (PSO)
Abstract: The Expectation-Maximization (EM) algorithm is commonly used to estimate the parameters of a Gaussian Mixture Model (GMM). Because EM is a hill-climbing method, it is sensitive to its initial values, and in practice an arbitrary initialization usually leads to a locally optimal model. To address this problem, a hybrid training method based on Particle Swarm Optimization (PSO) is proposed. It combines the global exploration capability of PSO with the local refinement of EM: in each iteration, every particle performs the basic PSO operations (velocity update and position update) together with a standard EM update, so that the swarm searches the training speech vector space for the optimal GMM parameters. This avoids the tendency of conventional EM to become trapped in a local optimum and reduces the dependence of the final model parameters on the choice of initial parameters. Speaker identification experiments show that, compared with EM, the proposed method obtains better GMM parameters and effectively improves the recognition rate of the system.
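To make the hybrid idea concrete, the following is a minimal sketch of one possible PSO plus EM training loop. It is not the authors' implementation, and the abstract does not specify these details, so several choices are assumptions: diagonal covariances, an inertia-weight PSO update applied to the component means only, one EM step applied after each PSO move, and the total log-likelihood used as fitness. The function names (`log_likelihood`, `em_step`, `pso_em_gmm`) and the default values of `inertia`, `c1`, `c2` and `var_floor` are hypothetical.

```python
import numpy as np

def log_likelihood(X, w, mu, var):
    """Return (total log-likelihood, log responsibilities) for a diagonal GMM."""
    diff = X[:, None, :] - mu[None, :, :]                       # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / var, axis=2)
                       + np.sum(np.log(2 * np.pi * var), axis=1))
    log_joint = log_comp + np.log(w)                            # (T, K)
    m = log_joint.max(axis=1, keepdims=True)                    # log-sum-exp trick
    log_px = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return log_px.sum(), log_joint - log_px[:, None]

def em_step(X, w, mu, var, var_floor=1e-3):
    """One standard EM update of weights, means and diagonal variances."""
    _, log_resp = log_likelihood(X, w, mu, var)
    resp = np.exp(log_resp)                                     # E-step, (T, K)
    nk = resp.sum(axis=0) + 1e-10                               # soft counts
    w_new = nk / nk.sum()                                       # M-step
    mu_new = (resp.T @ X) / nk[:, None]
    var_new = (resp.T @ X ** 2) / nk[:, None] - mu_new ** 2
    return w_new, mu_new, np.maximum(var_new, var_floor)

def pso_em_gmm(X, K=2, n_particles=5, n_iter=20,
               inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Hybrid search: PSO move on the means (assumption), then one EM step."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    particles = []
    for _ in range(n_particles):
        # Each particle starts from K randomly chosen training vectors.
        p = {"w": np.full(K, 1.0 / K),
             "mu": X[rng.choice(T, K, replace=False)],
             "var": np.tile(X.var(axis=0), (K, 1)),
             "vel": np.zeros((K, D))}
        p["best_fit"] = log_likelihood(X, p["w"], p["mu"], p["var"])[0]
        p["best"] = (p["w"], p["mu"], p["var"])
        particles.append(p)
    leader = max(particles, key=lambda p: p["best_fit"])
    g_fit, g_best = leader["best_fit"], leader["best"]
    history = [g_fit]
    for _ in range(n_iter):
        for p in particles:
            # PSO velocity and position update toward personal and global best.
            r1, r2 = rng.random((K, D)), rng.random((K, D))
            p["vel"] = (inertia * p["vel"]
                        + c1 * r1 * (p["best"][1] - p["mu"])
                        + c2 * r2 * (g_best[1] - p["mu"]))
            p["mu"] = p["mu"] + p["vel"]
            # Local refinement with one EM step from the moved position.
            p["w"], p["mu"], p["var"] = em_step(X, p["w"], p["mu"], p["var"])
            fit = log_likelihood(X, p["w"], p["mu"], p["var"])[0]
            if fit > p["best_fit"]:
                p["best_fit"], p["best"] = fit, (p["w"], p["mu"], p["var"])
            if fit > g_fit:
                g_fit, g_best = fit, (p["w"], p["mu"], p["var"])
        history.append(g_fit)
    return g_best, history
```

A call such as `(w, mu, var), history = pso_em_gmm(features, K=16)` would return the best parameters found and the global-best log-likelihood after each iteration, where `features` is a hypothetical array of training feature vectors with one row per frame. The sketch ignores the fact that component order can differ between particles, which a fuller implementation would probably need to handle before subtracting mean vectors.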

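Citation: Xue Liping, Yao Yinglong, Wang Zhiqiang, Zhou Hong. Gaussian Mixture Model Training Method Based on Particle Swarm Optimizer for Speaker Recognition [J]. Software Engineering and Applications, 2013, 2(1): 1-5. http://dx.doi.org/10.12677/SEA.2013.21001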
