An Approach to Normalization of Dai Text for Speech Synthesis

doi:10.12677/CSA.2016.67051

Supported by the National Natural Science Foundation of China
Authors: Wu Zhumei (伍烛梅), Yang Jian (杨鉴)*, Wang Zhan (王展): School of Information, Yunnan University, Kunming, Yunnan
Keywords: Dai Language; Speech Synthesis; Text Analysis; Normalization

Abstract: With the aim of developing a Dai speech synthesis system, this paper studies the normalization of numbers and special characters in Dai text. Numbers and special characters are both non-standard words in Dai text, and the main purpose of text normalization is to represent the pronunciation of non-standard words using standard words. The normalization process consists of four steps: non-standard word recognition, ambiguity judgment, disambiguation, and conversion of non-standard words into standard words. Non-standard words are recognized with a method that combines rules and context keywords; regular expressions are then used to determine their ambiguity type; finally, the non-standard words are disambiguated according to conversion rules and their correct Dai reading is determined. Experimental results show that the accuracy of the proposed text normalization method reaches 94.6%, which fully meets the needs of front-end text analysis in a Dai text-to-speech conversion system and has good application value for natural language processing.
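The four steps named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the ambiguity classes, the regular expressions, the context keywords, and the function names (`recognize`, `judge`, `disambiguate`, `convert`, `normalize`) are hypothetical and are not the rule set used in the paper. English tokens stand in for Dai text, and the final conversion step is only sketched for digit strings, since the Dai number lexicon and reading rules are not given here.

```python
import re

# Hypothetical ambiguity classes and patterns (NOT the paper's actual rules).
# Order matters: more specific patterns are tried first.
PATTERNS = [
    ("date", re.compile(r"^\d{4}-\d{1,2}-\d{1,2}$")),
    ("time", re.compile(r"^\d{1,2}:\d{2}$")),
    ("percent", re.compile(r"^\d+(\.\d+)?%$")),
    ("decimal", re.compile(r"^\d+\.\d+$")),
    ("integer", re.compile(r"^\d+$")),
]

# Hypothetical context keywords that signal a digit-by-digit reading.
DIGIT_BY_DIGIT_KEYWORDS = {"phone", "tel", "room"}


def recognize(tokens):
    """Step 1: indices of tokens that contain a digit or the special character '%'."""
    return [i for i, tok in enumerate(tokens) if re.search(r"[\d%]", tok)]


def judge(token):
    """Step 2: use regular expressions to assign an ambiguity type."""
    for label, pattern in PATTERNS:
        if pattern.match(token):
            return label
    return "unknown"


def disambiguate(tokens, i, label, window=2):
    """Step 3: resolve a bare integer using keywords in the preceding context."""
    if label != "integer":
        return label
    context = tokens[max(0, i - window):i]
    if any(w.lower().strip(":") in DIGIT_BY_DIGIT_KEYWORDS for w in context):
        return "digit_string"
    return "cardinal"


def convert(token, label, digit_words):
    """Step 4 (digit strings only): map each digit through a caller-supplied lexicon."""
    if label == "digit_string":
        return " ".join(digit_words[d] for d in token)
    return token  # other types need language-specific rules not sketched here


def normalize(text):
    """Run steps 1 to 3 and return (token, resolved type) pairs."""
    tokens = text.split()
    return [(tokens[i], disambiguate(tokens, i, judge(tokens[i])))
            for i in recognize(tokens)]
```

For instance, `normalize("call phone 5551234 before 12:30")` classifies `5551234` as a digit string because of the preceding keyword, while the same integer without such a keyword would be treated as a cardinal. A real system would replace the placeholder patterns and keywords with rules written for Dai script and would supply a Dai lexicon to `convert`.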

Citation: Wu Zhumei, Yang Jian, Wang Zhan. An Approach to Normalization of Dai Text for Speech Synthesis [J]. Computer Science and Application, 2016, 6(7): 415-422. https://dx.doi.org/10.12677/CSA.2016.67051

