LSTM Recognition Model of Chinese Prosody Boundary Based on Vowel Structure
Abstract: As speech synthesis becomes widespread, expectations for the naturalness and accuracy of synthesized speech keep rising, and prosodic boundaries play an important role in both. In spoken communication, a prosodic boundary is the point at which the speaker pauses within or between utterances. Improving the recognition rate of prosodic boundaries remains an active research problem. Building on existing work, this paper proposes an LSTM-based method for recognizing Chinese prosodic boundaries that exploits the structure of the finals (yunmu, rendered as "vowel structure" in the title). The method first extracts features from the corpus, then uses the structural features of the finals to normalize the durations of the finals, and finally trains an LSTM model on the resulting feature set to obtain a prosodic boundary recognizer. The results show that the model using the normalized duration in place of the raw final duration achieves a higher recognition rate than the model using the raw duration: the F-score for prosodic phrases increases by 4.9%, recognition of the other prosodic boundary types also improves, and the average F-score over all boundary types shows a relative improvement of 2%. This indicates that final-structure features are effective in improving the model's recognition rate.
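The abstract does not state the normalization formula, so the following is only a minimal sketch of one plausible reading: each final's duration is converted to a z-score relative to other finals of the same structural class. The class labels (`simple`, `compound`, `nasal`), the function name `normalize_durations`, and the sample durations are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_durations(samples):
    """Z-score each duration within its final-structure class.

    samples: list of (structure_class, duration_in_seconds) pairs.
    Returns a list of normalized durations in the same order.
    Assumption: per-class z-score normalization; the paper may use
    a different scheme.
    """
    # Group raw durations by structure class.
    by_class = defaultdict(list)
    for cls, dur in samples:
        by_class[cls].append(dur)

    # Per-class mean and population standard deviation.
    stats = {cls: (mean(durs), pstdev(durs)) for cls, durs in by_class.items()}

    normalized = []
    for cls, dur in samples:
        mu, sigma = stats[cls]
        # A class with zero spread carries no duration contrast; map it to 0.
        normalized.append((dur - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Hypothetical toy data: (structure class, duration in seconds).
samples = [
    ("simple", 0.10),
    ("simple", 0.14),
    ("compound", 0.18),
    ("compound", 0.26),
    ("nasal", 0.22),
    ("nasal", 0.22),
]
normalized = normalize_durations(samples)
```

Under this reading, a long compound final and a long simple final receive comparable normalized values even though their raw durations differ, so lengthening near a boundary is measured relative to what is typical for that kind of final.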
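Cite this article: Wei, X.X., Wu, Y.Z. and Gao, W.M. (2021) LSTM Recognition Model of Chinese Prosody Boundary Based on Vowel Structure. Computer Science and Application, 11(4), 1081-1088. https://doi.org/10.12677/CSA.2021.114111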
