Prosodic Unit Boundary Prediction of Myanmar Based on the BERT-CRF Model
DOI: 10.12677/CSA.2021.113051. Supported by the National Natural Science Foundation of China.
Authors: Peiying Li, Jian Yang*: School of Information, Yunnan University, Kunming, Yunnan
Keywords: Myanmar; Prosodic Unit Prediction; BERT Pre-trained Model; Conditional Random Field (CRF) Model; Speech Synthesis
Abstract: Myanmar speech synthesis has attracted considerable research attention in recent years, but its performance still falls short of what widespread deployment requires. With the goal of improving the naturalness of synthesized Myanmar speech, this paper studies the prosodic features of Myanmar and explores methods for automatically predicting prosodic unit boundaries from Myanmar text. It proposes and implements a method for predicting prosodic word and prosodic phrase boundaries in Myanmar that combines a BERT pre-trained model with a conditional random field (CRF) model. Experimental results show that the BERT-CRF model predicts both prosodic word and prosodic phrase boundaries better than the CRF, BiLSTM, BiLSTM-CRF, and BERT models. To verify the usability of the method, this paper also applies it to the front-end text analysis and processing stage of speech synthesis. The speech synthesis experiments show that the proposed method effectively improves the naturalness of synthesized Myanmar speech.
Cite this article: Li, P.Y. and Yang, J. (2021) Prosodic Unit Boundary Prediction of Myanmar Based on BERT-CRF Model. Computer Science and Application, 11(3), 505-514. https://doi.org/10.12677/CSA.2021.113051
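
To illustrate what the CRF layer contributes in a BERT-CRF boundary predictor, the sketch below treats boundary prediction as sequence labeling and decodes the best tag sequence with the Viterbi algorithm. It is a minimal illustration and not the authors' implementation: the tag set (`O`, `PW` for a prosodic word boundary, `PP` for a prosodic phrase boundary), the emission scores (stand-ins for per-token BERT outputs), and the transition scores are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical tag set: no boundary, prosodic word boundary, prosodic phrase boundary.
TAGS = ["O", "PW", "PP"]


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the per-token score (here a stand-in for BERT output);
    transitions[prev][cur] is the score of moving from tag prev to tag cur.
    """
    if not emissions:
        return []
    # best[t][tag]: score of the best path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back = []  # back[t - 1][tag]: best previous tag for `tag` at position t.
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores: all transitions are neutral except that two consecutive
# phrase boundaries are strongly penalized.
transitions = {p: {c: 0.0 for c in TAGS} for p in TAGS}
transitions["PP"]["PP"] = -5.0

emissions = [
    {"O": 2.0, "PW": 0.5, "PP": 0.1},
    {"O": 0.2, "PW": 1.0, "PP": 1.2},
    {"O": 0.3, "PW": 0.4, "PP": 1.1},
    {"O": 0.1, "PW": 0.3, "PP": 2.0},
]

print(viterbi_decode(emissions, transitions))
```

Taking the per-token argmax of these emission scores alone would label the last three tokens `PP`; the transition penalty makes the decoder choose a globally consistent sequence instead, which is the usual motivation for placing a CRF on top of per-token scores.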
