|
[1]
|
Dutoit, T. (2001) An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht.
|
|
[2]
|
Gonzalvo, X., Tazari, S., Chan, C.A., et al. (2016) Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer. Interspeech 2016, San Francisco, 8-12 September 2016, 2238-2242.
|
|
[3]
|
Zen, H., Agiomyrgiannakis, Y., Egberts, N., et al. (2016) Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices. Interspeech 2016, San Francisco, 8-12 September 2016, 2273-2277.
|
|
[4]
|
Li, N., Liu, S., Liu, Y., et al. (2018) Close to Human Quality TTS with Transformer.
|
|
[5]
|
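Wang, F. (2005) A Contrastive Study of the Mood Systems of Chinese and English. Doctoral Dissertation, Fudan University Press, Shanghai. (In Chinese)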
|
|
[6]
|
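Zhang, Y. (2019) Emotional Speech Synthesis Based on Transfer Learning and Self-Learned Emotion Representations. Master's Thesis, Beijing University of Posts and Telecommunications, Beijing. (In Chinese)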
|
|
[7]
|
Sun, G., Zhang, Y., Weiss, R.J., et al. (2020) Fully-Hierarchical Fine-Grained Prosody Modeling for Interpretable Speech Synthesis. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, 4-8 May 2020, 6264-6268.
|
|
[8]
|
Zhang, Y.J., Pan, S., He, L., et al. (2019) Learning Latent Representations for Style Control and Transfer in End-to-End Speech Synthesis. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, 12-17 May 2019, 6945-6949.
|
|
[9]
|
Kingma, D.P. and Welling, M. (2014) Auto-Encoding Variational Bayes. 2nd International Conference on Learning Representations, Banff, 14-16 April 2014.
|
|
[10]
|
Wittrock, M.C. (2010) Learning as a Generative Process. Educational Psychologist, 45, 40-45.
|
|
[11]
|
Bishop, C.M. (2006) Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, New York.
|
|
[12]
|
Bengio, Y., Courville, A. and Vincent, P. (2013) Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis & Machine Intelligence, 35, 1798-1828.
|
|
[13]
|
Khurana, S., Joty, S.R., Ali, A., et al. (2019) A Factorial Deep Markov Model for Unsupervised Disentangled Representation Learning from Speech. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, 12-17 May 2019, 6540-6544.
|
|
[14]
|
Wainwright, M.J. and Jordan, M.I. (2008) Graphical Models, Exponential Families, and Variational Inference. Foundations & Trends® in Machine Learning, 1, 1-305.
|
|
[15]
|
Joyce, J.M. (2011) Kullback-Leibler Divergence. In: Lovric, M., Ed., International Encyclopedia of Statistical Science, Springer, Berlin, 720-722.
|
|
[16]
|
Hodari, Z., Lai, C. and King, S. (2020) Perception of Prosodic Variation for Speech Synthesis Using an Unsupervised Discrete Representation of F0. 10th International Conference on Speech Prosody, Tokyo, 25-28 May 2020, 965.
|
|
[17]
|
He, M., Deng, Y. and He, L. (2019) Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS. Interspeech 2019, Graz, 15-19 September 2019, 1293-1297.
|
|
[18]
|
Xue, S. and Yan, Z. (2017) Improving Latency-Controlled BLSTM Acoustic Models for Online Speech Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, 5-9 March 2017, 5340-5344.
|
|
[19]
|
Morise, M., Yokomori, F. and Ozawa, K. (2016) WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications. IEICE Transactions on Information & Systems, 99, 1877-1884.
|
|
[20]
|
King, S., Crumlish, J., Martin, A. and Wihlborg, L. (2018) The Blizzard Challenge 2018. Proc. Blizzard Challenge Workshop, Hyderabad.
|
|
[21]
|
Tokuda, K., Yoshimura, T., Masuko, T., et al. (2002) Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis. IEEE International Conference on Acoustics, Orlando, 13-17 May 2002, 1315-1318.
|