|
[1]
|
Akhtar, S., Hussain, F., Raja, F.R., et al. (2020) Improving Mispronunciation Detection of Arabic Words for Non-Native Learners Using Deep Convolutional Neural Network Features. Electronics, 9, 963.
|
|
[2]
|
Franco, H., Neumeyer, L., Ramos, M. and Bratt, H. (1999) Automatic Detection of Phone-Level Mispronunciation for Language Learning. Sixth European Conference on Speech Communication and Technology, Budapest, 5-9 September 1999, 851-854.
|
|
[3]
|
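Hu, W.P. (2016) Spoken Pronunciation Detection and Error Analysis Based on Deep Neural Networks. Doctoral Dissertation, University of Science and Technology of China, Hefei. (In Chinese)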
|
|
[4]
|
Majeed, M.N., Ghazanfar, M.A., et al. (2019) Mispronunciation Detection Using Deep Convolutional Neural Network Features and Transfer Learning Based Model for Arabic Phonemes. IEEE Access, 7, 52589-52608.
|
|
[5]
|
Lo, W.-K., Qian, X.-J., et al. (2009) Implementation of an Extended Recognition Network for Mispronunciation Detection and Diagnosis in Computer-Assisted Pronunciation Training. Speech and Language Technology in Education (SLaTE 2009), 1, 1-4.
|
|
[6]
|
Huang, H., Xu, H., Wang, X., et al. (2015) Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23, 787-797.
|
|
[7]
|
Hinton, G., Deng, L., Yu, D., et al. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29, 82-97.
|
|
[8]
|
Davis, S. and Mermelstein, P. (1980) Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28, 357-366.
|
|
[9]
|
Graves, A. and Schmidhuber, J. (2005) Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Networks, 18, 602-610.
|
|
[10]
|
Oquab, M., Bottou, L., Laptev, I., et al. (2014) Learning and Transferring Mid-Level Image Representations Using Convolutional Neural Networks. IEEE Conference on Computer Vision & Pattern Recognition, Columbus, 23-28 June 2014, 1717-1724.
|
|
[11]
|
Garofolo, J.S., Lamel, L.F., Fisher, W.M., et al. (1993) TIMIT Acoustic-Phonetic Continuous Speech Corpus. Philadelphia: Linguistic Data Consortium, LDC93S1.
|
|
[12]
|
Databaker (Beijing) Technology Co., Ltd. (2016) Chinese Standard Female Voice Corpus. https://www.data-baker.com (In Chinese)
|