|
[1]
|
Edwards, P., Landreth, C., Fiume, E. and Singh, K. (2016) JALI: An Animator-Centric Viseme Model for Expressive Lip Synchronization. ACM Transactions on Graphics, 35, Article No. 127.
|
|
[2]
|
Taylor, S.L., Mahler, M., Theobald, B.-J. and Matthews, I. (2012) Dynamic Units of Visual Speech. Proceedings of the ACM SIGGRAPH/Eurographics Conference on Computer Animation, Lausanne, 29-31 July 2012, 275-284.
|
|
[3]
|
Xu, Y.Y., Feng, A.W., et al. (2013) A Practical and Configurable Lip Sync Method for Games. Proceedings of Motion on Games, Dublin, 6-8 November 2013, 131-140.
|
|
[4]
|
Sako, S., Tokuda, K., Masuko, T., et al. (2000) HMM-Based Text-To-Audio-Visual Speech Synthesis. Sixth International Conference on Spoken Language Processing, Beijing, 16-20 October 2000.
|
|
[5]
|
Zhou, Y., Xu, Z., Landreth, C., et al. (2018) VisemeNet: Audio-Driven Animator-Centric Speech Animation. ACM Transactions on Graphics, 37, Article No. 161.
|
|
[6]
|
Karras, T., Aila, T., Laine, S., et al. (2017) Audio-Driven Facial Animation by Joint End-To-End Learning of Pose and Emotion. ACM Transactions on Graphics (TOG), 36, Article No. 94.
|
|
[7]
|
Hochreiter, S. and Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9, 1735-1780.
|
|
[8]
|
Schuster, M. and Paliwal, K.K. (1997) Bidirectional Recurrent Neural Networks. IEEE Transactions on Signal Processing, 45, 2673-2681.
|
|
[9]
|
Cudeiro, D., Bolkart, T., Laidlaw, C., et al. (2019) Capture, Learning, and Synthesis of 3D Speaking Styles. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, 15-20 June 2019, 10101-10111.
|
|
[10]
|
Richard, A., Zollhöfer, M., Wen, Y., et al. (2021) MeshTalk: 3D Face Animation from Speech Using Cross-Modality Disentanglement. Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11-17 October 2021, 1173-1182.
|
|
[11]
|
Fan, Y., Lin, Z., Saito, J., et al. (2022) FaceFormer: Speech-Driven 3D Facial Animation with Transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, 18-24 June 2022, 18770-18780.
|
|
[12]
|
Chen, Q., Ma, Z., Liu, T., et al. (2023) Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, 4-10 June 2023, 1-5.
|
|
[13]
|
Cuturi, M. and Blondel, M. (2017) Soft-DTW: A Differentiable Loss Function for Time-Series. International Conference on Machine Learning, Sydney, 6-11 August 2017, 894-903.
|
|
[14]
|
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017) Attention Is All You Need. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, 4-9 December 2017.
|
|
[15]
|
Baevski, A., Zhou, Y., Mohamed, A., et al. (2020) wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, 6-12 December 2020, 12449-12460.
|
|
[16]
|
Sakoe, H. (1971) A Dynamic-Programming Approach to Continuous Speech Recognition.
https://www.semanticscholar.org/paper/A-Dynamic-Programming-Approach-to-Continuous-Speech-Sakoe-Chiba/2d2eb229c21269ffaa8a85b0961a2bda1116a6c7#citing-papers
|