[1] Fisher, C.G. (1968) Confusions among Visually Perceived Consonants. Journal of Speech and Hearing Research, 11, 796-804.
[2] Parke, F.I. (1972) Computer Generated Animation of Faces. Proceedings of the ACM Annual Conference, 1, 451-457.
[3] Parke, F.I. and Waters, K. (1996) Computer Facial Animation. A. K. Peters, Ltd., Natick.
[4] Li, L., Liu, Y. and Zhang, H. (2012) A Survey of Computer Facial Animation Techniques. 2012 International Conference on Computer Science and Electronics Engineering, Hangzhou, 23-25 March 2012, 434-438.
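[5] Li, D.C. (2011) Research and Implementation of 3D Facial Animation and Its Driving Based on Pseudo-Muscle Vectors. Master's Thesis, University of Electronic Science and Technology of China, Chengdu. (In Chinese)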
[6] Ekman, P. and Friesen, W.V. (1978) Facial Action Coding System (FACS): A Technique for the Measurement of Facial Actions. Rivista di Psichiatria, 47, 126-138.
[7] Zhang, M., Chen, Y., Li, L. and Wang, D. (2017) Speaker Recognition with Cough, Laugh and "Wei". 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, 12-15 December 2017, 497-501.
[8] Li, P.C., et al. (2018) An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition. Proceedings of INTERSPEECH, Hyderabad, 2-6 September 2018, 3087-3091.
[9] Cudeiro, D., Bolkart, T., Laidlaw, C., Ranjan, A. and Black, M.J. (2019) Capture, Learning, and Synthesis of 3D Speaking Styles. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 15-20 June 2019, 10093-10103.
[10] Oh, T.-H., et al. (2019) Speech2Face: Learning the Face behind a Voice.
[11] Fan, Y., Lin, Z., Saito, J., Wang, W. and Komura, T. (2022) FaceFormer: Speech-Driven 3D Facial Animation with Transformers. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 18749-18758.
[12] Richard, A., Zollhöfer, M., Wen, Y., de la Torre, F. and Sheikh, Y. (2021) MeshTalk: 3D Face Animation from Speech Using Cross-Modality Disentanglement. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 1153-1162.
[13] Xing, J., Xia, M., Zhang, Y., Cun, X., Wang, J. and Wong, T. (2023) CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 12780-12790.
[14] Zhang, W., Cun, X., Wang, X., Zhang, Y., Shen, X., Guo, Y., et al. (2023) SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 8652-8661.
[15] Shen, S., Zhao, W., Meng, Z., Li, W., Zhu, Z., Zhou, J., et al. (2023) DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 1982-1991.
[16] Hsu, W., Bolte, B., Tsai, Y.H., Lakhotia, K., Salakhutdinov, R. and Mohamed, A. (2021) HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 3451-3460.
[17] Hua, W., Dai, Z., Liu, H. and Le, Q.V. (2022) Transformer Quality in Linear Time.
[18] Panayotov, V., Chen, G., Povey, D. and Khudanpur, S. (2015) Librispeech: An ASR Corpus Based on Public Domain Audio Books. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, 19-24 April 2015, 5206-5210.
[19] Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.
[20] Baevski, A., Zhou, H., Mohamed, A. and Auli, M. (2020) wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.
[21] Li, T., Bolkart, T., Black, M.J., Li, H. and Romero, J. (2017) Learning a Model of Facial Shape and Expression from 4D Scans. ACM Transactions on Graphics, 36, 1-17.