[1] Edwards, P., Landreth, C., Fiume, E. and Singh, K. (2016) JALI: An Animator-Centric Viseme Model for Expressive Lip Synchronization. ACM Transactions on Graphics, 35, 1-11.
[2] Xing, J.B., Xia, M.H., Zhang, Y.C., Cun, X.D., Wang, J. and Wong, T.T. (2023) CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 12780-12790.
[3] Peng, Z.Q., Wu, H.Y., Song, Z.B., Xu, H., Zhu, X.Y., Liu, H.Y., He, J. and Fan, Z.X. (2023) EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation. arXiv preprint arXiv:2303.11089.
[4] Song, X.Y., Yan, Z.Y., Sun, M.Y., et al. (2023) Research Status and Development Trends of Talking Face Generation. Computer Science, 50, 68-78. (In Chinese)
[5] Zhou, Y., Xu, Z., Landreth, C., Kalogerakis, E., Maji, S. and Singh, K. (2018) VisemeNet: Audio-Driven Animator-Centric Speech Animation. ACM Transactions on Graphics, 37, 1-10.
[6] Richard, A., Zollhöfer, M., Wen, Y.D., de la Torre, F. and Sheikh, Y. (2021) MeshTalk: 3D Face Animation from Speech Using Cross-Modality Disentanglement. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 1153-1162.
[7] Cudeiro, D., Bolkart, T., Laidlaw, C., Ranjan, A. and Black, M.J. (2019) Capture, Learning, and Synthesis of 3D Speaking Styles. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 15-20 June 2019, 10093-10103.
[8] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I. (2017) Attention Is All You Need. Advances in Neural Information Processing Systems, 30, 2-3.
[9] Li, T., Bolkart, T., Black, M.J., Li, H. and Romero, J. (2017) Learning a Model of Facial Shape and Expression from 4D Scans. ACM Transactions on Graphics, 36, 1-17.
[10] van den Oord, A., Vinyals, O. and Kavukcuoglu, K. (2017) Neural Discrete Representation Learning. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S. and Garnett, R., Eds., Advances in Neural Information Processing Systems. Curran Associates, Inc., Newburyport.
[11] Baevski, A., Zhou, H., Mohamed, A. and Auli, M. (2020) Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. arXiv:2006.11477.
[12] Bai, S.J., Kolter, J.Z. and Koltun, V. (2018) An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv:1803.01271.
[13] Fan, Y., Lin, Z., Saito, J., Wang, W. and Komura, T. (2022) FaceFormer: Speech-Driven 3D Facial Animation with Transformers. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 18749-18758.