Coarse-to-Fine Monocular 3D Face Reconstruction with High Fidelity
DOI: 10.12677/CSA.2024.144095. Supported by research project funding.
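Authors: Sheng'en Jing (景圣恩), Tian Gao (高添), Yingcheng Tao (陶应诚), Menghao Peng (彭梦昊), Yadong Shi (侍亚东), Guan Wang (王冠): School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui
Keywords: 3D Face Reconstruction, 3D Morphable Model, Self-Supervised Learning, Face Rendering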
Abstract: Existing monocular 3D face reconstruction methods struggle to capture fine detail and to preserve identity information. To address this, this paper proposes a coarse-to-fine framework for 3D face reconstruction. The framework first generates an initial 3D face model from feature parameters extracted from a 2D face image, and uses a multi-scale identity feature extractor to capture personalized characteristics. An adaptive weighting strategy then selects the features that contribute most to the reconstruction task. The fine reconstruction stage focuses on geometric detail: identity and expression encodings are fed into a geometric detail generation network to produce detail specific to the subject's identity and expression. Finally, a differentiable renderer renders the 3D face model into a 2D face image for self-supervised training. Experiments on the CelebA and AFLW2000-3D datasets show that the proposed framework reconstructs more realistic, natural, and highly personalized 3D face models from a single image, and outperforms existing methods in detail capture and identity preservation, suggesting broad application prospects.
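The coarse stage and the adaptive weighting step summarized in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so two assumptions are made here: the initial model is taken to be a standard linear 3D morphable model (mean shape plus identity and expression bases), and the adaptive weighting is taken to be a softmax over per-scale scores. All function and parameter names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn per-scale scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features: Sequence[Sequence[float]],
                  scores: Sequence[float]) -> List[float]:
    """Weighted sum of same-length feature vectors, one vector per scale.

    Assumption: the weights come from a softmax over scalar scores; the
    paper's actual weighting strategy may differ.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def coarse_shape(mean: Sequence[float],
                 id_basis: Sequence[Sequence[float]],
                 exp_basis: Sequence[Sequence[float]],
                 alpha: Sequence[float],
                 beta: Sequence[float]) -> List[float]:
    """Linear 3DMM (assumed): S = mean + sum_k alpha_k * id_k + sum_k beta_k * exp_k.

    Each basis vector and the mean are flattened vertex coordinates.
    """
    shape = list(mean)
    for coeff, basis in zip(alpha, id_basis):
        for i, b in enumerate(basis):
            shape[i] += coeff * b
    for coeff, basis in zip(beta, exp_basis):
        for i, b in enumerate(basis):
            shape[i] += coeff * b
    return shape


if __name__ == "__main__":
    # Two scales with equal scores contribute equally to the fused feature.
    print(fuse_features([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    # One identity basis vector and one expression basis vector on a single vertex.
    print(coarse_shape([0.0, 0.0, 0.0],
                       [[1.0, 0.0, 0.0]], [[0.0, 2.0, 0.0]],
                       [0.5], [0.25]))
```

In a real pipeline the coefficients `alpha` and `beta` would be regressed by a network from the input image, and the resulting mesh would be passed to a differentiable renderer so that an image-space loss can supervise training; both steps are outside this sketch.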
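Citation: Jing, S., Gao, T., Tao, Y., Peng, M., Shi, Y. and Wang, G. Coarse-to-Fine Monocular 3D Face Reconstruction with High Fidelity [J]. Computer Science and Application, 2024, 14(4): 255-267. https://doi.org/10.12677/CSA.2024.144095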
