Research on 3D Human Reconstruction Algorithms Based on 3D Gaussian Splatting
DOI: 10.12677/csa.2026.164119
Author: Wang Wenlong, School of Computer Science and Technology, Zhejiang Normal University, Jinhua, Zhejiang
Keywords: 3D Human Reconstruction, 3D Gaussian Splatting, SMPL, Non-Rigid Deformation Correction
Abstract: 3D human reconstruction is of great importance for applications such as digital humans, motion capture, and immersive interaction. However, non-rigid deformations caused by clothing wrinkles, soft-tissue motion, and fast body movements in real-world scenarios make it difficult for template-driven methods that rely solely on parametric human models to achieve stable reconstructions with well-aligned fine details. To address this issue, we propose DCGaussianAvatar, a deformation-corrected Gaussian avatar model that combines the efficient rendering capability of 3D Gaussian Splatting with the structural prior of SMPL for animatable human reconstruction and novel-view synthesis. Our method initializes Gaussian primitives on the SMPL mesh surface and extracts a global pose embedding via a pose encoder. Building on this, we design a pose-conditioned non-rigid deformation correction module, which takes as input high-level features formed by concatenating the embedded Gaussian centers with the pose embedding, and performs residual updates on both the skinning weights and the appearance parameters to compensate for residual errors introduced by non-rigid details such as clothing and soft tissues. Experiments on the ZJU-MoCap dataset against multiple representative baselines show that DCGaussianAvatar achieves the best PSNR of 30.93, along with an SSIM of 0.964 and an LPIPS of 0.032, and delivers more stable alignment and sharper fine-grained details, particularly around clothing wrinkles and articulated joints.
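The residual correction described in the abstract can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the paper's implementation: the sinusoidal center embedding, the two-layer MLP, the toy dimensions, and the softmax renormalization of skinning weights are all assumptions standing in for components the abstract only names (the "embedded Gaussian centers," the "pose embedding," and the "residual update" of skinning weights).

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_embed(x, num_freqs=4):
    # Sinusoidal embedding of Gaussian centers (NeRF-style) -- an assumption;
    # the abstract only states that the centers are "embedded".
    freqs = 2.0 ** np.arange(num_freqs)           # (F,)
    angles = x[..., None] * freqs                 # (N, 3, F)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(x.shape[0], -1)            # (N, 3 * 2F)

def mlp(x, w1, b1, w2, b2):
    # Two-layer ReLU MLP standing in for the correction network.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

# Toy sizes: N Gaussians on the SMPL surface, J = 24 SMPL joints.
N, J, D_pose = 128, 24, 32
centers = rng.normal(size=(N, 3))                 # canonical Gaussian centers
pose_embed = rng.normal(size=(D_pose,))           # global pose embedding
base_weights = rng.random(size=(N, J))
base_weights /= base_weights.sum(axis=1, keepdims=True)  # SMPL skinning prior

# High-level feature: embedded centers concatenated with the pose embedding.
feat = np.concatenate(
    [positional_embed(centers), np.broadcast_to(pose_embed, (N, D_pose))],
    axis=1)

D_in, D_h = feat.shape[1], 64
w1 = rng.normal(scale=0.1, size=(D_in, D_h)); b1 = np.zeros(D_h)
w2 = rng.normal(scale=0.01, size=(D_h, J));   b2 = np.zeros(J)

# Residual update of the skinning weights, applied in logit space and
# renormalized via softmax so each Gaussian's weights still sum to one.
delta = mlp(feat, w1, b1, w2, b2)
logits = np.log(base_weights + 1e-8) + delta
corrected = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```

An analogous residual head would update the appearance parameters (color, opacity, scale) from the same feature; the logit-space update here is one simple way to keep the corrected weights a valid convex combination over joints.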
Citation: Wang, W. (2026) Research on 3D Human Reconstruction Algorithms Based on 3D Gaussian Splatting. Computer Science and Application, 16(4), 169-177. https://doi.org/10.12677/csa.2026.164119
