Multi-Channel Mamba Adaptive Gaussian Splatting Network for High-Precision Single-View 3D Reconstruction
Abstract: Because multi-view constraints are unavailable, existing single-view 3D reconstruction methods generally struggle to achieve both modeling accuracy and cross-category generalization. We propose MMAGS, a Multi-channel Mamba Adaptive Gaussian Splatting Network for high-precision single-view 3D reconstruction. The network uses a multi-branch Vision Mamba backbone that splits intermediate features into channel groups and processes them in parallel, which supports robust extraction of cross-view structural information and improves geometric recovery in occluded and texture-limited regions. To better preserve texture detail and image edges, we further introduce a depth-color dual-gradient-aware adaptive 3D Gaussian filtering strategy that dynamically adjusts each Gaussian's covariance according to local geometric and photometric variation. We evaluate MMAGS on the ShapeNet-SRN and CO3D benchmarks. The results show higher reconstruction accuracy and better perceptual quality than existing state-of-the-art methods, together with good cross-category generalization and robustness on both synthetic and real-world data.
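The abstract does not give the formula behind the depth-color dual-gradient filter, so the following is only a minimal sketch of one way such a rule could look, not the authors' implementation. It assumes Sobel gradient magnitudes on a depth map and a grayscale image, a weighted sum of the two, and a per-pixel scale factor that shrinks as the combined gradient grows. The names `sobel_magnitude` and `adaptive_scale` and the parameters `alpha`, `k`, `s_min`, and `s_max` are all hypothetical.

```python
from typing import List

Grid = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Grid) -> Grid:
    """Sobel gradient magnitude; border pixels are replicated (clamped)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    v = img[yy][xx]
                    gx += SOBEL_X[dy + 1][dx + 1] * v
                    gy += SOBEL_Y[dy + 1][dx + 1] * v
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def adaptive_scale(depth: Grid, gray: Grid, alpha: float = 0.5,
                   k: float = 1.0, s_min: float = 0.25,
                   s_max: float = 1.0) -> Grid:
    """Hypothetical rule: a larger combined gradient gives a smaller scale.

    A covariance could then be adjusted as s*s * Sigma, so Gaussians near
    depth or color edges become tighter. This mapping is an assumption.
    """
    g_depth = sobel_magnitude(depth)
    g_color = sobel_magnitude(gray)
    h, w = len(depth), len(depth[0])
    scale = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            g = alpha * g_depth[y][x] + (1.0 - alpha) * g_color[y][x]
            scale[y][x] = max(s_min, s_max / (1.0 + k * g))
    return scale


# Toy input: a vertical depth step with a uniform gray image.
depth = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
gray = [[0.5] * 4 for _ in range(4)]
scale = adaptive_scale(depth, gray)
```

In this toy input the scale stays at `s_max` in the flat left column and drops in the columns adjacent to the depth step, which is the qualitative behavior the abstract describes: tighter Gaussians where geometry or texture changes quickly.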
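Cite this article: Zhou Sanqi. Multi-Channel Mamba Adaptive Gaussian Splatting Network for High-Precision Single-View 3D Reconstruction [J]. Modeling and Simulation, 2025, 14(9): 215-231. https://doi.org/10.12677/mos.2025.149598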
