A Structure-Guided Transformer-Based Single-View 3D Reconstruction Deblurring Method
DOI: 10.12677/csa.2026.161016
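Authors: Zhang Yuanmeng (张媛梦), Lin Lixia (林立霞), Cao Peng (曹鹏): School of Information Engineering, Beijing Institute of Graphic Communication, Beijing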
Keywords: Transformer, 3D Reconstruction, 3D Gaussian Splatting
Abstract: With the rapid development of interactive applications such as XR and AR, image-based 3D reconstruction has become increasingly valuable in computer vision. However, motion blur, which is common in real-world image capture, weakens texture and structural information and significantly degrades the geometric consistency and detail completeness of the reconstruction. To address this, this paper proposes a structure-guided Transformer deblurring network for single-view 3D reconstruction. The method introduces an explicit structural prior and uses a structure-guided feed-forward network to strengthen the Transformer's ability to discern edges in blurred regions; it also uses a multi-head convolutional self-attention module to reduce the computational complexity of conventional self-attention and to strengthen local spatial modeling. To verify that structural recovery benefits 3D geometric inference, the deblurred results are fed into a single-view reconstruction framework based on 3D Gaussian Splatting for evaluation. Experimental results show that the proposed method achieves better performance on multiple metrics.
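Cite this article: Zhang Yuanmeng, Lin Lixia, Cao Peng. A Structure-Guided Transformer-Based Single-View 3D Reconstruction Deblurring Method [J]. Computer Science and Application, 2026, 16(1): 198-204. https://doi.org/10.12677/csa.2026.161016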
