Three-Dimensional Object Reconstruction from Multi-View Images with Disentangled Attribute Flow
DOI: 10.12677/csa.2024.149195. Supported by research project funding.
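Authors: Jiao Xuan, Xu Bin, Liu Jianlong: School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi; Cen Ying: School of Computer Science, Guangdong Preschool Normal College in Maoming, Maoming, Guangdong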
Keywords: 3DAttriFlow, Object Reconstruction, Multi-View Images, Channel Attention, Feature Fusion
Abstract: Driven by deep learning, three-dimensional (3D) object reconstruction from two-dimensional (2D) images has developed rapidly. A common approach is to extract attribute flows from the 2D images and then use them to guide a decoder in estimating the 3D object structure. The three-dimensional disentangled attribute flow (3DAttriFlow) model disentangles the extracted attribute flows and guides 3D object reconstruction with explicit geometric and semantic attributes. However, 3DAttriFlow applies only to reconstruction from a single-view image, where information about occluded parts is missing, so its reconstruction performance leaves room for improvement. In this paper, we extend 3DAttriFlow to 3D object reconstruction from multi-view images and improve the model in two respects: feature extraction and multi-view feature fusion. For feature extraction, a channel attention mechanism is introduced into the original ResNet18 backbone to highlight important channel features. For multi-view feature fusion, an attention module fuses the features of multiple views to obtain more complete and richer object features. Experimental results on a subset of the ShapeNet dataset show that the improved model achieves better 3D reconstruction quality than the original 3DAttriFlow model.
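The two improvements named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify either module in detail, so this assumes a squeeze-and-excitation-style gate for the channel attention and a softmax-weighted sum across views for the fusion. The function names, weight matrices (`w1`, `w2`, `w_score`), and shapes are all hypothetical and are not taken from the paper's implementation.

```python
import numpy as np


def channel_attention(feat, w1, w2):
    """SE-style channel attention on one feature map of shape (C, H, W).

    w1 with shape (C // r, C) and w2 with shape (C, C // r) are
    hypothetical bottleneck weights (assumed, not from the paper).
    """
    squeezed = feat.mean(axis=(1, 2))             # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)       # excitation: linear + ReLU -> (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # linear + sigmoid -> (C,), each in (0, 1)
    return feat * gates[:, None, None]            # rescale every channel by its gate


def fuse_views(view_feats, w_score):
    """Attention-weighted fusion of per-view feature vectors.

    view_feats: (V, D), one D-dimensional feature vector per view.
    w_score:    (D, D), hypothetical weights that produce per-element scores.
    """
    scores = view_feats @ w_score                         # (V, D)
    scores = scores - scores.max(axis=0, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)         # softmax across the V views
    return (weights * view_feats).sum(axis=0)             # fused feature, shape (D,)


# Usage with random placeholder data: 3 views, 8 channels, 4x4 feature maps.
rng = np.random.default_rng(0)
C, H, W, r, V = 8, 4, 4, 2, 3
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
w_score = rng.normal(size=(C, C))

per_view = []
for _ in range(V):
    fmap = rng.normal(size=(C, H, W))        # stand-in for a backbone feature map
    attended = channel_attention(fmap, w1, w2)
    per_view.append(attended.mean(axis=(1, 2)))  # pool to one vector per view

fused = fuse_views(np.stack(per_view), w_score)
print(fused.shape)
```

Because the fusion weights are normalised across views, the fused feature has the same size for any number of input views, which is one reason attention-style aggregation is a common choice for multi-view inputs.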
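Citation: Jiao Xuan, Cen Ying, Xu Bin, Liu Jianlong. Three-Dimensional Object Reconstruction from Multi-View Images with Disentangled Attribute Flow[J]. Computer Science and Application, 2024, 14(9): 141-150. https://doi.org/10.12677/csa.2024.149195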
