3D Reconstruction Method Based on PP-Matting and Incremental Structure-from-Motion
DOI: 10.12677/MOS.2023.124375
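Authors: Ren Mengxin (任梦欣): School of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou; Guizhou University-Gui'an Sci-Tech Innovation Supercomputing Power and Algorithm Application Laboratory (贵大·贵安科创超级计算算力算法应用实验室), Guiyang, Guizhou; Yang Jianfeng (杨剑锋)*: School of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou; School of Big Data, Guizhou Institute of Technology, Guiyang, Guizhou; Deng Zhouhui (邓周灰): Guizhou University-Gui'an Sci-Tech Innovation Supercomputing Power and Algorithm Application Laboratory, Guiyang, Guizhou; Gui'an New Area Sci-Tech Innovation Industry Development Co., Ltd. (贵安新区科创产业发展有限公司), Guiyang, Guizhou; Zou Qiong (邹琼): Shenzhen Rayvision Technology Co., Ltd. (深圳瑞云科技股份有限公司), Shenzhen, Guangdong; Tong Tianle (仝天乐): Guizhou University-Gui'an Sci-Tech Innovation Supercomputing Power and Algorithm Application Laboratory, Guiyang, Guizhou; Guizhou Qianlv Technology Co., Ltd. (贵州黔驴科技有限公司), Guiyang, Guizhou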
Keywords: 3D Reconstruction, Image Matting, PP-Matting, SfM + MVS
Abstract: Vision-based 3D reconstruction recovers the three-dimensional model of an object from real images of it. However, these images usually contain a large amount of irrelevant background, and using them directly for reconstruction wastes computational resources and storage space. To address this problem, this paper proposes a 3D reconstruction method that combines PP-Matting image matting with incremental Structure-from-Motion (SfM): the original images of the object are matted before the SfM and Multi-View Stereo (MVS) algorithms perform the reconstruction. The PP-Matting model is fine-tuned on several image datasets, including Distinctions-646, so that the matted images contain only the object to be reconstructed. Experimental results show that the proposed method significantly improves reconstruction efficiency and reduces storage requirements.
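The matting step described in the abstract can be illustrated with a minimal sketch. It assumes the matting model returns a per-pixel alpha matte in [0, 1] and that the background is replaced by a flat color before the images are passed to SfM and MVS. The function name `composite_foreground`, the nested-list image layout, and the black background are hypothetical choices for illustration, not the authors' implementation; the sketch uses only the standard compositing relation I = αF + (1 − α)B.

```python
def composite_foreground(image, alpha, background=(0, 0, 0)):
    """Keep only the matted foreground of an RGB image.

    image:      rows of (r, g, b) tuples with values in 0..255
    alpha:      rows of floats in [0, 1], same layout as image
                (assumed to be the matte predicted by a matting model)
    background: flat RGB color substituted for the removed background
    """
    if len(image) != len(alpha):
        raise ValueError("image and alpha must have the same number of rows")

    result = []
    for pixel_row, alpha_row in zip(image, alpha):
        if len(pixel_row) != len(alpha_row):
            raise ValueError("image and alpha rows must have the same length")
        out_row = []
        for pixel, a in zip(pixel_row, alpha_row):
            a = min(1.0, max(0.0, a))  # clamp the matte to [0, 1]
            # Standard compositing: a * foreground + (1 - a) * background
            out_row.append(tuple(
                round(a * channel + (1.0 - a) * bg)
                for channel, bg in zip(pixel, background)
            ))
        result.append(out_row)
    return result


# Hypothetical 1x3 image: object pixel, soft-edge pixel, background pixel
image = [[(200, 100, 50), (200, 100, 50), (10, 20, 30)]]
alpha = [[1.0, 0.5, 0.0]]
matted = composite_foreground(image, alpha)
```

Pixels with alpha 0 collapse to the flat background color, so they carry no texture for feature extraction or dense matching, which is consistent with the efficiency and storage savings the abstract reports.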
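Citation: Ren, M.X., Yang, J.F., Deng, Z.H., Zou, Q. and Tong, T.L. (2023) 3D Reconstruction Method Based on PP-Matting and Incremental Structure-from-Motion. Modeling and Simulation (建模与仿真), 12(4), 4116-4126. https://doi.org/10.12677/MOS.2023.124375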
