Image Depth Estimation Model Based on Fully Convolutional U-Net
DOI: 10.12677/CSA.2019.92029
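Authors: Xiaokang Wang (王小康)*, Xiaoning Fu (付小宁): School of Mechano-Electronic Engineering, Xidian University, Xi'an, Shaanxi; Que Dong (董悫): Wuhan Guide Infrared Co., Ltd., Wuhan, Hubei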
Keywords: Monocular Depth Estimation; Fully Convolutional Network; Residual Up-Sampling Layers; Skip Connection
Abstract: This paper addresses the problem of estimating depth from a single image. The mapping between a single image and its depth map is inherently ambiguous, so resolving it requires both global and local information. We present a fully convolutional U-shaped network whose encoder is a pretrained ResNet-50 without its fully connected or pooling layers, used to extract image features. Residual up-sampling layers then enlarge the feature maps back to the size of the depth map. Skip connections are also introduced, giving the network its U shape and fusing global and local information. The whole network can be trained end to end.
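As a rough illustration of the U shape described in the abstract, the following sketch traces feature-map sizes through a ResNet-50 encoder and a mirrored up-sampling path. It is not the authors' implementation. The stage strides and channel counts are those of the standard ResNet-50; the channel halving in each up-sampling step, the fusion by concatenation, and the names `ENCODER_STAGES` and `trace_shapes` are assumptions made for illustration only.

```python
# Shape-tracing sketch of a U-shaped encoder/decoder (no deep-learning library needed).
# Encoder stages follow the standard ResNet-50: (name, output channels, total stride).
ENCODER_STAGES = [
    ("conv1", 64, 2),
    ("layer1", 256, 4),
    ("layer2", 512, 8),
    ("layer3", 1024, 16),
    ("layer4", 2048, 32),
]


def trace_shapes(height, width):
    """Return (encoder, decoder) lists of (name, channels, height, width)."""
    if height % 32 or width % 32:
        raise ValueError("this sketch assumes input sizes divisible by 32")

    # Encoder: each stage reduces resolution by its total stride.
    encoder = [(name, ch, height // s, width // s) for name, ch, s in ENCODER_STAGES]

    decoder = []
    _, ch, h, w = encoder[-1]  # start from the deepest (most global) features
    # Walk back up, pairing each up-sampling step with a shallower encoder stage.
    for skip_name, skip_ch, skip_h, skip_w in reversed(encoder[:-1]):
        ch //= 2               # ASSUMPTION: up-sampling block halves the channels
        h, w = h * 2, w * 2    # up-sampling block doubles the resolution
        assert (h, w) == (skip_h, skip_w), "skip features must match spatially"
        ch += skip_ch          # ASSUMPTION: skip connection fuses by concatenation
        decoder.append((f"up+{skip_name}", ch, h, w))
    return encoder, decoder


if __name__ == "__main__":
    enc, dec = trace_shapes(224, 224)
    for row in enc + dec:
        print(row)
```

Each decoder row pairs deep, low-resolution features (global context) with a shallower encoder stage of the same spatial size (local detail), which is the fusion the abstract attributes to the skip connections.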
Citation: Xiaokang Wang, Xiaoning Fu, Que Dong. Image Depth Estimation Model Based on Fully Convolutional U-Net [J]. Computer Science and Application, 2019, 9(2): 250-255. https://doi.org/10.12677/CSA.2019.92029
