Stereo Matching with Hierarchical Optimization Based on Local Cost
DOI: 10.12677/CSA.2022.128197
Authors: Hao Hu, Xinglin Liu: Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, Guangdong
Keywords: Stereo Matching, ASPP, Local Cost, Hierarchical Optimization
Abstract: With the emergence of deep learning and the steady growth of stereo matching datasets, deep-learning-based stereo matching has become a research hotspot. Although the accuracy of these algorithms keeps improving, the gains come from ever deeper and more computationally complex neural networks, and the resulting computational cost makes many of these methods unsuitable for conventional CPUs or GPUs. Maintaining high accuracy while reducing computational complexity has therefore become a key issue in applying stereo matching in engineering practice. To reduce algorithmic complexity while preserving performance, this paper proposes ASPP (Atrous Spatial Pyramid Pooling) feature extraction combined with depthwise separable convolution, constructs a local cost volume to reduce computational complexity and memory consumption, and refines the disparity progressively in a hierarchical manner to keep performance stable. Experiments show that the local cost volume and depthwise separable convolution reduce the running time of the algorithm, while hierarchical optimization and ASPP feature extraction maintain the accuracy level, so the method performs well in terms of both computational cost and accuracy.
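As a rough illustration of why a local cost volume is cheaper than a full-range one, the following minimal sketch evaluates matching costs only for a few disparity candidates around a coarse estimate on a single scanline. It is not the network described in the paper: the absolute-difference cost on raw intensities, the function names local_cost_volume and refine_disparity, and the toy data are all assumptions made for illustration, whereas the paper works with learned ASPP features.

```python
from typing import List


def local_cost_volume(left: List[float], right: List[float],
                      coarse_disp: List[int], radius: int) -> List[List[float]]:
    """Costs for the 2*radius+1 candidates coarse_disp[x] + k, k in [-radius, radius].

    Hypothetical helper: uses absolute intensity difference as the matching
    cost (an assumption; a learned feature correlation would replace it).
    """
    width = len(left)
    volume = []
    for x in range(width):
        costs = []
        for k in range(-radius, radius + 1):
            d = coarse_disp[x] + k
            xr = x - d  # column in the right scanline that candidate d points to
            if d < 0 or xr < 0 or xr >= width:
                costs.append(float("inf"))  # candidate falls outside the image
            else:
                costs.append(abs(left[x] - right[xr]))
        volume.append(costs)
    return volume


def refine_disparity(coarse_disp: List[int], volume: List[List[float]],
                     radius: int) -> List[int]:
    """Pick the lowest-cost candidate per pixel and add its offset to the coarse value."""
    refined = []
    for d0, costs in zip(coarse_disp, volume):
        best_index = min(range(len(costs)), key=costs.__getitem__)
        refined.append(d0 + (best_index - radius))
    return refined


if __name__ == "__main__":
    # Toy scanlines: for x >= 2 the left pixel equals the right pixel shifted by 2.
    right = [5, 1, 9, 3, 7, 2, 8, 4]
    left = [0, 0, 5, 1, 9, 3, 7, 2]
    coarse = [1] * len(left)  # coarse estimate from a lower level, off by one
    volume = local_cost_volume(left, right, coarse, radius=1)
    print(refine_disparity(coarse, volume, radius=1))
```

With a radius of 1 the volume holds 3 candidates per pixel regardless of the maximum disparity, which is where the savings in computation and memory come from. Repeating the same step at successively finer resolutions gives a hierarchical refinement in the spirit of the abstract.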
Cite this article: Hu, H. and Liu, X.L. (2022) Stereo Matching with Hierarchical Optimization Based on Local Cost. Computer Science and Application, 12(8), 1964-1973. https://doi.org/10.12677/CSA.2022.128197
