Parallax Refinement Network Based on Adaptive Convolution
Abstract: To improve the accuracy of disparity refinement, this paper proposes a disparity refinement and sampling method based on adaptive convolution. Refinement hypotheses are used to bring additional available information into disparity refinement, and different information is attached to them at different stages to strengthen them. A local cost volume is built with an adaptive propagation method, and aggregation is moved from the spatial domain to the disparity domain. This alleviates the boundary blurring caused by large convolution windows and strengthens aggregation in textureless and weakly textured regions. In addition, adaptive convolution updates the disparity from similar disparity planes, which further improves disparity accuracy. For upsampling, disparity-adaptive sampling is used to avoid the loss of accuracy caused by bilinear interpolation. The algorithm is evaluated on the SceneFlow and KITTI2015 datasets. Experimental results show a clear gain in accuracy over the original method; on KITTI2015 in particular, the endpoint error (EPE) and the 3-pixel error rate improve by 9.7% and 12.5%, respectively.
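The two metrics quoted in the abstract can be made concrete with a short sketch. It assumes disparity maps given as nested Python lists of equal shape, with a ground-truth value of 0 marking pixels that have no ground truth (the KITTI 2015 convention). EPE is taken as the mean absolute disparity difference over labelled pixels, and the 3-pixel error rate as the fraction of labelled pixels whose error exceeds 3 px. The optional `kitti_relative` flag adds the KITTI 2015 outlier condition that the error must also exceed 5% of the true disparity. The abstract does not state which variant the authors used, so the function names and this choice are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Iterator, Sequence, Tuple

DisparityMap = Sequence[Sequence[float]]


def _valid_errors(pred: DisparityMap, gt: DisparityMap) -> Iterator[Tuple[float, float]]:
    """Yield (absolute error, ground-truth disparity) for every labelled pixel.

    Assumption: a ground-truth value of 0 marks a pixel without ground truth
    (KITTI 2015 convention); such pixels are skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth differ in height")
    for pred_row, gt_row in zip(pred, gt):
        if len(pred_row) != len(gt_row):
            raise ValueError("prediction and ground truth differ in width")
        for p, g in zip(pred_row, gt_row):
            if g > 0:
                yield abs(p - g), g


def endpoint_error(pred: DisparityMap, gt: DisparityMap) -> float:
    """Mean absolute disparity error (EPE, in pixels) over labelled pixels."""
    errors = [err for err, _ in _valid_errors(pred, gt)]
    if not errors:
        raise ValueError("no pixels with ground truth")
    return sum(errors) / len(errors)


def three_pixel_error_rate(
    pred: DisparityMap,
    gt: DisparityMap,
    threshold: float = 3.0,
    kitti_relative: bool = False,
) -> float:
    """Fraction of labelled pixels whose error exceeds `threshold` pixels.

    With kitti_relative=True, a pixel only counts as wrong if its error is
    also larger than 5% of the ground-truth disparity (KITTI 2015 rule).
    """
    total = 0
    bad = 0
    for err, g in _valid_errors(pred, gt):
        total += 1
        wrong = err > threshold
        if kitti_relative:
            wrong = wrong and err > 0.05 * g
        if wrong:
            bad += 1
    if total == 0:
        raise ValueError("no pixels with ground truth")
    return bad / total


if __name__ == "__main__":
    gt = [[10.0, 20.0], [100.0, 0.0]]   # last pixel has no ground truth
    pred = [[11.0, 20.0], [96.0, 7.0]]
    print(endpoint_error(pred, gt))                               # 1.6666666666666667
    print(three_pixel_error_rate(pred, gt))                       # 0.3333333333333333
    print(three_pixel_error_rate(pred, gt, kitti_relative=True))  # 0.0
```

The relative rule matters mainly at large disparities (near objects): in the example, the 4 px error at a true disparity of 100 is within 5%, so it is not counted under the KITTI definition, while the plain 3 px criterion counts it.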
Citation: Hu, W.H., Zhou, Z.H. and Qi, Z. (2022) Parallax Refinement Network Based on Adaptive Convolution. Optoelectronics, 12(4), 147-158. https://doi.org/10.12677/OE.2022.124017
