Efficient and Integrated Panoramic Glass Precision Sorting Network (高效并转融全景玻璃精分网络)
DOI: 10.12677/csa.2026.162061. Supported by research project funding.
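Authors: Huang Zhihong, Li Kelin, Yu Haohui, Chang Qingling*: School of Electronics and Information Engineering, Wuyi University, Jiangmen, Guangdong; Huang Shuying: School of Continuing Education, South China Agricultural University, Guangzhou, Guangdong; Cui Yan: School of Electronics and Information Engineering, Wuyi University, Jiangmen, Guangdong; Zhuhai Siwei Shidai Network Technology Co., Ltd. (珠海市四维时代网络科技有限公司), Zhuhai, Guangdong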
Keywords: Glass Segmentation, Deep Learning, Neural Network, Panoramic Semantic Segmentation
Abstract: Accurately segmenting glass objects is a key step toward improving vision systems for tasks such as autonomous driving and depth perception. However, mainstream deep learning segmentation methods are trained and run almost entirely on conventional perspective images, whose limited field of view and local context make them ill-suited to glass targets of varying scale and distance in open scenes. Panoramic imaging offers blind-spot-free global perception of the environment, but the severe perspective-induced deformation of glass regions interacts with the inherent optical properties of glass, such as transmission and reflection, to create a visual analysis problem far harder than the one posed by perspective images. To address these difficulties, this paper proposes a new network architecture for fine-grained panoramic glass segmentation, the Efficient and Integrated Panoramic Glass Precision Sorting Network. The architecture combines attention mechanisms, transposed convolution, depthwise separable convolution, and dilated (atrous) convolution in three modules that reprocess the features extracted by the backbone network: the Efficient Parallel Convolution-Fusion Depthwise Separable Module, the Efficient Transposed-Convolution Dual-Branch Fusion Module, and the Efficient Parallel-Transposed Accumulative Fusion Refinement Module. Experiments on benchmark datasets including PanoGlass V2 show that the method outperforms existing approaches on the key metrics, reaching an IoU of 91.37%, an F-Score of 95.49%, and an MAE of 0.0060, which supports its efficiency and generalization ability and makes it a reliable option for panoramic vision applications in complex scenes.
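The abstract reports IoU, MAE, and F-Score without defining them. The sketch below shows one common way these three metrics are computed for a single predicted glass mask against a binary ground-truth mask. It is an illustration only, not the authors' evaluation code: the function names, the 0.5 binarization threshold, and the default weighting `beta2=0.3` (a convention often used in glass and salient-object detection) are assumptions that do not come from this page.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[float]]  # rows of per-pixel values in [0, 1]


def _flatten(mask: Mask) -> List[float]:
    """Flatten a 2-D mask into a list of floats."""
    return [float(v) for row in mask for v in row]


def _binarize(mask: Mask, threshold: float) -> List[bool]:
    """Turn a probability mask into a boolean glass / non-glass mask."""
    return [v >= threshold for v in _flatten(mask)]


def iou(pred: Mask, gt: Mask, threshold: float = 0.5) -> float:
    """Intersection over union of the binarized prediction and ground truth."""
    p, g = _binarize(pred, threshold), _binarize(gt, 0.5)
    inter = sum(a and b for a, b in zip(p, g))
    union = sum(a or b for a, b in zip(p, g))
    return inter / union if union else 1.0  # both masks empty: treat as perfect


def mae(pred: Mask, gt: Mask) -> float:
    """Mean absolute error between the raw prediction and the ground truth."""
    p, g = _flatten(pred), _flatten(gt)
    return sum(abs(a - b) for a, b in zip(p, g)) / len(p)


def f_score(pred: Mask, gt: Mask, beta2: float = 0.3,
            threshold: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall (beta2 is an assumption)."""
    p, g = _binarize(pred, threshold), _binarize(gt, 0.5)
    tp = sum(a and b for a, b in zip(p, g))
    if tp == 0:
        return 0.0
    precision = tp / sum(p)
    recall = tp / sum(g)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


if __name__ == "__main__":
    # Toy 2x2 example: three pixels predicted as glass, two are truly glass.
    prediction = [[0.9, 0.8], [0.1, 0.6]]
    ground_truth = [[1, 1], [0, 0]]
    print(iou(prediction, ground_truth))
    print(mae(prediction, ground_truth))
    print(f_score(prediction, ground_truth))
```

Under these definitions IoU and F-Score are higher-is-better ratios, while MAE is a lower-is-better error, so a value such as 0.0060 can only sensibly belong to MAE.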
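Citation: Huang Zhihong, Li Kelin, Huang Shuying, Yu Haohui, Chang Qingling, Cui Yan. Efficient and Integrated Panoramic Glass Precision Sorting Network [J]. Computer Science and Application, 2026, 16(2): 314-327. https://doi.org/10.12677/csa.2026.162061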
