Semantic Segmentation Model Based on Four-Channel Non-Separable Additive Wavelet Combined with DeepLabv3+
Abstract: To mitigate the loss of detail, and the resulting loss of information, in traditional semantic segmentation models, we propose an improved DeepLabv3+ segmentation model. First, the backbone network is replaced with MobileNetV2. Second, a four-channel non-separable wavelet low-pass filter is constructed and used to decompose the source image and extract its high-frequency sub-images. Third, standard convolutions are replaced with depthwise separable convolutions, and the convolutional block attention module (CBAM) is introduced to adaptively refine features, improving the segmentation performance of the network. Experimental results show that, on the VOC dataset, the improved model raises the mean intersection over union (MIoU) by 0.94%, the mean pixel accuracy (MPA) by 1.34%, and the accuracy by 0.19% relative to the original DeepLabv3+ model. On the BDD100K dataset, it raises MIoU by 0.53%, MPA by 0.15%, and accuracy by 0.13% relative to the original DeepLabv3+ model. Both subjective and objective results show that our model outperforms the original model.
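The three metrics reported in the abstract (accuracy, MPA, MIoU) are usually derived from a per-class confusion matrix. The following is a minimal sketch of that common definition, not the authors' evaluation code. The function names `confusion_matrix` and `segmentation_metrics` are hypothetical, labels are assumed to be flattened integer class indices, and classes absent from both the ground truth and the prediction are skipped when averaging, which is one convention among several.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: cm[t][p] = number of pixels with true class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return overall accuracy, mean pixel accuracy (MPA), and mean IoU (MIoU).

    Assumes cm is a non-empty square matrix with at least one counted pixel.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    per_class_acc = []
    per_class_iou = []
    for i in range(n):
        tp = cm[i][i]
        gt_pixels = sum(cm[i])                        # pixels whose true class is i
        pred_pixels = sum(cm[r][i] for r in range(n)) # pixels predicted as class i
        union = gt_pixels + pred_pixels - tp
        if gt_pixels > 0:   # assumption: skip classes with no ground-truth pixels
            per_class_acc.append(tp / gt_pixels)
        if union > 0:       # assumption: skip classes absent from both masks
            per_class_iou.append(tp / union)

    return {
        "accuracy": correct / total,
        "mpa": sum(per_class_acc) / len(per_class_acc),
        "miou": sum(per_class_iou) / len(per_class_iou),
    }


if __name__ == "__main__":
    # Toy example: four pixels, two classes.
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(segmentation_metrics(cm))
```

Because MIoU penalizes both false positives and false negatives per class, it is typically the strictest of the three, which is consistent with it being reported first in the abstract.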
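Cite this article: Liu, B. and Pan, M. (2023) Semantic Segmentation Model Based on Four-Channel Non-Separable Additive Wavelet Combined with DeepLabv3+. Journal of Image and Signal Processing, 12(3), 279-289. https://doi.org/10.12677/JISP.2023.123028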
