PR U-Net: Feature Fusion and Progressive Supervision Strategy in Industrial Defect Segmentation
Abstract: In industrial defect detection, accurately identifying defect regions in an image is a major challenge because defects often resemble their surroundings. This paper proposes a semantic segmentation model named Progressive Refining (PR) U-Net, which builds on the Swin U-Net architecture and extends it with PR Decoder Blocks and Focus Modules (FMs). The PR Decoder Blocks let the model refine features progressively and apply supervision at every decoding stage, while the FMs combine segmentation predictions with a feature fusion mechanism to better separate defect regions from non-defect regions. Extensive experiments show that PR U-Net distinguishes subtle defects from background regions with higher accuracy and robustness, and that its inference time is efficient, which makes it suitable for industrial defect detection tasks across a variety of scenarios.
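The stage-wise supervision described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss function, the upsampling method, or the stage weights, so everything below is an assumption made for illustration: the hypothetical helpers `upsample_nearest`, `bce`, and `progressive_supervision_loss` use nearest-neighbour upsampling, binary cross-entropy, and equal weights. It shows only the general idea of supervising a prediction at each decoding stage, not the paper's actual implementation.

```python
import math

def upsample_nearest(pred, factor):
    """Nearest-neighbour upsampling of a 2D list of probabilities (assumed method)."""
    out = []
    for row in pred:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def bce(pred, mask, eps=1e-7):
    """Mean binary cross-entropy between a probability map and a 0/1 mask (assumed loss)."""
    total, count = 0.0, 0
    for prow, mrow in zip(pred, mask):
        for p, m in zip(prow, mrow):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= m * math.log(p) + (1 - m) * math.log(1.0 - p)
            count += 1
    return total / count

def progressive_supervision_loss(stage_preds, mask, weights=None):
    """Weighted sum of per-stage losses; each coarse map is upsampled to the mask size."""
    weights = weights or [1.0] * len(stage_preds)  # equal weights are an assumption
    loss = 0.0
    for pred, w in zip(stage_preds, weights):
        factor = len(mask) // len(pred)
        loss += w * bce(upsample_nearest(pred, factor), mask)
    return loss

# Toy 4x4 ground-truth mask with a 2x2 defect in the centre.
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]

# Hypothetical predictions from three decoding stages, coarse to fine.
coarse = [[0.5]]
middle = [[0.3, 0.3],
          [0.3, 0.3]]
fine = [[0.1, 0.1, 0.1, 0.1],
        [0.1, 0.9, 0.9, 0.1],
        [0.1, 0.9, 0.9, 0.1],
        [0.1, 0.1, 0.1, 0.1]]

stage_losses = [bce(upsample_nearest(p, len(mask) // len(p)), mask)
                for p in (coarse, middle, fine)]
total_loss = progressive_supervision_loss([coarse, middle, fine], mask)
print(stage_losses, total_loss)
```

In this toy example the per-stage loss shrinks from the coarsest to the finest prediction, which is the behaviour that supervising every decoding stage is meant to encourage.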
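Cite this article: Wang Xinyu. PR U-Net: Feature Fusion and Progressive Supervision Strategy in Industrial Defect Segmentation [J]. Modeling and Simulation, 2024, 13(4): 4893-4903. https://doi.org/10.12677/mos.2024.134442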
