TGU-Net: 3D Liver and Tumor Segmentation Based on Texture and Semantic Perception Enhancement
Abstract: Early diagnosis of liver tumors is critical for improving patient survival, and accurate liver tumor segmentation plays a key role in diagnosis and treatment. Traditional segmentation, however, relies on manual delineation by clinicians, which is time-consuming, labor-intensive, and subject to individual experience. In recent years, convolutional neural networks (CNNs) and Transformers have advanced liver tumor segmentation, but insufficient feature extraction and slow convergence remain open problems. Specifically, existing methods tend to focus on global information such as the overall shape and location of the tumor while overlooking local details such as blurred tumor edges and complex internal structure, which are essential for segmentation accuracy. In addition, although Transformers excel at capturing long-range dependencies and global context, they have not been effectively combined with the structural characteristics of liver tumors, which limits both segmentation quality and efficiency. To address these issues, this paper proposes TGU-Net, an improved model based on 3D-UNet. First, a Texture Enhancement Module is added to the skip connections. It uses a multi-branch, multi-scale 3D convolutional kernel selection mechanism to better extract local features and capture subtle gradient changes at tumor edges, increasing the model's sensitivity to edge details and improving segmentation accuracy. Second, a 3D Cross-Shaped Transformer module is introduced at the bottleneck of 3D-UNet. By combining the modeling capability of a 3D Transformer with Cross-Shaped self-attention, it lets the model focus more precisely on the semantic information of tumor regions and better represent complex tumor morphology. To further improve training efficiency, a Local Encoding Module built from 3D depthwise separable convolutions is placed before the Transformer module as a prior layer. By separating the spatial and channel convolutions, it makes feature extraction more efficient and speeds up training. Experiments on the LiTS2017 dataset show that TGU-Net improves IoU and Dice by 3.89 and 2.57 percentage points, respectively, and compares favorably with multiple state-of-the-art algorithms, demonstrating its effectiveness for liver tumor segmentation.
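The IoU and Dice scores reported in the abstract are overlap measures between a predicted mask and a ground-truth mask. The following is a minimal sketch of how the two are typically computed for binary masks. The function name `dice_and_iou`, the flattened 0/1 mask representation, and the convention for two empty masks are assumptions made for illustration and are not taken from the paper's evaluation code.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection

    if union == 0:
        # Both masks are empty: treated here as perfect agreement (assumed convention).
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


# Toy 2x2x2 volume flattened to 8 voxels (illustrative values only).
pred = [1, 1, 1, 0, 0, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. This is consistent with the same model change moving the two scores by different amounts.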
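Cite this article: Zhu, F., Wu, J. and Zhang, H. (2024) TGU-Net: 3D Liver and Tumor Segmentation Based on Texture and Semantic Perception Enhancement. Computer Science and Application, 14(12), 97-110. https://doi.org/10.12677/csa.2024.1412244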
