Research on an Improved SwinUNet Algorithm for Multi-Organ Segmentation in the Abdomen
DOI: 10.12677/csa.2025.1510271. Supported by research project funding.
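Authors: Wenwen Tan (檀文文): School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui; Institute of Energy, Hefei Comprehensive National Science Center (Anhui Energy Laboratory), Hefei, Anhui. Peng Lu (卢棚), Wei Jiang (姜韦)*: Institute of Energy, Hefei Comprehensive National Science Center (Anhui Energy Laboratory), Hefei, Anhui.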
Keywords: Medical Image Segmentation, Attention Mechanism, Atrous Spatial Pyramid Pooling, Feature Fusion
Abstract: Medical image segmentation often suffers from insufficient global context modeling and limited multi-scale feature representation. This paper proposes an improved SwinUNet method to address both issues. First, the original Swin Transformer is replaced with a Focal Transformer, whose hierarchical attention mechanism strengthens the modeling of both local detail and global dependencies. Second, an Atrous Spatial Pyramid Pooling (ASPP) structure is introduced at the end of the encoder to enlarge the receptive field and improve multi-scale feature representation. Finally, a Tokenized Interaction Fusion (TIF) module is added to the skip connections to fuse cross-layer semantic and spatial information efficiently. Experiments on the Synapse abdominal multi-organ dataset show that the proposed method outperforms the baseline models on metrics including average Dice and Hausdorff distance, confirming its effectiveness for abdominal multi-organ segmentation.
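The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes label maps flattened into lists of integer organ labels and boundaries given as lists of (row, col) points; the function names are hypothetical, and the convention of scoring an absent organ as 1.0 is an assumption. The abstract does not say which Hausdorff variant the paper reports (for example a percentile version), so the plain maximum form is shown.

```python
from math import hypot


def dice_coefficient(pred, target, label):
    """Dice = 2|P ∩ T| / (|P| + |T|) for one organ label over flat label lists."""
    p = [v == label for v in pred]
    t = [v == label for v in target]
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Assumed convention: an organ absent from both masks scores 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(pred, target, labels):
    """Average the per-organ Dice scores over the given foreground labels."""
    return sum(dice_coefficient(pred, target, c) for c in labels) / len(labels)


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any source point to its nearest destination point.
        return max(min(hypot(r1 - r2, c1 - c2) for r2, c2 in dst) for r1, c1 in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


if __name__ == "__main__":
    pred = [0, 1, 1, 2, 2, 0]
    target = [0, 1, 0, 2, 2, 2]
    print(mean_dice(pred, target, labels=[1, 2]))
    print(hausdorff_distance([(0, 0)], [(0, 0), (3, 4)]))
```

A higher Dice indicates better overlap, while a lower Hausdorff distance indicates that the worst boundary error is smaller, which is why the two are usually reported together.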
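Citation: Tan, W., Lu, P. and Jiang, W. Research on an Improved SwinUNet Algorithm for Multi-Organ Segmentation in the Abdomen [J]. Computer Science and Application, 2025, 15(10): 318-326. https://doi.org/10.12677/csa.2025.1510271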
