3D Multi-Organ CT Image Segmentation Based on the CoTr Segmentation Network

doi:10.12677/csa.2024.147165

Author: Zhao Wei (赵威), School of Data Science, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang
Keywords: U-Net; Convolutional Neural Network; Segmentation Network

Abstract: U-Net has become the most widely used model for medical image segmentation, and many studies in the field adopt it as their baseline. A series of U-Net variants has followed, among them CoTr, short for "Convolutional neural network and a Transformer". As the name suggests, CoTr is a segmentation network that combines a convolutional neural network with a Transformer in a U-shaped, U-Net-like architecture. CoTr uses convolutional layers to extract feature representations and an efficient deformable Transformer (DeTrans) to model long-range dependencies in the extracted feature maps. Unlike the vanilla Transformer, which treats all key positions equally, DeTrans introduces a deformable self-attention mechanism that attends to only a small set of key positions. This greatly reduces the computational and space complexity of DeTrans and makes it feasible to process multi-scale, high-resolution feature maps, which are usually crucial for image segmentation. The CoTr model was evaluated extensively on the multi-modality abdominal segmentation dataset (the AMOS dataset). The results show that, on the 3D multi-organ segmentation task, CoTr yields consistent performance improvements over other CNN-based, Transformer-based, and hybrid methods.
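The idea of attending to only a small set of key positions can be illustrated with a minimal sketch. This is a simplified, single-head, 1D toy and not the DeTrans implementation: the function names (`deformable_attention_1d`, `sample_linear`, `attention_cost`) are hypothetical, and the sampling offsets and attention logits are passed in directly, whereas a deformable attention layer would normally predict them from the query features with learned projections and would sample 3D feature maps rather than a 1D sequence.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_linear(features, pos):
    """Linearly interpolate a 1D sequence of feature vectors at a fractional position."""
    n = len(features)
    pos = min(max(pos, 0.0), n - 1.0)  # clamp to the valid index range
    lo = int(math.floor(pos))
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return [(1.0 - frac) * a + frac * b for a, b in zip(features[lo], features[hi])]


def deformable_attention_1d(features, ref_pos, offsets, weight_logits):
    """Aggregate K sampled values around ref_pos instead of all N positions.

    Assumption: offsets and weight_logits are given; in a trained layer they
    would be predicted from the query feature.
    """
    weights = softmax(weight_logits)
    out = [0.0] * len(features[0])
    for off, w in zip(offsets, weights):
        value = sample_linear(features, ref_pos + off)
        for d, v in enumerate(value):
            out[d] += w * v
    return out


def attention_cost(num_queries, keys_per_query):
    """Number of query-key interactions, a rough proxy for attention cost."""
    return num_queries * keys_per_query


# Toy feature sequence: 8 positions, 2 channels each.
features = [[float(i), 1.0] for i in range(8)]
output = deformable_attention_1d(features, 3.0, [-1.0, 0.5, 2.0], [0.0, 0.0, 0.0])

# Interaction counts for a 32 x 32 x 32 feature map flattened to tokens.
n_tokens = 32 * 32 * 32
full_cost = attention_cost(n_tokens, n_tokens)  # every query sees every key
sparse_cost = attention_cost(n_tokens, 4)       # every query sees K = 4 samples
```

With K fixed, the number of interactions grows linearly in the number of tokens instead of quadratically, which is why sparse sampling of this kind makes high-resolution and multi-scale feature maps tractable.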

Cite this article: Zhao Wei. 3D Multi-Organ CT Images Segmentation Based on CoTr Segmentation Network [J]. Computer Science and Application, 2024, 14(7): 78-83. https://doi.org/10.12677/csa.2024.147165
