U-Shaped Medical Image Segmentation Network Based on the Mamba Model
Abstract: Convolutional neural networks (CNNs) are among the mainstream architectures for medical image segmentation, and their strong local feature extraction has driven many of the field's breakthroughs. However, the local receptive field inherent to convolution limits a CNN's ability to capture long-range dependencies in an image, which constrains segmentation performance. Transformer models use self-attention to model long-range dependencies and can relate features across the whole image, but the computational cost of self-attention grows quadratically with the input sequence length, so Transformers often incur excessive computation and memory use on high-resolution medical images. To address this problem, this paper proposes a new segmentation network, M-UNet, whose main innovation is to combine the Visual State Space (VSS) block of the Mamba model with an atrous spatial pyramid pooling (ASPP) module. First, the linear complexity of the VSS block allows efficient global modeling while substantially reducing memory consumption. Second, the multi-scale dilated convolutions of the ASPP module capture key features at different scales in medical images, compensating for the weakness of CNNs in multi-scale feature extraction. M-UNet is evaluated on the Synapse and ACDC datasets. The experimental results show that, compared with traditional CNN-based segmentation methods, M-UNet achieves significant improvements in both segmentation accuracy and computational speed.
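The two quantitative ideas in the abstract, quadratic growth of self-attention versus linear growth of a state-space scan, and the multi-scale coverage of dilated convolutions, can be illustrated with a small back-of-the-envelope sketch. It is not the M-UNet implementation. The patch size of 4, the four scan directions, and the dilation rates 6, 12, and 18 are assumptions chosen for illustration (they are common choices in the wider literature, not values stated in the abstract), and all function names are hypothetical.

```python
def num_tokens(height, width, patch=4):
    """Number of tokens when an image is split into patch x patch tiles.

    The patch size of 4 is an illustrative assumption.
    """
    return (height // patch) * (width // patch)


def attention_pairs(tokens):
    """Token-to-token interactions in full self-attention: grows as L^2."""
    return tokens * tokens


def scan_steps(tokens, directions=4):
    """State updates in a multi-directional sequential scan: grows as L.

    Four scan directions is an assumption, not a value from the abstract.
    """
    return tokens * directions


def effective_kernel(kernel, rate):
    """Span in pixels covered by a dilated kernel: k + (k - 1) * (r - 1)."""
    return kernel + (kernel - 1) * (rate - 1)


if __name__ == "__main__":
    # Doubling the image side quadruples the tokens, so attention cost
    # grows 16x while the scan cost grows only 4x.
    for side in (224, 448, 896):
        tokens = num_tokens(side, side)
        print(side, tokens, attention_pairs(tokens), scan_steps(tokens))

    # Parallel 3x3 branches with different dilation rates see different
    # spatial extents, which is the multi-scale idea behind ASPP.
    for rate in (1, 6, 12, 18):
        print(rate, effective_kernel(3, rate))
```

The counts are a simplified cost model that ignores channel width and constant factors. They show only how each approach scales with resolution, which is the argument the abstract makes for replacing self-attention with VSS blocks.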
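Cite this article: Gou Jinwei (勾金伟), Li Daimin (李代民), Liu Feng (刘锋). U-Shaped Medical Image Segmentation Network Based on the Mamba Model [J]. Computer Science and Application, 2025, 15(10): 341-350. https://doi.org/10.12677/csa.2025.1510273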
