MSRGA-Net: A Multi-Scale Hybrid Network for 3D Medical Image Segmentation
DOI: 10.12677/csa.2025.1512344
Authors: Yang Fei (杨菲), Chen Xueping (陈雪萍): School of Artificial Intelligence, 新疆理工职业大学, Kashgar, Xinjiang
Keywords: Medical Image Segmentation, MLP, Attention Mechanism, Multi-Axial Feature Mixing
Abstract: Accurate automated medical image segmentation is essential for computer-aided diagnosis and clinical decision-making, yet achieving both precise boundary localization and robust global understanding remains challenging. Although deep learning methods have been widely and successfully applied to this task, convolutional neural networks struggle to capture global context because their receptive fields are inherently local, and they often lack effective multi-scale feature fusion for organs and tissues that vary widely in shape and size. Transformer-based approaches model long-range dependencies more effectively but tend to overlook pixel-level spatial details, which leads to blurred boundaries and incomplete structural representations. To address these limitations, we propose MSRGA-Net, an efficient segmentation framework that integrates low-level spatial details, long-range dependencies, and cross-scale features. The network introduces four core components. MSConv extracts fine-grained features with multi-scale convolution kernels. BGAFM, a block-grid attention fusion module, dynamically weights feature contributions and captures global spatial information with linear computational complexity. RHAM refines spatial and semantic information from multiple perspectives while preserving important depth-wise information. In addition, the MSRGB strategy selectively fuses multi-scale features to improve spatial consistency and reduce information loss during feature transitions. Extensive experiments on the Synapse dataset and the Medical Segmentation Decathlon (MSD) brain tumor segmentation (BraTS) dataset show that MSRGA-Net outperforms state-of-the-art methods in both visual comparison and objective quantitative evaluation.
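The abstract describes MSConv only as extracting features with multi-scale convolution kernels. The general idea of parallel convolutions at several kernel sizes whose outputs are fused can be sketched as follows. This is a minimal 1D illustration in plain Python, not the authors' MSConv: the function names, the kernel sizes (1, 3, 5), the fixed averaging weights, and the mean fusion are all assumptions made for the example. A real 3D layer would learn its weights and operate on volumetric feature maps.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 'same' cross-correlation of a 1D signal with an odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def multi_scale_conv(signal, kernel_sizes=(1, 3, 5)):
    """Apply parallel averaging kernels of several sizes and average the branches."""
    branches = []
    for k in kernel_sizes:
        # Hypothetical fixed weights; a trained layer would learn these.
        kernel = [1.0 / k] * k
        branches.append(conv1d_same(signal, kernel))
    # Fuse the branches position by position (mean fusion is an assumption).
    return [sum(values) / len(branches) for values in zip(*branches)]


# A single impulse: small kernels keep it sharp, large kernels spread it out.
signal = [0.0, 0.0, 1.0, 0.0, 0.0]
fused = multi_scale_conv(signal)
print(fused)
```

The fused response stays peaked at the impulse position while also carrying wider context from the larger kernels, which is the trade-off a multi-scale branch design is meant to capture.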
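Cite this article: Yang Fei, Chen Xueping. MSRGA-Net: A Multi-Scale Hybrid Network for 3D Medical Image Segmentation [J]. Computer Science and Application (计算机科学与应用), 2025, 15(12): 288-301. https://doi.org/10.12677/csa.2025.1512344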
