A Medical Image Segmentation Method Based on Multi-Level Progressive Fusion of Multi-Scale Split Attention
Abstract: Deep learning methods built on convolutional neural networks have driven major advances in computer vision. The encoder-decoder architecture typified by U-Net has reshaped biomedical image segmentation, and its skip connections have been applied successfully in many clinical settings. However, the U-Net encoder relies on structurally uniform down-sampling modules and a plain stack of successive convolutions, which limits multi-scale feature representation across network levels. The limitation is most pronounced where lesions show low contrast against surrounding healthy tissue: conventional layer-wise feature extraction then struggles to reach the segmentation accuracy that clinical diagnosis requires. Attention modules are a core mechanism for multi-scale feature fusion in medical image segmentation and are key to capturing heterogeneous pathological features, but conventional designs have three inherent limitations: 1) fixed-scale convolution kernels cannot adapt dynamically to variation in lesion size; 2) homogeneous feature aggregation introduces computational redundancy across network levels; 3) cross-layer feature propagation lacks content-adaptive allocation of channel priority.
To address the resulting bottlenecks in multi-scale pathological feature modeling, namely attenuation of shallow-layer semantics, weakened cross-level correlation, and computational redundancy, this paper proposes MLP-MSA, a multi-level progressive fusion framework built from three components: a feature preservation module, a fusion split-attention module, and a multi-scale progressive attention module. The model was evaluated on three medical image segmentation datasets. It scores higher than the compared state-of-the-art models on both the Dice Similarity Coefficient (DSC) and mean Intersection over Union (mIoU), performs particularly well on multi-class segmentation tasks and complex images, and runs markedly more efficiently than the compared models on compute-constrained platforms, which supports later model optimization and deployment. Future work will continue to refine the proposed architecture for more challenging medical image segmentation tasks.
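For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how DSC and mIoU are commonly computed for a multi-class label map. It is an illustration only, not the authors' evaluation code: the function name `dice_and_miou`, the flat-list input format, and the choice to average over classes present in either mask are all assumptions, and the paper's exact averaging convention is not stated in the abstract.

```python
def dice_and_miou(pred, target, num_classes):
    """Return (mean DSC, mIoU) for two flat sequences of integer class labels.

    Assumption: scores are averaged over classes that occur in at least one
    of the two masks; classes absent from both are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    dice_scores, iou_scores = [], []
    for c in range(num_classes):
        p = [x == c for x in pred]      # predicted mask for class c
        t = [x == c for x in target]    # ground-truth mask for class c
        inter = sum(a and b for a, b in zip(p, t))
        p_sum, t_sum = sum(p), sum(t)
        union = p_sum + t_sum - inter
        if union == 0:
            continue                    # class absent from both masks
        dice_scores.append(2 * inter / (p_sum + t_sum))
        iou_scores.append(inter / union)

    if not dice_scores:
        raise ValueError("no class in range(num_classes) occurs in either mask")
    return (sum(dice_scores) / len(dice_scores),
            sum(iou_scores) / len(iou_scores))


# Toy example: a 6-pixel image with two classes (hypothetical data).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
dsc, miou = dice_and_miou(pred, target, num_classes=2)
print(f"DSC={dsc:.4f}  mIoU={miou:.4f}")
```

For a single class the two scores are related by DSC = 2·IoU / (1 + IoU), so DSC is never lower than IoU on the same mask.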
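Cite this article: Huang Shilong, Zhang Sunjie. A Medical Image Segmentation Method Based on Multi-Level Progressive Fusion of Multi-Scale Split Attention [J]. Modeling and Simulation, 2025, 14(5): 887-899. https://doi.org/10.12677/mos.2025.145442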
