L-SwinUNet: A Lightweight Segmentation Model Combining Attention Enhancement and Depthwise Separable Convolutions
DOI: 10.12677/sea.2026.151004. Supported by research project funding.
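Authors: Wenwen Tan (檀文文): Institute of Energy, Hefei Comprehensive National Science Center (Anhui Energy Laboratory), Hefei, Anhui; School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui. Peng Lu (卢棚): Institute of Energy, Hefei Comprehensive National Science Center (Anhui Energy Laboratory), Hefei, Anhui. Wei Jiang (姜韦): School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui.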
Keywords: Medical Image Segmentation; Attention Mechanism; Depthwise Separable Convolutions; Upsampling Optimisation
Abstract: Existing Transformer-based segmentation networks commonly suffer from parameter redundancy, high computational complexity, and low inference efficiency. To address these issues, this paper proposes L-SwinUNet, a lightweight medical image segmentation network that is optimised jointly along three dimensions. First, parameter-free SimAM attention modules are embedded in the encoder-decoder skip connections to adaptively enhance shallow spatial-semantic and boundary-sensitive features. Second, depthwise separable convolutions replace standard convolutions in the decoder; this separable feature extraction strategy substantially reduces the parameter count and the number of floating-point operations. Third, the CARAFE content-aware reassembly operator is introduced in the upsampling stage, where an adaptive kernel prediction mechanism finely reconstructs high-resolution edge details. Experiments on the Synapse dataset validate the method. Compared with the original Swin-UNet, L-SwinUNet uses approximately 48% fewer parameters and 40% less computation while also improving the Dice and HD95 metrics, demonstrating its lightweight advantage and accuracy potential in medical image segmentation.
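To make two of the components named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares the weight count of a standard convolution with that of a depthwise separable one, and applies the SimAM weighting to a single feature channel in pure Python. The function names, the channel width of 96, the 3 x 3 kernel, and the regularisation constant `lam = 1e-4` are illustrative assumptions; the abstract does not state the layer configurations actually used.

```python
import math


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    return c_in * k * k + c_in * c_out


def simam(channel, lam=1e-4):
    """Parameter-free SimAM weighting of one H x W channel given as a list of rows.

    Assumes the channel holds at least two values. lam is an assumed
    regularisation constant, not a value taken from the abstract.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1                      # number of other positions in the channel
    mu = sum(values) / len(values)           # channel mean
    var = sum((v - mu) ** 2 for v in values) / n
    out = []
    for row in channel:
        out_row = []
        for v in row:
            # Inverse energy: positions far from the channel mean score higher.
            inv_energy = (v - mu) ** 2 / (4 * (var + lam)) + 0.5
            weight = 1 / (1 + math.exp(-inv_energy))   # sigmoid gate
            out_row.append(v * weight)
        out.append(out_row)
    return out


# Hypothetical decoder layer: 96 -> 96 channels, 3 x 3 kernel.
std = standard_conv_params(96, 96, 3)        # 82944
dws = depthwise_separable_params(96, 96, 3)  # 10080
ratio = dws / std                            # equals 1/96 + 1/9, about 0.12

weighted = simam([[0.0, 0.0], [0.0, 4.0]])
```

For this single hypothetical layer, the separable form needs roughly 12% of the standard weights. The 48% reduction reported in the abstract is measured over the whole network, where most parameters sit outside the replaced convolutions, so the two figures are not directly comparable. The SimAM function adds no learnable weights, which is why the attention enhancement does not increase the parameter count.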
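Cite this article: Tan, W., Lu, P. and Jiang, W. (2026) L-SwinUNet: A Lightweight Segmentation Model Combining Attention Enhancement and Depthwise Separable Convolutions. Software Engineering and Applications, 15(1), 28-37. https://doi.org/10.12677/sea.2026.151004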
