Depth-Aware Spatial Hierarchy Learning for Robust Medical Image Segmentation
Abstract: Dermoscopic image segmentation is a critical step in computer-aided diagnosis of skin cancer. Existing methods rely mostly on two-dimensional texture information and ignore the near-far visual cues (pseudo-depth) implicit in monocular images. This study proposes DASH-Net (Depth-Aware Spatial Hierarchy Network), a multi-task U-Net framework that integrates pseudo-depth attention with boundary supervision. Its core innovation is the Depth-guided Spatial-Channel Attention (DSCA) module, which establishes an adaptive association between two-dimensional image feature channels and the visual depth hierarchy, so that the segmentation network can use the depth prior to selectively enhance feature channels from the relevant depth range. A lightweight variant, DASH-Light, is also proposed for clinical images: it replaces the pseudo-depth map with a brightness map, enabling efficient segmentation without an external depth estimation model. On the ISIC 2018 skin lesion benchmark, DASH-Net achieves a Dice coefficient of 88.99% and a Jaccard index of 82.02% with only 7.85 M parameters; with a PVTv2 backbone, Dice rises further to 90.61%. On a private clinical dataset of venous malformations, DASH-Light achieves 66.80% Dice and an HD95 of 30.21, a 1.1% improvement in Dice and a reduction of 12.4 in HD95 relative to the current best method. Ablation studies and visualization analysis reveal how the DSCA module works: some channels specialize in encoding deep lesion information while others encode superficial background information, forming a depth-semantic division of labor similar to that of the human visual system.
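The abstract mentions two ideas that can be illustrated briefly: using image brightness as a stand-in for a pseudo-depth map, and scoring a predicted mask with the Dice coefficient and Jaccard index. The following is a minimal sketch under stated assumptions, not the method described in the paper. The function names `brightness_proxy` and `dice_and_jaccard` are hypothetical, and the Rec. 601 luma weights with min-max normalization are an assumed choice, since the abstract does not specify how the brightness map is computed.

```python
import numpy as np


def brightness_proxy(rgb):
    """Map an RGB image of shape (H, W, 3) to a brightness map in [0, 1].

    Assumption: Rec. 601 luma weights followed by min-max normalization.
    The paper's actual brightness mapping is not given in the abstract.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    luma = rgb @ np.array([0.299, 0.587, 0.114])
    lo, hi = luma.min(), luma.max()
    if hi == lo:
        # A constant image carries no brightness contrast.
        return np.zeros_like(luma)
    return (luma - lo) / (hi - lo)


def dice_and_jaccard(pred, target, eps=1e-7):
    """Return (Dice, Jaccard) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    union = total - inter
    # eps keeps both ratios defined when the two masks are empty.
    dice = (2.0 * inter + eps) / (total + eps)
    jaccard = (inter + eps) / (union + eps)
    return float(dice), float(jaccard)


if __name__ == "__main__":
    image = np.array([[[0, 0, 0], [255, 255, 255]]])
    print(brightness_proxy(image))

    pred = np.array([[1, 1], [0, 0]])
    target = np.array([[1, 0], [0, 0]])
    print(dice_and_jaccard(pred, target))
```

Benchmark scores such as those reported above are typically averaged over all test images, so per-image values from a function like this would be aggregated before comparison.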
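Cite this article: 周世钦 (Zhou, S.Q.). Depth-Aware Spatial Hierarchy Learning for Robust Medical Image Segmentation [J]. 软件工程与应用 (Software Engineering and Applications), 2026, 15(3): 494-509. https://doi.org/10.12677/sea.2026.153046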
