A Contrastive Learning-Enhanced LoRA Fine-Tuned SAM-Med3D Model for Ultrasound Image Segmentation
DOI: 10.12677/mos.2025.144332
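Authors: Zhang Yumeng (张雨萌), Li Yifan (李逸凡): School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai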
Keywords: 3D Ultrasound Segmentation Dataset; Contrastive Learning; LoRA Fine-Tuning
Abstract: Ultrasound image analysis plays a critical role in modern medicine, but precise segmentation remains one of its major challenges. Although existing deep learning models such as SAM perform well on natural images, a performance gap remains in medical image segmentation. This study proposes a contrastive learning-enhanced, LoRA fine-tuned SAM-Med3D model for ultrasound image segmentation (USCL-Med3D) to improve the accuracy and efficiency of 3D ultrasound image segmentation. To this end, we designed a semi-supervised pseudo-label dataset training method that obtains annotated data automatically, reducing the annotation burden while maintaining annotation quality. In addition, we introduced a contrastive learning architecture, VCL-head, to strengthen the model's ability to extract contextual information from 3D ultrasound images. Finally, we fine-tuned SAM-Med3D with LoRA so that it learns feature representations from the 3D ultrasound dataset, improving its segmentation ability. Experimental results show that the proposed method achieves excellent segmentation performance on a 3D ultrasound dataset and on several public 3D medical imaging datasets.
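The abstract names two generic mechanisms without giving implementation details. As a rough, framework-free illustration (pure Python, not the authors' code), the sketch below shows (1) a LoRA-style linear layer, where a frozen pre-trained weight W is augmented with a trainable low-rank update (alpha / r) · B A, and (2) an InfoNCE contrastive loss. The names `LoRALinear` and `info_nce`, the initialization scheme, and the temperature are illustrative assumptions; InfoNCE is a standard stand-in and is not claimed to be the VCL-head's actual objective, and nothing here reproduces SAM-Med3D itself.

```python
import math
import random

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        self.weight = weight  # frozen pre-trained weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A gets small random values and B starts at zero, so B @ A == 0 at init.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def trainable_params(self) -> int:
        """Only A (rank x d_in) and B (d_out x rank) would be updated in training."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)               # frozen path: W x
        update = matvec(self.b, matvec(self.a, x))  # low-rank path: B (A x)
        return [y + self.scale * u for y, u in zip(base, update)]

    def merged_weight(self) -> Matrix:
        """Fold the update into W for inference: W + scale * B @ A."""
        rank = len(self.a)
        return [
            [w + self.scale * sum(self.b[i][k] * self.a[k][j] for k in range(rank))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.weight)
        ]


def info_nce(anchor: Vector, positive: Vector, negatives: list[Vector],
             tau: float = 0.1) -> float:
    """InfoNCE for one anchor: negative log-softmax of the positive pair's similarity."""
    def cosine(u: Vector, v: Vector) -> float:
        dot = sum(p * q for p, q in zip(u, v))
        return dot / (math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v)))

    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[0]
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly. Only r(d_in + d_out) parameters are trained instead of d_in · d_out (512 versus 4,096 for a 64 × 64 layer at r = 4), and after training the update can be folded back into W, so inference cost is unchanged. The InfoNCE loss is small when the anchor is closer to its positive than to the negatives and grows when that ordering is reversed.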
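Cite this article: Zhang Yumeng, Li Yifan. A Contrastive Learning-Enhanced LoRA Fine-Tuned SAM-Med3D Model for Ultrasound Image Segmentation. Modeling and Simulation, 2025, 14(4): 811-825. https://doi.org/10.12677/mos.2025.144332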
