A Polyp Segmentation Method Fusing Edge Features under the SegNext Framework
DOI: 10.12677/mos.2026.154063; supported by research project funding
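Authors: Liu Yifan (刘一帆), Wei Yun (魏赟)*, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai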
Keywords: Polyp Segmentation; Pre-Trained Model Fine-Tuning; Interactive Prompts; Multi-Modal Feature Fusion
Abstract: Accurate segmentation of colorectal polyps is a key step in the early diagnosis and treatment of colorectal cancer. Existing dual-branch architectures that combine Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) still suffer from insufficient feature fusion, weak capture of edge details, and limited robustness of semantic representations when applied to polyp image segmentation. To address these issues, this paper proposes a multi-modal edge-aware prompting method for polyp segmentation built on the SegNext framework. At its core is a Multi-modal Edge-Aware Adapter (MEAA) that jointly extracts local texture features from a CNN, edge features from Wavelet High-Frequency Components (WHFC), and global semantic features from a ViT. A channel-spatial dual attention mechanism adaptively fuses and enhances these multi-modal features, and a dense-map visual prompt strategy compensates for lost detail. Experiments on five public polyp segmentation datasets, including Kvasir-SEG and CVC-ClinicDB, show that under 0, 1, and 2 interactive prompts the proposed framework achieves mean Dice coefficients of 0.854, 0.910, and 0.935 and mean Intersection over Union (IoU) scores of 0.781, 0.858, and 0.895, respectively. Its segmentation performance significantly exceeds that of existing mainstream methods, and the framework shows potential for real-time clinical use, offering an efficient and accurate solution for computer-aided diagnosis of colorectal polyps.
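The abstract does not say which wavelet MEAA uses to obtain its high-frequency components. As an illustration only, the sketch below assumes a single-level orthonormal Haar transform: each 2×2 block is split into one low-frequency term (LL) and three high-frequency differences (LH, HL, HH). The hypothetical helper `high_freq_edge_map` combines the three high-frequency subbands into a per-position edge magnitude. Plain Python lists stand in for tensors so the example runs without dependencies.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar transform (assumed wavelet).

    Returns (LL, LH, HL, HH), each half the input height and width.
    Naming and sign conventions for the three detail subbands differ
    between libraries; here LH compares the top and bottom rows of each
    2x2 block, HL compares its left and right columns, HH its diagonals.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll: Matrix = []
    lh: Matrix = []
    hl: Matrix = []
    hh: Matrix = []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)  # low frequency: scaled block average
            r_lh.append((a + b - c - d) / 2)  # top vs bottom: horizontal edges
            r_hl.append((a - b + c - d) / 2)  # left vs right: vertical edges
            r_hh.append((a - b - c + d) / 2)  # diagonal structure
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def high_freq_edge_map(img: Matrix) -> Matrix:
    """Hypothetical helper: magnitude of the three high-frequency subbands."""
    _, lh, hl, hh = haar_dwt2(img)
    return [
        [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(r1, r2, r3)]
        for r1, r2, r3 in zip(lh, hl, hh)
    ]
```

For a 4×4 image whose first column is 0 and whose other columns are 1, `high_freq_edge_map` returns `[[1.0, 0.0], [1.0, 0.0]]`: only the blocks that straddle the intensity step respond, while uniform blocks contribute nothing. That localization is what makes high-frequency subbands usable as edge cues alongside CNN texture features.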
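"Channel-spatial dual attention" is commonly realized in the style of CBAM: channel weights come from average- and max-pooled descriptors passed through a shared MLP, and per-pixel weights come from channel-wise mean and max maps, applied in that order. The sketch below assumes that design, and further assumes that the CNN, WHFC, and ViT features have already been resized to a common spatial size so they can be concatenated along the channel axis. `fuse`, its weight arguments, and the 1×1 spatial mixing (CBAM itself uses a 7×7 convolution) are hypothetical simplifications, not the paper's MEAA.

```python
import math
from typing import List, Sequence

# Feature maps as nested lists: [channel][row][col]
Feature = List[List[List[float]]]
Weights = Sequence[Sequence[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(feat: Feature, w1: Weights, w2: Weights) -> List[float]:
    """CBAM-style channel weights: sigmoid(MLP(avg_pool) + MLP(max_pool)).

    w1 has shape (hidden, C) and w2 has shape (C, hidden); the two-layer
    MLP (ReLU in between) is shared by both pooled descriptors.
    """
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]

    def mlp(v: List[float]) -> List[float]:
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(feat: Feature, w_avg: float, w_max: float,
                      bias: float) -> List[List[float]]:
    """Per-pixel weights from channel-wise mean and max.

    Simplification: a 1x1 mix of the two maps instead of CBAM's 7x7 conv.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    weights = []
    for i in range(h):
        row = []
        for j in range(w):
            values = [feat[k][i][j] for k in range(c)]
            row.append(sigmoid(w_avg * sum(values) / c + w_max * max(values) + bias))
        weights.append(row)
    return weights


def fuse(cnn: Feature, edge: Feature, vit: Feature, w1: Weights, w2: Weights,
         w_avg: float = 1.0, w_max: float = 1.0, bias: float = 0.0) -> Feature:
    """Concatenate three same-size feature maps, then apply channel
    attention followed by spatial attention (hypothetical fusion step)."""
    feat = cnn + edge + vit  # list concatenation = channel-axis concat
    ca = channel_attention(feat, w1, w2)
    feat = [[[v * ca[k] for v in row] for row in ch] for k, ch in enumerate(feat)]
    sa = spatial_attention(feat, w_avg, w_max, bias)
    return [[[v * sa[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in feat]
```

With all weights set to zero both attention maps collapse to 0.5, so `fuse` simply scales every input value by 0.25. Learned weights instead let informative channels (for example, edge channels near a polyp boundary) and informative pixels pass through with weights closer to 1.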
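The 0-, 1-, and 2-prompt settings refer to user clicks. In the spirit of SegNext's dense representation of visual prompts, clicks can be rasterized into maps at the image's resolution. The sketch below assumes that positive and negative clicks are drawn as binary disks in two separate channels with a hypothetical radius of 5 pixels; the abstract does not specify the paper's exact encoding. With zero clicks both maps stay empty, which corresponds to the 0-prompt setting.

```python
from typing import List, Sequence, Tuple

Click = Tuple[int, int, bool]  # (row, col, is_positive)
Matrix = List[List[float]]


def clicks_to_dense_maps(clicks: Sequence[Click], height: int, width: int,
                         radius: int = 5) -> Tuple[Matrix, Matrix]:
    """Rasterize clicks into (positive_map, negative_map) disk maps.

    The disk encoding and the default radius are assumptions for illustration.
    """
    pos = [[0.0] * width for _ in range(height)]
    neg = [[0.0] * width for _ in range(height)]
    r2 = radius * radius
    for row, col, positive in clicks:
        target = pos if positive else neg
        # Only scan the bounding box of the disk, clipped to the image.
        for i in range(max(0, row - radius), min(height, row + radius + 1)):
            for j in range(max(0, col - radius), min(width, col + radius + 1)):
                if (i - row) ** 2 + (j - col) ** 2 <= r2:
                    target[i][j] = 1.0
    return pos, neg
```

For example, a single positive click at the center of a 3×3 image with `radius=1` produces the plus-shaped map `[[0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]` and an all-zero negative map. Dense maps like these can be stacked with the image as extra input channels, which keeps fine spatial detail that a sparse coordinate embedding would not.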
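The results are reported as mean Dice and mean IoU. For one predicted mask P and ground-truth mask G the standard definitions are Dice = 2|P∩G| / (|P| + |G|) and IoU = |P∩G| / |P∪G|. The sketch below computes both for binary masks and averages per-image scores over a dataset. Scoring two empty masks as 1.0, and per-image averaging, are assumed conventions; the abstract does not state the exact evaluation protocol.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]


def dice_iou(pred: Mask, gt: Mask) -> Tuple[float, float]:
    """Dice and IoU for one pair of equal-shape binary masks.

    Assumed convention: if both masks are empty, both scores are 1.0.
    """
    inter = pred_area = gt_area = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            p, g = bool(p), bool(g)
            inter += int(p and g)
            pred_area += int(p)
            gt_area += int(g)
    if pred_area + gt_area == 0:
        return 1.0, 1.0
    dice = 2 * inter / (pred_area + gt_area)
    iou = inter / (pred_area + gt_area - inter)  # |P ∪ G| = |P| + |G| - |P ∩ G|
    return dice, iou


def mean_scores(pairs: Sequence[Tuple[Mask, Mask]]) -> Tuple[float, float]:
    """Average per-image Dice and IoU over a dataset (assumed protocol)."""
    scores = [dice_iou(p, g) for p, g in pairs]
    n = len(scores)
    return sum(d for d, _ in scores) / n, sum(i for _, i in scores) / n
```

For any single image Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU, and the same ordering holds after averaging. This is consistent with each reported mean Dice value exceeding its mean IoU counterpart.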
Cite this article: Liu Yifan, Wei Yun. A Polyp Segmentation Method Fusing Edge Features under the SegNext Framework [J]. Modeling and Simulation, 2026, 15(4): 181-192. https://doi.org/10.12677/mos.2026.154063
