A Text-Guided Prototype Feature Modulation Method for Few-Shot Semantic Segmentation
Abstract: Few-shot semantic segmentation aims to segment novel classes at the pixel level from only a few annotated samples. Because support samples are scarce, conventional methods build class prototypes mainly from visual features; such prototypes are sensitive to background clutter and intra-class variation, which makes them unstable. To address this, this paper proposes a text-guided prototype feature modulation method for few-shot semantic segmentation. The method introduces category-level text descriptions as high-level semantic priors, adaptively aggregates multiple text features through text-visual similarity modeling, and applies a feature-wise linear modulation mechanism to dynamically adjust the support prototypes, thereby enhancing class discriminability and suppressing irrelevant semantic interference. The proposed text-guided feature modulation module improves generalization to novel classes without significantly increasing computational overhead. Experiments on the PASCAL-5i and COCO-20i datasets show that the method outperforms the baseline models on both benchmarks, validating its effectiveness.
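The two steps named in the abstract, similarity-weighted aggregation of text features and feature-wise linear modulation of the support prototype, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the softmax temperature, the linear projections that produce the scale and shift, and the residual form `(1 + gamma) * prototype + beta` are all assumptions made for illustration, and the toy vectors stand in for encoder outputs.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    # Fall back to 1.0 so an all-zero vector does not divide by zero.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def aggregate_text_features(prototype, text_feats, temperature=0.1):
    """Pool several text features, weighted by cosine similarity to the prototype.

    The softmax weighting and the temperature are illustrative assumptions.
    """
    p = l2_normalize(prototype)
    sims = [dot(p, l2_normalize(t)) for t in text_feats]
    peak = max(sims)  # subtract the max for a numerically stable softmax
    exps = [math.exp((s - peak) / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(text_feats[0])
    pooled = [sum(w * t[d] for w, t in zip(weights, text_feats)) for d in range(dim)]
    return pooled, weights


def linear(weight, bias, x):
    # Plain affine map: one output per weight row.
    return [dot(row, x) + b for row, b in zip(weight, bias)]


def film_modulate(prototype, text_vec, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: scale and shift each prototype channel.

    gamma and beta are predicted from the pooled text vector by hypothetical
    linear layers; the residual (1 + gamma) form is an assumption.
    """
    gamma = linear(w_gamma, b_gamma, text_vec)
    beta = linear(w_beta, b_beta, text_vec)
    return [(1.0 + g) * p + b for g, p, b in zip(gamma, prototype, beta)]


# Toy 2-D example: the first text feature is closer to the prototype.
prototype = [1.0, 0.0]
text_feats = [[0.9, 0.1], [0.0, 1.0]]
pooled, weights = aggregate_text_features(prototype, text_feats)

# With all-zero projection parameters the prototype passes through unchanged.
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
modulated = film_modulate(prototype, pooled, zeros_w, zeros_b, zeros_w, zeros_b)
```

In this sketch the text feature most similar to the visual prototype dominates the pooled vector, and the residual form means that an untrained (zero) modulation leaves the visual prototype as it was.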
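Cite this article: Pang Yunfan. A Text-Guided Prototype Feature Modulation Method for Few-Shot Semantic Segmentation [J]. Computer Science and Application, 2026, 16(2): 337-347. https://doi.org/10.12677/csa.2026.162063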
