Feature Knowledge Distillation Based on Pyramid Pooling and Mask Generation
DOI: 10.12677/mos.2025.142151
Authors: Xincai Lu, Zhanquan Sun, Qingpeng Li: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai; He Wang: School of Economics, Henan University, Zhengzhou, Henan
Keywords: Model Compression, Knowledge Distillation, Feature Distillation
Abstract: The goal of Knowledge Distillation (KD) is to transfer knowledge from a large teacher network to a lightweight student network. Mainstream KD methods can be divided into logit distillation and feature distillation. Feature-based knowledge distillation is a critical component of KD, using intermediate layers to supervise the training of the student network. However, potential mismatches between intermediate layers may backfire during training, and current student models typically learn by directly imitating the teacher's features. To address this, this paper proposes a novel distillation framework, Decoupled Spatial Pyramid Pooling Knowledge Distillation, which distinguishes the importance of different regions in the feature maps. The paper also introduces a mask-based feature distillation module that guides the student model to generate features from a masked block rather than mimic the teacher's complete features. Compared with previous, more complex distillation methods, the proposed approach achieves superior classification results on the CIFAR-100 and Tiny-ImageNet datasets.
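To make the two ingredients named in the abstract concrete, below is a minimal NumPy sketch of (a) spatial pyramid pooling, which summarizes a feature map at several grid resolutions so regions can be weighted separately, and (b) a masked-feature reconstruction loss, in which randomly masked student features are compared against the teacher's features instead of copying them wholesale. This is an illustrative sketch only, not the authors' implementation: the function names, the pooling levels `(1, 2, 4)`, the `mask_ratio`, and the omission of the learnable generation block are all assumptions made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    """Pool a C x H x W feature map over fixed grids at several pyramid
    levels and concatenate the per-cell max values into one vector."""
    c, h, w = feat.shape
    pooled = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                hs, he = i * h // n, (i + 1) * h // n
                ws, we = j * w // n, (j + 1) * w // n
                pooled.append(feat[:, hs:he, ws:we].max(axis=(1, 2)))
    return np.concatenate(pooled)  # length = C * sum(n*n for n in levels)

def masked_generation_loss(student_feat, teacher_feat, mask_ratio=0.5):
    """Zero out random spatial positions of the student feature map and
    measure how well the remainder reconstructs the teacher feature.
    (A real method would first pass the masked feature through a
    learnable generation block; that block is omitted here.)"""
    c, h, w = student_feat.shape
    keep = (rng.random((1, h, w)) > mask_ratio).astype(student_feat.dtype)
    masked = student_feat * keep  # broadcast the mask over channels
    return float(np.mean((masked - teacher_feat) ** 2))

# Toy features: the student roughly tracks the teacher.
teacher = rng.standard_normal((8, 8, 8))
student = teacher + 0.1 * rng.standard_normal((8, 8, 8))

vec = spatial_pyramid_pool(teacher)
loss = masked_generation_loss(student, teacher)
print(vec.shape)  # (168,) = 8 channels * (1 + 4 + 16) cells
print(loss >= 0.0)
```

In a full training loop, the pyramid-pooled vectors would feed a region-weighted distillation loss and `masked_generation_loss` would be backpropagated through a small generation block, but the sketch above shows the shape of both computations.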
Citation: Lu, X., Sun, Z., Wang, H. and Li, Q. (2025) Feature Knowledge Distillation Based on Pyramid Pooling and Mask Generation. Modeling and Simulation, 14, 279-290. https://doi.org/10.12677/mos.2025.142151