Design and Simulation of Stage Reset Knowledge Distillation Method
DOI: 10.12677/MOS.2024.132137 — Supported by the National Natural Science Foundation of China
Authors: Junli Chen*, Zhanquan Sun: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai
Keywords: Neural Network, Classification Model, Model Compression, Knowledge Distillation, Stage Reset
Abstract: Knowledge distillation is a model compression technique that transfers knowledge from a teacher network to a student network. Current knowledge distillation methods suffer from inconsistent semantic information between the teacher and student networks: because the two models have different forward-inference distances, their intermediate features carry mismatched semantics, which ultimately degrades distillation performance. To address this problem, this paper explores a new stage-reset knowledge distillation method. The method performs distillation stage by stage, with the teacher and student networks sharing outputs within the same stage, which reduces the feature semantic mismatch caused by the large difference in inference path length between student and teacher and thereby improves the student network's performance. Finally, simulation experiments on public datasets compare the proposed method against state-of-the-art approaches, and the results show that the proposed method is advantageous.
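To make the stage-reset idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: both networks are assumed to be split into the same number of stages, each student stage is fed the teacher's previous-stage output (the "reset"), so teacher and student features are compared at the same depth rather than after inference paths of different lengths. The names `teacher_stages` and `student_stages` and the per-stage MSE loss are illustrative assumptions; in practice, adapter layers may be needed if stage feature shapes differ.

```python
import torch
import torch.nn.functional as F

def stage_reset_kd_loss(teacher_stages, student_stages, x, alpha=1.0):
    """Hypothetical sketch of stage-reset distillation.

    Both networks are assumed to be split into the same number of
    stages (e.g. nn.ModuleList entries) with matching feature shapes.
    Each student stage receives the teacher's previous-stage output,
    so teacher and student features are compared at matched depth.
    """
    feat = x
    loss = x.new_zeros(())
    for t_stage, s_stage in zip(teacher_stages, student_stages):
        with torch.no_grad():               # teacher is frozen
            t_out = t_stage(feat)
        s_out = s_stage(feat)               # same ("reset") stage input
        loss = loss + F.mse_loss(s_out, t_out)
        feat = t_out                        # shared output feeds the next stage
    return alpha * loss
```

In training, a stage loss of this kind would typically be combined with the standard logit-based distillation loss of Hinton et al. and the cross-entropy loss on ground-truth labels.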
Article citation: Chen, J.L. and Sun, Z.Q. (2024) Design and Simulation of Stage Reset Knowledge Distillation Method. Modeling and Simulation, 13(2), 1455-1465. https://doi.org/10.12677/MOS.2024.132137

[18] Passalis, N. and Tefas, A. (2018) Learning Deep Representations with Probabilistic Knowledge Transfer. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds., Computer Vision—ECCV 2018. Lecture Notes in Computer Science, Springer, Cham, 268-284. [Google Scholar] [CrossRef