|
[1]
|
You, S., Xu, C., Xu, C. and Tao, D.C. (2017) Learning from Multiple Teacher Networks. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, 13-17 August 2017, 1285-1294.
|
|
[2]
|
Fukuda, T., Suzuki, M., Kurata, G., Thomas, S., Cui, J. and Ramabhadran, B. (2017) Efficient Knowledge Distillation from an Ensemble of Teachers. Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, 20-24 August 2017, 3697-3701.
|
|
[3]
|
Wu, M.-C., Chiu, C.-T. and Wu, K.-H. (2019) Multi-Teacher Knowledge Distillation for Compressed Video Action Recognition on Deep Neural Networks. ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, 12-17 May 2019, 2202-2206.
|
|
[4]
|
Zhang, H., Chen, D. and Wang, C. (2022) Confidence-Aware Multi-Teacher Knowledge Distillation. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22-27 May 2022, 4498-4502.
|
|
[5]
|
Kim, Y. (2014) Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, 25-29 October 2014, 1746-1751.
|
|
[6]
|
杨丽, 吴雨茜, 王俊丽, 刘义理 (2018) A Survey of Research on Recurrent Neural Networks. Journal of Computer Applications (计算机应用), 38(S2), 1-6+26. (In Chinese)
|
|
[7]
|
Hochreiter, S. and Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9, 1735-1780.
|
|
[8]
|
Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT, Minneapolis, 2-7 June 2019, 4171-4186.
|
|
[9]
|
Bahdanau, D., Cho, K. and Bengio, Y. (2015) Neural Machine Translation by Jointly Learning to Align and Translate. The 3rd International Conference on Learning Representations, San Diego, 7-9 May 2015, 1-15.
|
|
[10]
|
Chin, T.-W., Ding, R.Z., Zhang, C. and Marculescu, D. (2020) Towards Efficient Model Compression via Learned Global Ranking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, 13-19 June 2020, 1518-1528.
|
|
[11]
|
He, Y.H., Zhang, X.Y. and Sun, J. (2017) Channel Pruning for Accelerating Very Deep Neural Networks. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 1389-1397.
|
|
[12]
|
Zhuang, Z.W., Tan, M.K., Zhuang, B.H., Liu, J., Guo, Y., Wu, Q.Y., Huang, J.Z. and Zhu, J.H. (2018) Discrimination-Aware Channel Pruning for Deep Neural Networks. Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, 3-8 December 2018, 875-886.
|
|
[13]
|
Wang, K., Liu, Z.J., Lin, Y.J., Lin, J. and Han, S. (2019) HAQ: Hardware-Aware Automated Quantization with Mixed Precision. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR Workshops 2020, Seattle, 14-19 June 2020, 8612-8620.
|
|
[14]
|
Wu, J.X., Leng, C., Wang, Y.H., Hu, Q.H. and Cheng, J. (2016) Quantized Convolutional Neural Networks for Mobile Devices. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 4820-4828.
|
|
[15]
|
Xie, Z., Wen, Z.Q., Liu, J., Liu, Z.Q., Wu, X.X. and Tan, M.K. (2020) Deep Transferring Quantization. 16th European Conference on Computer Vision, Glasgow, 23-28 August 2020, 625-642.
|
|
[16]
|
Pham, H., Guan, M.Y., Zoph, B., Le, Q.V. and Dean, J. (2018) Efficient Neural Architecture Search via Parameter Sharing. Proceedings of the International Conference on Machine Learning, Vol. 2, 4092-4101.
|
|
[17]
|
Hinton, G., Vinyals, O. and Dean, J. (2015) Distilling the Knowledge in a Neural Network. Computer Science, 14, 38-39.
|
|
[18]
|
Romero, A., Ballas, N., et al. (2015) Fitnets: Hints for Thin Deep Nets.
|
|
[19]
|
Yuan, L., Tay, F.E.H., Li, G.L., Wang, T. and Feng, J.S. (2020) Revisiting Knowledge Distillation via Label Smoothing Regularization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, 14-19 June 2020, 3903-3911.
|
|
[20]
|
Ma, X.Y., Shen, Y.L., et al. (2020) Adversarial Self-Supervised Data-Free Distillation for Text Classification.
|
|
[21]
|
廖胜兰, 吉建民, 俞畅, 陈小平 (2021) Intent Classification Method Based on BERT Model and Knowledge Distillation. Computer Engineering (计算机工程), 47(5), 73-79. (In Chinese)
|
|
[22]
|
Nityasya, M.N., Wibowo, H.A., Chevi, R., Prasojo, R.E. and Aji, A.F. (2022) Which Student Is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models.
|
|
[23]
|
Du, S.C., You, S., Li, X.J., et al. (2020) Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, 6-12 December 2020, 12345-12355.
|
|
[24]
|
Kwon, K., Na, H., Lee, H., et al. (2020) Adaptive Knowledge Distillation Based on Entropy. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, 4-8 May 2020, 7409-7413.
|