A Comparative Study on the Performance of Generative and Discriminative Models in Text Multi-Classification Tasks under Similar Model Sizes
Abstract: Multi-class text classification is widely used in natural language processing, and this paper investigates which kind of model is best suited to deploying it in practice. Along two dimensions, prediction accuracy and response latency, we compare the generative model Qwen3-0.6B and the discriminative model BERT-base, which are of similar size, on a multi-class text classification task. Using the THUCNews news dataset, we set up four experimental groups: BERT fine-tuning, Qwen3-0.6B zero-shot prompting, Qwen3-0.6B full-parameter fine-tuning, and Qwen3-0.6B LoRA fine-tuning. With the same training samples, the discriminative model outperformed the best generative configuration in both macro-average F1 and prediction speed, by 0.8% and 356.8%, respectively.
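The macro-average F1 named in the abstract is the unweighted mean of the per-class F1 scores, so every news category counts equally regardless of how many samples it has. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the example labels are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class received a wrong hit
            fn[truth] += 1  # true class was missed

    # F1 = 2*TP / (2*TP + FP + FN); a class with no counts scores 0.
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, not taken from THUCNews.
y_true = ["sports", "sports", "finance", "tech"]
y_pred = ["sports", "finance", "finance", "tech"]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally, a model that does well only on the most frequent categories is penalized more under this metric than under plain accuracy.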
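Cite this article: Huang Hao, Li Zhan, Li Yuhang. A Comparative Study on the Performance of Generative and Discriminative Models in Text Multi-Classification Tasks under Similar Model Sizes [J]. Computer Science and Application, 2025, 15(8): 11-20. https://doi.org/10.12677/csa.2025.158193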