A Confidence-Gated Framework for High-Throughput Fine-Grained Text Classification in Resource-Constrained Environments
DOI: 10.12677/csa.2026.161005
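Authors: 董致男 (Dong Zhinan), Tandon School of Engineering, New York University, New York, USA; 潘昱曈 (Pan Yutong), College of Software, Liaoning Technical University, Huludao, Liaoning, China; 黎阳诗 (Li Yangshi), Oracle, Austin, USA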
Keywords: Fine-Grained Text Classification; Resource-Constrained Environments; Confidence-Gated; High-Throughput; Dual-Model Verification
Abstract: General-purpose Large Language Models (LLMs) suffer from efficiency bottlenecks and hallucination when applied to large-scale, fine-grained text classification in resource-constrained environments. To address this, the paper proposes DeepConf-Verify (DCV), a high-performance framework with three stages. First, domain-specific fine-tuning raises the accuracy baseline of a small-parameter model from under 30% to over 90%. Second, a Dual-Threshold Dynamic Confidence Gating mechanism monitors the token-level confidence trajectory during generation, triggering an immediate “Panic Exit” for “confused” samples and a “Fast Pass” for high-confidence samples. Third, samples that fall in the critical confidence interval undergo Dual-Model Verification, which requires the two models to agree, in order to mitigate tail risk. In experiments restricted to a single NVIDIA A100 GPU, DCV maintains an enterprise-grade accuracy of 95.2% while improving throughput by more than 1200% over the original general-purpose LLM system (reaching 60.2 comments per second), and it achieves a 24% efficiency gain over a single fine-tuned model of the same parameter count. The system processes more than 5 million comments per day and keeps the manual review rate within 4.5%. The study offers an effective theoretical and practical paradigm for building high-throughput, high-reliability vertical-domain AI systems in low-resource settings.
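The routing logic summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how confidence is computed, which threshold values are used, or how the two models are invoked, so the sliding-window mean, the values `low=0.35` and `high=0.90`, the window size, and all names (`Route`, `Decision`, `gate_trajectory`, `classify`, `primary`, `verifier`) are assumptions made for illustration. For simplicity the sketch scans a completed confidence trajectory; a streaming system would apply the same check while tokens are being generated so that a Panic Exit actually stops decoding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Route(Enum):
    FAST_PASS = "fast_pass"        # high confidence: accept the primary label
    PANIC_EXIT = "panic_exit"      # low confidence: stop early, no label
    BORDERLINE = "borderline"      # internal state: needs a second opinion
    VERIFIED = "verified"          # borderline, and both models agreed
    MANUAL_AUDIT = "manual_audit"  # borderline, and the models disagreed


@dataclass
class Decision:
    route: Route
    label: Optional[str]
    tokens_used: int


def gate_trajectory(confidences: Sequence[float], low: float = 0.35,
                    high: float = 0.90, window: int = 3) -> Tuple[Route, int]:
    """Apply two thresholds to a token-level confidence trajectory.

    Returns the route and the number of tokens consumed before deciding.
    The thresholds and window size are illustrative assumptions.
    """
    if not confidences:
        return Route.PANIC_EXIT, 0  # no evidence at all: do not emit a label
    lowest_mean = 1.0
    for i in range(len(confidences)):
        recent = confidences[max(0, i - window + 1): i + 1]
        mean = sum(recent) / len(recent)
        if mean < low:
            return Route.PANIC_EXIT, i + 1  # stop as soon as the model looks confused
        lowest_mean = min(lowest_mean, mean)
    if lowest_mean >= high:
        return Route.FAST_PASS, len(confidences)
    return Route.BORDERLINE, len(confidences)


def classify(text: str,
             primary: Callable[[str], Tuple[str, Sequence[float]]],
             verifier: Callable[[str], str],
             low: float = 0.35, high: float = 0.90, window: int = 3) -> Decision:
    """Route one sample; the verifier runs only for borderline samples."""
    label, confidences = primary(text)
    route, used = gate_trajectory(confidences, low, high, window)
    if route is Route.PANIC_EXIT:
        return Decision(Route.PANIC_EXIT, None, used)
    if route is Route.FAST_PASS:
        return Decision(Route.FAST_PASS, label, used)
    if verifier(text) == label:
        return Decision(Route.VERIFIED, label, used)
    return Decision(Route.MANUAL_AUDIT, None, used)
```

In this sketch only borderline samples pay for a second model call, which is one plausible reading of how gating could raise throughput relative to running a single model on every sample without shortcuts. The reported figures are also mutually consistent: 60.2 comments per second sustained over 86,400 seconds gives roughly 5.2 million comments per day.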
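Cite this article: Dong, Z., Pan, Y. and Li, Y. (2026) A Confidence-Gated Framework for High-Throughput Fine-Grained Text Classification in Resource-Constrained Environments. Computer Science and Application, 16(1), 44-55. https://doi.org/10.12677/csa.2026.161005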
