Research on Chinese Medical Dialogue System Driven by Large Language Models in Medical E-Commerce Platforms
DOI: 10.12677/ecl.2024.1341314
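Authors: 滚流海, 曾以春: College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou; 吴娜: The First Clinical Medical College, Guizhou University of Traditional Chinese Medicine, Guiyang, Guizhou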
Keywords: Large Language Models; Supervised Fine-Tuning; Medical E-Commerce; Medical Dialogue Systems
Abstract: With the rapid development of Internet technology and artificial intelligence, medical e-commerce platforms play an increasingly important role in modern pharmaceutical services. This study proposes MedAsst, a Chinese medical dialogue model based on a large language model (LLM), and explores its application in medical e-commerce platforms. MedAsst is built on Qwen2-7B and obtained by supervised fine-tuning with the LoRA method on 1.47 million medical question-answer samples. We evaluate MedAsst on medical multiple-choice tests and on a custom medical question-answering dataset. The experimental results show that MedAsst outperforms the baseline models on BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L, with a particularly marked advantage in medical question answering. Compared with Llama-3-8B, Gemma-7B, Mistral-7B, and the base (non-fine-tuned) Qwen2-7B, MedAsst performs better on domain-specific tasks under an appropriate fine-tuning strategy, which demonstrates the necessity and effectiveness of supervised fine-tuning. This work improves model performance on Chinese medical question answering, shows the application potential of large language models in medical e-commerce platforms, and provides a basis for further optimization and deployment in more complex scenarios.
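LoRA, the fine-tuning method named in the abstract, freezes the pretrained weight matrices and trains only a low-rank update. The standard formulation from the LoRA paper (Hu et al., 2021) is restated below as a sketch; the matrix size d = k = 4096 and rank r = 8 in the parameter count are illustrative assumptions, not hyperparameters reported for MedAsst.

```latex
\begin{aligned}
h &= W_0 x + \Delta W x = W_0 x + \tfrac{\alpha}{r}\, B A x,
  && W_0 \in \mathbb{R}^{d\times k},\ B \in \mathbb{R}^{d\times r},\ A \in \mathbb{R}^{r\times k},\ r \ll \min(d,k) \\
\text{trainable parameters} &= r(d+k) \quad \text{instead of} \quad dk \ \text{(full fine-tuning)} \\
d = k = 4096,\ r = 8:\quad r(d+k) &= 8 \times 8192 = 65{,}536,
  \qquad dk = 16{,}777{,}216,
  \qquad \tfrac{65{,}536}{16{,}777{,}216} = \tfrac{1}{256} \approx 0.39\%
\end{aligned}
```

Because B is initialized to zero, the adapted model starts out identical to the base model, and after training the product BA can be merged into W_0, so inference cost is unchanged.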
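Citation: 滚流海, 吴娜, 曾以春. Research on Chinese Medical Dialogue System Driven by Large Language Models in Medical E-Commerce Platforms[J]. E-Commerce Letters (电子商务评论), 2024, 13(4): 1611-1620. https://doi.org/10.12677/ecl.2024.1341314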
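The abstract compares models on BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L. The Python sketch below shows one common way to compute sentence-level versions of these scores for Chinese answers. It assumes character-level tokenization, a single reference answer per question, F1 as the ROUGE summary statistic, and add-one smoothing for the 2- to 4-gram BLEU precisions; the toolkit and tokenization actually used to evaluate MedAsst are not stated here, so numbers from this sketch need not match the paper's.

```python
import math
from collections import Counter


def char_tokens(text: str) -> list[str]:
    """Split text into characters, dropping whitespace (assumed tokenization for Chinese)."""
    return [ch for ch in text if not ch.isspace()]


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of contiguous n-grams."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(cand: list[str], ref: list[str], n: int) -> float:
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    c, r = ngrams(cand, n), ngrams(ref, n)
    overlap = sum((c & r).values())  # Counter '&' keeps the minimum count per n-gram
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(cand: list[str], ref: list[str]) -> float:
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(cand, ref), len(cand), len(ref))


def bleu4(cand: list[str], ref: list[str]) -> float:
    """Sentence-level BLEU-4 against one reference, with a brevity penalty."""
    if not cand:
        return 0.0
    log_p = 0.0
    for n in range(1, 5):
        c, r = ngrams(cand, n), ngrams(ref, n)
        clipped = sum((c & r).values())  # candidate counts clipped by the reference
        total = sum(c.values())
        if n == 1:
            if clipped == 0:
                return 0.0
            p = clipped / total
        else:
            p = (clipped + 1) / (total + 1)  # add-one smoothing for n = 2..4
        log_p += math.log(p) / 4  # uniform weights of 1/4 (geometric mean)
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_p)


if __name__ == "__main__":
    reference = char_tokens("建议多喝水，注意休息。")
    candidate = char_tokens("注意休息，多喝水。")
    print("BLEU-4 ", round(bleu4(candidate, reference), 4))
    print("ROUGE-1", round(rouge_n(candidate, reference, 1), 4))
    print("ROUGE-2", round(rouge_n(candidate, reference, 2), 4))
    print("ROUGE-L", round(rouge_l(candidate, reference), 4))
```

Character-level tokens avoid a dependency on a Chinese word segmenter; scores computed over segmented words (for example with jieba) will generally differ.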
