Evaluation of Mainstream Large Language Models in Health Education on Infusion Port Care: Readability and Quality Analysis
Abstract: Objective: To systematically evaluate the readability and professional quality of health education texts on implantable venous access ports (IVAPs) generated by mainstream large language models (LLMs), and to provide evidence for selecting clinical health education tools. Methods: Five mainstream general-purpose LLMs (GPT-5, Doubao, DeepSeek, Tongyi Qianwen, Wenxin Yiyan) were used to generate a total of 100 health education texts covering 5 core IVAP themes. The texts were analyzed quantitatively along multiple dimensions using seven internationally used readability indices, the Chinese version of the Patient Education Materials Assessment Tool (C-PEMAT-P), and the Global Quality Scale (GQS). Results: Professional quality differed highly significantly among the models (P < 0.001), with GPT-5 achieving the best overall performance, followed closely by Doubao and DeepSeek; readability also differed significantly across models. Topic complexity significantly affected readability but did not change the quality of content produced by high-performing models, indicating that readability and quality are relatively independent evaluation dimensions. Conclusion: Model type is the key determinant of the quality of IVAP health education texts, and GPT-5, Doubao, and DeepSeek are the preferred choices for clinical use. Healthcare providers should first use a high-quality model to generate professional content, then optimize its readability according to patients' cognitive characteristics, balancing medical accuracy with public comprehensibility.
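Cite this article: Huang Junhao, Liu Yanling, Yang Yonggang, Yang Meng, Ye Wenbin, Lai Yanling. Evaluation of Mainstream Large Language Models in Health Education on Infusion Port Care: Readability and Quality Analysis [J]. Advances in Clinical Medicine (临床医学进展), 2026, 16(6): 2655-2665. https://doi.org/10.12677/acm.2026.1662488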
