Large Language Models Reshaping the Future of Surgery: Advances in the Application of Large Language Models in Surgery
DOI: 10.12677/acm.2026.161080
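Authors: Shen Zelin (沈泽林): The Second Clinical Medical College of Jinan University, Department of Thoracic Surgery, Shenzhen People's Hospital, Shenzhen, Guangdong; Wang Guangsuo (王光锁)*: Department of Thoracic Surgery, Shenzhen People's Hospital (The First Affiliated Hospital of Southern University of Science and Technology; The Second Affiliated Hospital of Jinan University), Shenzhen, Guangdong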
Keywords: Large Language Models; Artificial Intelligence; Surgery; Education; Diagnosis; Clinical Decision Support; Natural Language Processing
Abstract: Large language models (LLMs), represented by ChatGPT, Gemini, DeepSeek, Tongyi Qianwen, and others, are developing rapidly. Their applications have reached every area of medical practice and are poised to profoundly reshape the hospital of the future. Change is especially rapid in thoracic surgery, cardiology, oral surgery, nephrology, orthopedics, gastroenterology, and imaging sciences. LLMs show great potential in assisting with medical documentation, providing clinical decision support, delivering health education, and managing patients during the perioperative period. This article reviews the applications of LLMs in several surgery-related scenarios: electronic medical record writing, computer-aided clinical diagnosis, clinical decision support, patient health management, medical education, and scientific paper writing. LLMs can efficiently process and analyze large-scale datasets and have strong natural language understanding capabilities. However, their application still has limitations, including model “hallucination,” potential risks of academic misconduct, clinical over-reliance, the possibility of misdiagnosis and treatment errors, and unclear attribution of responsibility. While making full use of the benefits of LLMs, we must recognize and address these ethical and practical challenges to ensure that their application in medicine is responsible and effective.
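Cite this article: Shen, Z. and Wang, G. (2026) Large Language Models Reshaping the Future of Surgery: Advances in the Application of Large Language Models in Surgery. Advances in Clinical Medicine, 16(1), 588-596. https://doi.org/10.12677/acm.2026.161080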
