Application of Large Language Models in Stomatology
DOI: 10.12677/acm.2026.1651874; supported by research project funding
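Authors: Feng Chengyu, Zhu Haihua* (Stomatology Hospital, Zhejiang University School of Medicine; Zhejiang University School of Stomatology; Zhejiang Provincial Clinical Research Center for Oral Diseases; Key Laboratory of Oral Biomedical Research of Zhejiang Province; Hangzhou, Zhejiang)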
Keywords: Large Language Models; Stomatology; Artificial Intelligence
Abstract:
Objective: To systematically review the current applications, technological advances, and future directions of large language models (LLMs) in stomatology, and to provide a reference for the intelligent transformation of oral healthcare.
Methods: Through a literature review, the application outcomes and performance of LLMs were summarized across three core domains (clinical diagnosis and treatment planning, patient education and communication, and dental education), and the main challenges and future directions for LLMs in stomatology were examined.
Results: LLMs show considerable potential across several areas of stomatology. In clinical diagnosis, domain-specific fine-tuning substantially improved accuracy; for example, the accuracy of supernumerary tooth detection rose from 63% to 91%. In pathological diagnosis, agreement with experts reached at most 68.6%. In treatment planning, accuracy in adhering to clinical guidelines reached 80%, but overall performance remained unstable and heterogeneous across studies. In patient education and communication, LLMs answered patient questions with a pooled accuracy of 81.87% and generated scientifically reliable health information and informed consent materials. In dental education, LLMs served as adjunct tools and performed at or above the average level of human examinees on various dental board examinations. However, current applications face several major challenges: fluctuating accuracy, “hallucinations” (plausible but inaccurate output), data privacy risks, the “black box” nature of the models, and the lack of a standardized evaluation framework.
Conclusion: LLMs have broad application prospects in stomatology, but they should be positioned to assist, not replace, clinical decision-making. With the development of multimodal models, unified evaluation benchmarks, improved interpretability, and prospective clinical validation, LLMs may become reliable assistants to clinicians and help move oral healthcare toward more efficient, higher-quality delivery.
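Accuracy figures such as those in the abstract are proportions of correct responses, and a pooled accuracy combines results from several studies. The abstract does not state how the 81.87% figure was pooled, so the following is only a minimal Python sketch under an assumption: fixed-effect pooling by summing correct answers and question counts across studies. The study counts and the helper names `accuracy` and `pooled_accuracy` are hypothetical and are not data from the review.

```python
# Hypothetical per-study counts: (correct answers, total questions).
# These numbers are invented for illustration only; they are NOT data from the review.
studies = [(45, 50), (70, 100), (160, 200)]


def accuracy(correct: int, total: int) -> float:
    """Proportion of correct answers, e.g. 91 correct out of 100 gives 0.91."""
    return correct / total


def pooled_accuracy(counts) -> float:
    """Fixed-effect pooling by summing counts across studies.

    This equals a sample-size-weighted mean of the per-study accuracies,
    so larger studies contribute more to the pooled value.
    """
    total_correct = sum(c for c, _ in counts)
    total_n = sum(n for _, n in counts)
    return total_correct / total_n


if __name__ == "__main__":
    for c, n in studies:
        print(f"{c}/{n} = {accuracy(c, n):.2%}")
    simple_mean = sum(accuracy(c, n) for c, n in studies) / len(studies)
    print(f"unweighted mean = {simple_mean:.2%}")
    print(f"pooled (count-weighted) = {pooled_accuracy(studies):.2%}")
```

Count-weighted pooling gives the largest study more influence (about 78.6% here) than a simple mean of the three study accuracies (80%). Published meta-analyses of proportions often use random-effects models on transformed proportions instead, which can yield a different pooled value.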
Citation: Feng Chengyu, Zhu Haihua. Application of Large Language Models in Stomatology[J]. Advances in Clinical Medicine, 2026, 16(5): 786-793. https://doi.org/10.12677/acm.2026.1651874
