Medical Waste Detection Model Based on Vision-Language Joint Modeling and LoRA Fine-Tuning
DOI: 10.12677/airr.2025.143053
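Authors: Liu Ao, Zhao Haifeng*, Zeng Yao, Li Zhuo, Sun Qiang: School of Chemical Equipment, Shenyang University of Technology, Liaoyang, Liaoning; Wang Mengfei: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai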
Keywords: Medical Waste; GroundingDINO; Open-Set Detection; Deep Learning; Parameter-Efficient Fine-Tuning
Abstract: Existing medical waste classification models suffer from a high missed-detection rate on small targets and severe confusion between categories in open scenarios. To address these problems, this paper proposes an improved GroundingDINO model that incorporates vision-language joint modeling. To strengthen the extraction of effective features and precise location information while suppressing irrelevant information, a cross-modal contrastive learning framework is built into the model. Low-Rank Adaptation (LoRA) is used for lightweight optimization, which reduces computational resource consumption while maintaining high accuracy. The EIoU (Efficient IoU) loss function is introduced to further improve bounding-box localization accuracy and to make the model more robust in complex medical waste classification tasks. Experiments were carried out on a medical waste image dataset covering 5 major categories and 20 subcategories, built according to the national medical waste management regulations. On this dataset, GroundingDINO-MW, the model fine-tuned from GroundingDINO, outperforms both the baseline GroundingDINO and Qwen2.5-vl-72B, the visual understanding large model released by Alibaba Cloud, in precision, recall, mAP, and F1. These results indicate that GroundingDINO-MW is better suited than the original model to medical waste classification and recognition in open scenarios.
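To make the EIoU loss named in the abstract concrete, the following is a minimal sketch of the formulation published by Zhang et al. in "Focal and Efficient IOU Loss for Accurate Bounding Box Regression": one minus the IoU, plus a center-distance penalty, a width penalty, and a height penalty, each normalized by the smallest enclosing box. The function name `eiou_loss`, the corner format `(x1, y1, x2, y2)`, and the `eps` stabilizer are assumptions made for illustration; this is not the authors' implementation.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative sketch only: single box pair, plain Python floats.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and target boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the enclosing box.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Squared width and height differences, normalized by the enclosing
    # box width and height respectively.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    # Same width, different height: all penalties except width are active.
    print(eiou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 4.0)))
```

Because the width and height terms are penalized separately, the loss still gives a useful signal when two boxes share a center but differ in shape, which is one reason it is often chosen for small or elongated targets.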
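Cite this article: Liu Ao, Zhao Haifeng, Zeng Yao, Li Zhuo, Sun Qiang, Wang Mengfei. Medical Waste Detection Model Based on Vision-Language Joint Modeling and LoRA Fine-Tuning [J]. Artificial Intelligence and Robotics Research, 2025, 14(3): 536-547. https://doi.org/10.12677/airr.2025.143053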
