Performance Evaluation and Improvement of Artificial Intelligence Models in Image Recognition
DOI: 10.12677/airr.2026.152044
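Author: Wanying Bao (包婉莹), School of Computer and Information Engineering, Hohhot Vocational and Technical University (呼和浩特职业技术大学), Hohhot, Inner Mongolia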
Keywords: Image Recognition; Artificial Intelligence Model; Performance Evaluation; Model Improvement; Generalization Ability
Abstract: This study clarifies the core value of performance evaluation for artificial intelligence (AI) models in image recognition and identifies the deficiencies of existing evaluation systems, with the aim of providing theoretical and practical guidance for model improvement. Combining literature review with logical analysis, it systematically examines mainstream evaluation metrics, common evaluation methods, and typical datasets, summarizes the performance shortcomings of models in complex scenarios, and proposes improvement strategies. The results show that core metrics such as accuracy and precision, and methods such as single-metric evaluation and cross-dataset validation, each have their own strengths and weaknesses. Models exhibit clear performance bottlenecks under illumination changes, pose differences, occlusion, and small-sample data. Data augmentation, model structure optimization, transfer learning, and multi-model fusion can effectively improve model performance. The study concludes that a multi-dimensional, comprehensive evaluation system is needed, and that model improvement should be pursued jointly across data, structure, and algorithms to strengthen generalization and robustness in complex scenarios and thereby support the practical deployment of image recognition technology.
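The abstract's point that single-metric evaluation has limits can be illustrated with a small worked example. The sketch below is not taken from the paper: the function name `binary_metrics` and the imbalanced test set are hypothetical, chosen only to show how accuracy can look strong while precision and recall collapse.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = Counter(zip(y_true, y_pred))
    # Confusion-matrix cells, counted from (true label, predicted label) pairs.
    tp = sum(n for (t, p), n in pairs.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    tn = sum(n for (t, p), n in pairs.items() if t != positive and p != positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Undefined ratios (zero denominator) are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 95 negative images, 5 positive images.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority (negative) class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Under these assumptions the degenerate model scores high on accuracy yet never detects a positive sample, which is one reason a multi-metric evaluation is preferable to reporting accuracy alone.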
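Cite this article: Bao, W. Performance Evaluation and Improvement of Artificial Intelligence Models in Image Recognition [J]. Artificial Intelligence and Robotics Research (人工智能与机器人研究), 2026, 15(2): 455-463. https://doi.org/10.12677/airr.2026.152044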
