Research on High-Accuracy Vegetable Image Recognition Based on ResNet18-CBAM
DOI: 10.12677/csa.2025.1511294. Supported by research project funding.
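Authors: Wang Lili (王丽丽)*: Hebei Key Laboratory of Science and Technology Finance, Hebei Finance University, Baoding, Hebei; Zhou Han (周晗): School of Economics and Trade, Hebei Finance University, Baoding, Hebei; Hu Mengxuan (胡梦轩): Library, Hebei Finance University, Baoding, Hebei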
Keywords: Vegetable Image Recognition; ResNet; Convolutional Block Attention Module (CBAM); Deep Learning
Abstract: Computer vision plays an important role in intelligent agriculture. Vegetable image classification is complicated by high inter-class similarity and large intra-class variance, and conventional convolutional neural networks (CNNs) often fail to extract sufficiently discriminative features. This paper proposes an improved ResNet18 model that integrates the Convolutional Block Attention Module (CBAM) to improve the accuracy and robustness of vegetable image recognition. By embedding channel and spatial attention modules in the residual blocks, ResNet18-CBAM focuses more strongly on critical regions and salient features and adaptively recalibrates and enhances the feature maps. The model was trained and tested on the publicly available Vegetable Image dataset from Kaggle. ResNet18-CBAM reaches a classification accuracy of 99.67% and a macro-averaged F1 score of 0.997 on the test set, significantly outperforming the baseline ResNet18 model. The results indicate that adding CBAM improves the model's perceptual capability in complex agricultural image recognition tasks and offers a reliable technical route toward practical high-accuracy vegetable classification systems.
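The channel and spatial attention mentioned in the abstract can be written out using the original CBAM formulation (Woo et al., 2018). The abstract does not give the exact placement inside each ResNet18 block, so the last line is an assumption: it applies CBAM to the residual branch before the shortcut addition, as in the original CBAM paper. The symbols $F$, $r$, $W_0$, $W_1$, $\mathcal{R}$, $x$ and $y$ are introduced here for illustration only.

```latex
% Input feature map F \in \mathbb{R}^{C \times H \times W}, reduction ratio r.
\begin{align}
M_c(F) &= \sigma\big( W_1\,\mathrm{ReLU}(W_0\,\mathrm{AvgPool}(F))
               + W_1\,\mathrm{ReLU}(W_0\,\mathrm{MaxPool}(F)) \big),
        & M_c(F) &\in \mathbb{R}^{C \times 1 \times 1} \\
F'     &= M_c(F) \otimes F \\
M_s(F') &= \sigma\big( f^{7 \times 7}\big( [\mathrm{AvgPool}(F');\, \mathrm{MaxPool}(F')] \big) \big),
        & M_s(F') &\in \mathbb{R}^{1 \times H \times W} \\
F''    &= M_s(F') \otimes F' \\
% Assumed placement in a residual block with residual branch \mathcal{R}:
y      &= \mathrm{ReLU}\big( x + \mathrm{CBAM}(\mathcal{R}(x)) \big),
        & \mathrm{CBAM}(F) &= F''
\end{align}
```

Here $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the weights of a shared two-layer MLP, the pooling in $M_c$ runs over the spatial dimensions, the pooling in $M_s$ runs over the channel dimension, $f^{7 \times 7}$ is a convolution with a $7 \times 7$ kernel, $\sigma$ is the sigmoid function, and $\otimes$ is element-wise multiplication with broadcasting.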
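The two reported metrics, classification accuracy and macro-averaged F1, can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code, and the function names and toy labels below are hypothetical. Macro averaging takes the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many images it has.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but is wrong
            fn[t] += 1      # class t was missed
    labels = sorted(set(y_true) | set(y_pred))
    # Per-class F1 = 2*TP / (2*TP + FP + FN); the denominator is positive
    # for every label that appears in y_true or y_pred.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only, not results from the paper.
    y_true = ["bean", "bean", "carrot", "carrot", "tomato", "tomato"]
    y_pred = ["bean", "carrot", "carrot", "carrot", "tomato", "tomato"]
    print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
    print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

In the toy example one "bean" image is misclassified as "carrot", which lowers the F1 of both classes while "tomato" stays at 1.0; the macro average is the plain mean of the three per-class scores.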
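Cite this article: Wang, L., Zhou, H. and Hu, M. (2025) Research on High-Accuracy Vegetable Image Recognition Based on ResNet18-CBAM. Computer Science and Application, 15(11), 162-172. https://doi.org/10.12677/csa.2025.1511294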
