Wild Animal Detection Based on Swin-Transformer
Abstract: Wildlife detection is important for effective wildlife conservation and for maintaining biodiversity and ecosystem balance. With advances in science and technology, wildlife detection has progressed from traditional manual searching and visual identification by human observers to rapid automated detection using machine learning. However, current detection models still suffer from limited detection accuracy. This paper therefore applies Swin-Transformer to a wild animal object detection model and compares its performance with that of other strong detection models. Experimental results show that the Swin-Transformer detector achieves an average precision of 0.958, at least 5% higher than the other detectors, and obtains the best results for the vast majority of animal categories. Even for rare categories with few samples, its detection precision still reaches 91%, which substantially improves the accuracy of wildlife detection.
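The abstract reports its headline result as an average precision (AP) but does not say which AP variant was used. As a rough illustration of how a per-class AP figure of this kind is commonly computed, the following Python sketch assumes PASCAL VOC-style all-point interpolated AP at a single IoU threshold (0.5 here, an assumption); the function names and the `(image_id, score, box)` input format are hypothetical and not taken from the paper. COCO-style AP instead averages over several IoU thresholds and uses 101-point interpolation, so it would generally yield a different number.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format
Detection = Tuple[str, float, Box]        # (image_id, confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Detection],
                      ground_truth: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """All-point interpolated AP for ONE class (VOC-style sketch)."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    # Each ground-truth box may be matched by at most one detection.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags: List[bool] = []
    # Process detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # True positive only if the best-overlapping box is unmatched and IoU is high enough;
        # otherwise (no overlap, low IoU, or duplicate) it counts as a false positive.
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    # Precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Interpolate: precision at rank i becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """Mean of per-class AP values (mAP)."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

For example, with two ground-truth boxes in one image and detections ranked true positive, false positive, true positive, the interpolated precisions are 1, 2/3, 2/3 at recalls 0.5, 0.5, 1.0, so AP = 0.5 x 1 + 0.5 x (2/3), roughly 0.833. Averaging such per-class values across all animal categories gives mAP.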
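Citation: Jiang Fuhao, Sui Chenhong, Ou Shifeng, Wang Zhongxun, Hu Guoying, Yang Guobin, Pan Yunhao, Hu Jian. Wild Animal Detection Based on Swin-Transformer [J]. Artificial Intelligence and Robotics Research, 2021, 10(4): 281-291. https://doi.org/10.12677/AIRR.2021.104028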
