[1] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G. and Gelly, S. (2020) An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv: 2010.11929.
[2] Dong, P., Niu, X., Tian, Z., Li, L., Wang, X., Wei, Z., et al. (2023) Progressive Meta-Pooling Learning for Lightweight Image Classification Model. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, 4-10 June 2023, 1-5.
[3] Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A. and Jégou, H. (2021) Training Data-Efficient Image Transformers & Distillation through Attention. arXiv: 2012.12877.
[4] Wei, Z., Pan, H., Li, L.L., Lu, M., Niu, X., Dong, P. and Li, D. (2022) Convformer: Closing the Gap between CNN and Vision Transformers. arXiv: 2209.07738.
[5] Wang, W., Xie, E., Li, X., Fan, D., Song, K., Liang, D., et al. (2021) Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 548-558.
[6] Zhu, X., Su, W., Lu, L., Li, B., Wang, X. and Dai, J. (2020) Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv: 2010.04159.
[7] Qin, J., Wu, J., Xiao, X., Li, L. and Wang, X. (2022) Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 36, 2117-2125.
[8] Tay, Y., Dehghani, M., Bahri, D. and Metzler, D. (2020) Efficient Transformers: A Survey. arXiv: 2009.06732.
[9] Li, G., Wang, Y., Zhao, Q., Yuan, P. and Chang, B. (2023) PMVT: A Lightweight Vision Transformer for Plant Disease Identification on Mobile Devices. Frontiers in Plant Science, 14, Article 1256773.
[10] He, F., Liu, Y. and Liu, J. (2024) ECA-ViT: Leveraging ECA and Vision Transformer for Crop Leaves Diseases Identification in Cultivation Environments. 2024 4th International Conference on Machine Learning and Intelligent Systems Engineering (MLISE), Zhuhai, 28-30 June 2024, 101-104.
[11] Wu, S., Sun, Y. and Huang, H. (2021) Multi-Granularity Feature Extraction Based on Vision Transformer for Tomato Leaf Disease Recognition. 2021 3rd International Academic Exchange Conference on Science and Technology Innovation (IAECST), Guangzhou, 10-12 December 2021, 387-390.
[12] Sharma, S.K. and Vishwakarma, D.K. (2024) Classification of Banana Plant Leaves Based on Nutrient Deficiency Using Vision Transformer. 2024 5th International Conference for Emerging Technology (INCET), Belgaum, 24-26 May 2024, 1-6.
[13] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. and Polosukhin, I. (2017) Attention Is All You Need. arXiv: 1706.03762.
[14] Sajid, U., Chen, X., Sajid, H., Kim, T. and Wang, G. (2021) Audio-Visual Transformer Based Crowd Counting. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, 11-17 October 2021, 2249-2259.
[15] Ba, J.L., Kiros, J.R. and Hinton, G.E. (2016) Layer Normalization. arXiv: 1607.06450.
[16] Hendrycks, D. and Gimpel, K. (2016) Gaussian Error Linear Units (GELUs). arXiv: 1606.08415.
[17] Ioffe, S. and Szegedy, C. (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the International Conference on Machine Learning (ICML), Lille, 6-11 July 2015, 448-456.
[18] Steiner, A., Kolesnikov, A., Zhai, X., Wightman, R., Uszkoreit, J. and Beyer, L. (2021) How to Train Your ViT? Data, Augmentation, and Regularization in Vision Transformers. arXiv: 2106.10270.
[19] Kingma, D.P. and Ba, J. (2014) Adam: A Method for Stochastic Optimization. arXiv: 1412.6980.
[20] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., et al. (2021) Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 9992-10002.
[21] Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V. and Le, Q.V. (2019) AutoAugment: Learning Augmentation Strategies from Data. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 15-20 June 2019, 113-123.
[22] Zhong, Z., Zheng, L., Kang, G., Li, S. and Yang, Y. (2020) Random Erasing Data Augmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 13001-13008.
[23] Zhang, H., Cisse, M., Dauphin, Y.N. and Lopez-Paz, D. (2017) MixUp: Beyond Empirical Risk Minimization. arXiv: 1710.09412.
[24] Yun, S., Han, D., Chun, S., Oh, S.J., Yoo, Y. and Choe, J. (2019) CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 6022-6031.