|
[1]
|
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv: 2010.11929.
|
|
[2]
|
Peng, Z., Dong, L., Bao, H., et al. (2022) BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers. arXiv: 2208.06366.
|
|
[3]
|
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., et al. (2023) Segment Anything. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, 1-6 October 2023, 4015-4026.
|
|
[4]
|
Devlin, J., Chang, M.W., Lee, K., et al. (2019) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1, 4171-4186.
|
|
[5]
|
Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., et al. (2022) SimMIM: A Simple Framework for Masked Image Modeling. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 9653-9663.
|
|
[6]
|
He, K., Chen, X., Xie, S., Li, Y., Dollár, P. and Girshick, R. (2022) Masked Autoencoders Are Scalable Vision Learners. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 16000-16009.
|
|
[7]
|
Wang, W., Bao, H., Dong, L., Bjorck, J., Peng, Z., Liu, Q., et al. (2023) Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 19175-19186.
|
|
[8]
|
Dong, X., Bao, J., Zhang, T., Chen, D., Zhang, W., Yuan, L., et al. (2023) PeCo: Perceptual Codebook for BERT Pre-Training of Vision Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 37, 552-560.
|
|
[9]
|
Wei, L., Xie, L., Zhou, W., Li, H. and Tian, Q. (2022) MVP: Multimodality-Guided Visual Pre-Training. In: Avidan, S., et al., Eds., European Conference on Computer Vision, Springer, 337-353.
|
|
[10]
|
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., et al. (2014) Generative Adversarial Nets. Communications of the ACM, 63, 139-144.
|
|
[11]
|
Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X. and Huang, T.S. (2018) Generative Image Inpainting with Contextual Attention. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, 5505-5514.
|
|
[12]
|
Yang, F., Yang, H., Fu, J., Lu, H. and Guo, B. (2020) Learning Texture Transformer Network for Image Super-Resolution. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 5791-5800.
|
|
[13]
|
He, S., Luo, H., Wang, P., Wang, F., Li, H. and Jiang, W. (2021) TransReID: Transformer-Based Object Re-Identification. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 15013-15022.
|
|
[14]
|
Wang, W., Xie, E., Li, X., Fan, D., Song, K., Liang, D., et al. (2021) Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 548-558.
|
|
[15]
|
Cen, F., Zhao, X., Li, W. and Wang, G. (2021) Deep Feature Augmentation for Occluded Image Classification. Pattern Recognition, 111, Article ID: 107737.
|
|
[16]
|
Yang, Z., Chen, J., Li, J. and Zheng, X. (2025) Multiscale Occlusion-Robust Scene Classification in Remote Sensing Images via Supervised Contrastive Learning. IEEE Geoscience and Remote Sensing Letters, 22, 1-5.
|
|
[17]
|
Kotwal, K., Deshmukh, T. and Gopal, P. (2024) Latent Enhancing Autoencoder for Occluded Image Classification. 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, 27-30 October 2024, 894-900.
|
|
[18]
|
Kortylewski, A., Liu, Q., Wang, H., Zhang, Z. and Yuille, A. (2020) Combining Compositional Models and Deep Networks for Robust Object Classification under Occlusion. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass, 1-5 March 2020, 1322-1330.
|
|
[19]
|
Xiao, M., Kortylewski, A., Wu, R., Qiao, S., Shen, W. and Yuille, A. (2020) TDMPNet: Prototype Network with Recurrent Top-Down Modulation for Robust Object Classification under Partial Occlusion. In: Bartoli, A. and Fusiello, A., Eds., Computer Vision – ECCV 2020 Workshops, Springer International Publishing, 447-463.
|
|
[20]
|
Heo, J., Wang, Y. and Park, J. (2022) Occlusion-Aware Spatial Attention Transformer for Occluded Object Recognition. Pattern Recognition Letters, 159, 70-76.
|
|
[21]
|
Kortylewski, A., He, J., Liu, Q. and Yuille, A.L. (2020) Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 8940-8949.
|
|
[22]
|
Zhao, F., Feng, J., Zhao, J., Yang, W. and Yan, S. (2018) Robust LSTM-Autoencoders for Face De-Occlusion in the Wild. IEEE Transactions on Image Processing, 27, 778-790.
|
|
[23]
|
Yun, S., Han, D., Chun, S., Oh, S.J., Yoo, Y. and Choe, J. (2019) CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 6023-6032.
|
|
[24]
|
Wang, J.Y., Zhang, Z.S., Xie, C.H., et al. (2015) Unsupervised Learning of Object Semantic Parts from Internal States of CNNs by Population Encoding. arXiv: 1511.06855.
|