Unsupervised Domain Adaptation Image Classification Based on Swin Transformer
Abstract: Most current unsupervised domain adaptation (UDA) techniques learn domain-invariant feature representations at either the domain level or the class level. At the domain level, the dominant strategy is adversarial learning, which aligns the global feature distributions of the two domains but typically ignores the inherent discriminative information of the target data. Class-level approaches typically generate pseudo-labels for target-domain samples; because these pseudo-labels are often very noisy, they inevitably degrade UDA performance. In addition, existing methods do not explicitly enforce separation between the features of different classes. To address these problems, we propose unsupervised domain adaptation based on the Swin Transformer (SwinUDA). First, for domain alignment, the Swin Transformer is combined with adversarial adaptation to improve the model's robustness to noisy inputs; the experimental results show that using a transformer as the feature extractor yields higher transferability. Second, for class alignment, an orthogonal projection loss (OPL) imposes constraints directly in the feature space: samples from the same class, whether from the source or the target domain, are pulled closer together, while samples from different classes are pushed apart. The orthogonal projection loss is also more robust to label noise. Finally, a mutual information maximization loss (IML) is introduced to preserve the discriminative features of the target domain. The proposed SwinUDA model can therefore learn transferable and discriminative features simultaneously. In experiments on three public datasets, Office-Home, Office-31, and VisDA-2017, SwinUDA achieved the best performance.
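The two loss terms named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes the commonly used form of the orthogonal projection loss, (1 - s) + gamma * d, where s is the mean cosine similarity of same-class pairs and d is the mean absolute cosine similarity of different-class pairs. It also assumes the mutual information term is the entropy of the mean prediction minus the mean per-sample entropy. The function names, the `gamma` default, and the toy inputs are hypothetical.

```python
import math


def _normalize(vec):
    """Scale a feature vector to unit L2 norm (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def orthogonal_projection_loss(features, labels, gamma=0.5):
    """Sketch of an OPL-style loss: (1 - s) + gamma * d.

    s: mean cosine similarity over same-class pairs (pulled toward 1).
    d: mean absolute cosine similarity over different-class pairs
       (pushed toward 0, i.e. toward orthogonality).
    gamma is an assumed weighting, not a value taken from the paper.
    """
    feats = [_normalize(f) for f in features]
    same_sum, same_cnt, diff_sum, diff_cnt = 0.0, 0, 0.0, 0
    for i in range(len(feats)):
        for j in range(len(feats)):
            if i == j:
                continue  # a sample is not paired with itself
            cos = sum(a * b for a, b in zip(feats[i], feats[j]))
            if labels[i] == labels[j]:
                same_sum += cos
                same_cnt += 1
            else:
                diff_sum += abs(cos)
                diff_cnt += 1
    s = same_sum / max(same_cnt, 1)
    d = diff_sum / max(diff_cnt, 1)
    return (1.0 - s) + gamma * d


def _entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)


def mutual_information_loss(predictions):
    """Negative mutual-information estimate over target predictions.

    Minimizing this value favors predictions that are confident per sample
    (low conditional entropy) yet spread across classes (high marginal entropy).
    """
    n = len(predictions)
    num_classes = len(predictions[0])
    mean_pred = [sum(p[c] for p in predictions) / n for c in range(num_classes)]
    marginal_entropy = _entropy(mean_pred)
    conditional_entropy = sum(_entropy(p) for p in predictions) / n
    return -(marginal_entropy - conditional_entropy)


# Toy inputs (hypothetical): two classes lying on orthogonal axes.
opl_clustered = orthogonal_projection_loss(
    [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]], [0, 0, 1, 1]
)
opl_mixed = orthogonal_projection_loss(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], [0, 0, 1, 1]
)
mi_confident = mutual_information_loss([[1.0, 0.0], [0.0, 1.0]])
mi_uniform = mutual_information_loss([[0.5, 0.5], [0.5, 0.5]])
```

With these toy inputs, the clustered features give a lower OPL value than the mixed ones, and confident, class-balanced predictions give a lower mutual information loss than uniform predictions. In a training setup, such terms would typically be computed on mini-batch tensors and added to the adversarial and classification objectives; how SwinUDA weights them is not stated in the abstract.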
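Cite this article: Fan Bowen, Xu Zhijie. Unsupervised Domain Adaptation Image Classification Based on Swin Transformer [J]. Modeling and Simulation (建模与仿真), 2023, 12(3): 3051-3062. https://doi.org/10.12677/MOS.2023.123281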
