Adaptive Weighted Fusion Strategy for Long-Tailed Recognition
DOI: 10.12677/csa.2026.165158
Author: Chen Jia, School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong
Keywords: Deep Learning, Long-Tailed Image Classification, Dynamic Weight
Abstract: In its early stages, deep learning research focused largely on class-balanced benchmark datasets such as ImageNet and CIFAR. In real-world visual scenarios, however, data distributions often follow a power law: a handful of categories account for the vast majority of samples, while most categories have only a few observations. Such long-tailed distributions are the norm in nature and human society, arising in species identification, medical diagnosis, object detection for autonomous driving, industrial defect inspection, and other critical domains. Conventional deep models exhibit a pronounced performance imbalance on long-tailed data, for two reasons. Head bias: driven by empirical risk minimization, the model overfits the sample-rich head classes, skewing predictions toward the head. Tail collapse: for sample-scarce tail classes, the model lacks sufficiently discriminative features, and classification accuracy drops precipitously. This imbalance severely limits the robustness and reliability of AI systems in complex real-world environments, making robust recognition under long-tailed class imbalance a core problem in computer vision. Building on a mixture-of-experts architecture, this work studies image classification under long-tailed data distributions and proposes a novel deep visual recognition strategy. The main contributions are as follows. We propose a test-time adaptive ensembling method for multi-expert long-tailed image classification. Test-Time Augmentation (TTA) is applied at inference, and two reliability scores are computed per expert: 1) Stability, measured by the average cosine similarity among the features of the augmented views; 2) Certainty, derived from the normalized entropy of the averaged probability distribution. Weights determined by these two reliability scores are assigned to the experts, so that each expert contributes to the prediction in proportion to its reliability on the given image, yielding better classification across the dataset. The method introduces no additional parameters, incurs negligible overhead, effectively suppresses unreliable experts, and significantly improves tail-class performance under long-tailed distributions while preserving overall accuracy.
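The per-expert scoring described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the multiplicative combination of the two scores, and the softmax temperature are assumptions introduced here for clarity.

```python
import numpy as np

def stability_score(features: np.ndarray) -> float:
    """Stability: mean pairwise cosine similarity among the feature
    vectors of the augmented views of one image.
    features: (n_aug, d) array, one row per TTA view."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                       # (n_aug, n_aug) cosine similarities
    n = f.shape[0]
    # exclude the n self-similarities on the diagonal (each equal to 1)
    return (sim.sum() - n) / (n * (n - 1))

def certainty_score(probs: np.ndarray) -> float:
    """Certainty: 1 minus the normalized entropy of the probability
    distribution averaged over the augmented views.
    probs: (n_aug, C) array of softmax outputs."""
    p = probs.mean(axis=0)
    eps = 1e-12                         # avoid log(0)
    entropy = -(p * np.log(p + eps)).sum()
    return 1.0 - entropy / np.log(p.shape[0])   # 1 = fully certain

def expert_weights(stabilities, certainties, temperature: float = 1.0):
    """Combine the two reliability scores per expert (multiplicatively,
    as an illustrative choice) and softmax-normalize into fusion weights."""
    r = np.asarray(stabilities) * np.asarray(certainties)
    w = np.exp(r / temperature)
    return w / w.sum()

def fuse_predictions(expert_probs: np.ndarray, weights: np.ndarray):
    """Weighted ensemble: expert_probs is (n_experts, C), one averaged
    TTA distribution per expert; returns the fused (C,) distribution."""
    return np.tensordot(weights, expert_probs, axes=1)
```

Under this sketch, an expert whose augmented-view features drift apart (low stability) or whose averaged prediction is near-uniform (low certainty) receives a small weight, which is how unreliable experts are suppressed without any trainable parameters.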
Citation: Chen, J. (2026) Adaptive Weighted Fusion Strategy for Long-Tailed Recognition. Computer Science and Application, 16(5), 1-12. https://doi.org/10.12677/csa.2026.165158
