A Sample-Aware Loss Function for Multi-Label Long-Tail Classification of E-Commerce Reviews
Abstract: In e-commerce platforms, the label distribution of product reviews often exhibits long-tail characteristics: a few popular labels account for most of the training samples, while many fine-grained labels are supported by only a handful of examples. This severe data imbalance causes conventional classifiers to overfit the head categories, sharply degrading performance on tail labels and limiting how accurately e-commerce platforms can understand user reviews. To address this problem, this paper proposes a Sample-Aware Loss (SAL) for e-commerce reviews, which systematically mitigates the performance bottleneck caused by long-tail distributions through three jointly designed components. First, a temperature-aware label weighting mechanism uses an adjustable temperature parameter to flexibly control the sharpness of the per-class weight distribution, adaptively reinforcing low-frequency labels in e-commerce reviews. Second, a multi-label adaptation of the LDAM loss is proposed, which imposes class-dependent margin constraints on positive labels only, incorporating margin-based optimization while preserving the multi-label semantic structure of reviews. Third, a collaborative hybrid loss is designed: it first reshapes the classification decision boundary through margin adjustment and then applies Focal-loss modulation on top of it, so that the two mechanisms reinforce each other during training and improve the model's sensitivity to hard-to-classify samples. Experimental results on two public datasets, EUR-Lex and Wiki10-31K, show that SAL significantly outperforms existing baseline methods on mainstream evaluation metrics such as P@k and N@k. Evaluation with the PSP@k metric further confirms that SAL effectively preserves the recognition of tail labels while maintaining strong overall performance, validating both the design of each module and the superiority of the overall method.
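The three components described in the abstract can be sketched as a single loss computation. Since the abstract gives no formulas, the concrete choices below are assumptions for illustration: the temperature weighting is modeled as an exponent 1/T on inverse label frequency, the per-class margin follows the original LDAM form (proportional to n_c^{-1/4}), and the focal exponent is the usual γ; the function name `sample_aware_loss` and its parameters are hypothetical.

```python
import numpy as np

def sample_aware_loss(logits, targets, class_counts,
                      temperature=2.0, margin_scale=0.5, gamma=2.0):
    """Hedged sketch of the three SAL components described in the abstract.

    logits:       (batch, num_labels) raw scores
    targets:      (batch, num_labels) binary multi-label matrix
    class_counts: (num_labels,) training frequency of each label
    """
    # 1) Temperature-aware label weights: rarer labels receive larger
    #    weights; the temperature controls how sharp the weight
    #    distribution is (T -> inf flattens it, T -> 0 sharpens it).
    inv_freq = 1.0 / np.maximum(class_counts, 1)
    w = np.power(inv_freq, 1.0 / temperature)
    w = w / w.sum() * len(class_counts)       # normalize to mean weight 1

    # 2) Multi-label LDAM: subtract a class-dependent margin from the
    #    logits of POSITIVE labels only (margin ~ n_c^{-1/4} as in the
    #    original LDAM paper), leaving negative-label logits untouched.
    margins = margin_scale / np.power(np.maximum(class_counts, 1), 0.25)
    adj_logits = logits - targets * margins

    # 3) Focal modulation applied on top of the margin-adjusted
    #    per-label binary cross-entropy, down-weighting easy labels.
    p = 1.0 / (1.0 + np.exp(-adj_logits))     # sigmoid probabilities
    pt = np.where(targets == 1, p, 1.0 - p)   # prob. of the true assignment
    focal = -((1.0 - pt) ** gamma) * np.log(pt + 1e-12)

    return float(np.mean(w * focal))
```

Note the ordering: the margin adjustment reshapes the logits first, and the focal term then modulates the resulting per-label losses, which is what lets the two mechanisms interact rather than act as independent re-weightings.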
Article citation: Xie, Z.J. (2026) A Sample-Aware Loss Function for Multi-Label Long-Tail Classification of E-Commerce Reviews. E-Commerce Letters, 15(5), 113-121. https://doi.org/10.12677/ecl.2026.155495
