A Label Proportion Information-Based Transfer Learning Algorithm
DOI: 10.12677/CSA.2020.102035. Supported by the National Natural Science Foundation of China.
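Authors: Huaipei Wang*, Yanshan Xiao: School of Computers, Guangdong University of Technology, Guangzhou, Guangdong; Bo Liu: School of Automation, Guangdong University of Technology, Guangzhou, Guangdong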
Keywords: Learning with Label Proportions; Data Mining; Transfer Learning
Abstract: Learning with label proportions is a data mining task that builds a classification model using only the label proportions of groups (bags) of samples. Because training samples are insufficient, existing methods, which treat the problem as a single task, do not perform well in text classification. Since transfer learning can alleviate the shortage of training data to some extent, it becomes important to use historical data (source task data) to help classify newly generated data (target task data). This paper proposes a transfer learning algorithm based on label proportion information that transfers knowledge from the source task to the target task, helping the target task build a better classifier. To obtain the transfer learning model, the method converts the original optimization problem into a convex optimization problem and then solves the dual problem to build an accurate classifier for the target task. Experimental results show that the proposed algorithm outperforms traditional methods under most conditions.
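As a rough illustration of the learning setting described in the abstract (not the authors' algorithm), the sketch below shows what "using only label proportions" means: instance labels are hidden, each bag exposes only its fraction of positives, and a candidate classifier is judged by how closely its predicted proportions match. All function names and the toy data are hypothetical, and the mean absolute gap is an assumed stand-in for whatever loss the paper actually optimizes.

```python
from typing import List, Sequence


def bag_proportion(labels: Sequence[int]) -> float:
    """Fraction of positive (+1) labels in one bag."""
    return sum(1 for y in labels if y == 1) / len(labels)


def predicted_proportion(scores: Sequence[float]) -> float:
    """Fraction of instances that a classifier scores as positive (score > 0)."""
    return sum(1 for s in scores if s > 0) / len(scores)


def proportion_loss(bag_scores: List[Sequence[float]],
                    known_props: Sequence[float]) -> float:
    """Mean absolute gap between predicted and known bag proportions.

    This loss is an assumption made for illustration only.
    """
    gaps = [abs(predicted_proportion(s) - p)
            for s, p in zip(bag_scores, known_props)]
    return sum(gaps) / len(gaps)


# Hypothetical toy data: the instance labels exist but are hidden from the learner.
hidden_labels = [[1, -1, 1, -1], [-1, -1, 1]]
# Only these per-bag proportions are available at training time.
known_props = [bag_proportion(bag) for bag in hidden_labels]

# Decision scores that some candidate classifier assigns to each instance.
bag_scores = [[0.8, -0.3, 1.2, -0.5], [-1.0, -0.2, 0.4]]
loss = proportion_loss(bag_scores, known_props)
```

In a transfer setting such as the one the abstract describes, a loss of this kind would be evaluated on bags from both the source and the target task; how the two are coupled is specific to the paper and is not reproduced here.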
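Cite this article: Wang, H., Xiao, Y. and Liu, B. (2020) A Label Proportion Information-Based Transfer Learning Algorithm. Computer Science and Application, 10(2), 340-349. https://doi.org/10.12677/CSA.2020.102035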
