# A Label Proportion Information-Based Transfer Learning Algorithm

Abstract: Learning with label proportions is a learning task in which a classification model is built using only the label proportions of bags of instances. Because training samples are insufficient, existing methods that treat this problem as a single task do not perform well in text classification. Transfer learning can alleviate the shortage of training data to some extent, so the question of how to use historical data (the source task data) to help classify newly generated data (the target task data) becomes extremely important. This paper presents a label proportion information-based transfer learning approach that transfers knowledge from the source task to the target task, helping the target task to build a classifier. To obtain the transfer learning model, the method converts the original optimization problem into a convex optimization problem and then solves the dual problem to establish an accurate classifier for the target task. Extensive experiments show that the proposed method outperforms traditional methods.

1. Introduction

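1) Building on support vector regression, we propose a label proportion information-based transfer learning model, which uses transfer learning to carry knowledge from the source task over to the target task.

2) The Lagrangian method is used to convert the original objective model into a convex optimization problem, yielding prediction models for both the source task and the target task.

3) Extensive experiments on multiple data sets, with comparisons against existing algorithms, verify the effectiveness of the proposed algorithm.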

2. Problem Description and Related Work

2.1. Problem Description

Figure 1. Two-class label proportions learning problem

2.2. Related Work

3. Label Proportion Learning Algorithm

$y=-\mathrm{log}\left(\frac{1}{p}-1\right)$ (1)
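As a hedged illustration of Eq. (1), the sketch below maps a bag's positive-class proportion $p$ to the regression target $y$ and inverts the mapping with the logistic function. The function names and the clipping constant `eps` are assumptions introduced here for illustration; the clipping only keeps the logarithm finite when $p$ is exactly 0 or 1.

```python
import math

def proportion_to_target(p, eps=1e-6):
    """Eq. (1): y = -log(1/p - 1), with p clipped into [eps, 1 - eps]."""
    # Assumption: clip p so that neither 1/p nor the logarithm blows up.
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(1.0 / p - 1.0)

def target_to_proportion(y):
    """Inverse of Eq. (1): the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-y))

# Toy proportions (hypothetical values, not taken from the experiments).
targets = [proportion_to_target(p) for p in (0.1, 0.5, 0.9)]
```

A proportion of 0.5 maps to a target of zero, and proportions below or above 0.5 map to negative or positive targets respectively.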

${\forall }_{i}:\frac{1}{|{B}_{i}|}\underset{j\in {B}_{i}}{\sum }\left({w}^{\text{T}}{x}_{j}+b\right)={y}_{i}$ (2)
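Eq. (2) asks the average model output over a bag to match the bag's target. Because the model is linear, that average equals the model output at the bag's mean instance, which is why each bag can be summarized by its mean. The following minimal sketch checks this numerically; the helper names and toy numbers are hypothetical.

```python
def linear_output(w, b, x):
    """Model output w^T x + b for a single instance."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def bag_mean_output(w, b, bag):
    """Left-hand side of Eq. (2): average output over the instances of a bag."""
    return sum(linear_output(w, b, x) for x in bag) / len(bag)

def bag_mean(bag):
    """Component-wise mean of the instances in a bag."""
    return [sum(x[d] for x in bag) / len(bag) for d in range(len(bag[0]))]

# Toy model and bag (hypothetical values).
w, b = [0.5, -1.0], 0.25
bag = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]

lhs = bag_mean_output(w, b, bag)          # average of per-instance outputs
rhs = linear_output(w, b, bag_mean(bag))  # output at the bag mean
```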

3.1. Objective Function

${f}_{1}\left(x\right)={w}_{1}^{\text{T}}\cdot x+{b}_{1}$ (3)

${f}_{2}\left(x\right)={w}_{2}^{\text{T}}\cdot x+{b}_{2}$ (4)

$\mathrm{min}\text{ }\frac{1}{2}{‖{w}_{0}‖}^{2}+\frac{{\lambda }_{1}}{2}{‖{v}_{1}‖}^{2}+\frac{{\lambda }_{2}}{2}{‖{v}_{2}‖}^{2}+{C}_{1}\underset{i=1}{\overset{{t}_{1}}{\sum }}\left({\xi }_{1i}+{\xi }_{1i}^{*}\right)+{C}_{2}\underset{m=1}{\overset{{t}_{2}}{\sum }}\left({\xi }_{2m}+{\xi }_{2m}^{*}\right)$ (5)

${\forall }_{i=1}^{{t}_{1}}:\begin{array}{l}\frac{1}{|{B}_{i}^{s}|}\underset{j\in {B}_{i}^{s}}{\sum }\left({w}_{1}^{\text{T}}{x}_{j}+{b}_{1}\right)-{y}_{i}\le {\epsilon }_{1i}+{\xi }_{1i}\hfill \\ {y}_{i}-\frac{1}{|{B}_{i}^{s}|}\underset{j\in {B}_{i}^{s}}{\sum }\left({w}_{1}^{\text{T}}{x}_{j}+{b}_{1}\right)\le {\epsilon }_{1i}+{\xi }_{1i}^{*}\hfill \end{array}$

${\forall }_{m=1}^{{t}_{2}}:\begin{array}{l}\frac{1}{|{B}_{m}^{t}|}\underset{n\in {B}_{m}^{t}}{\sum }\left({w}_{2}^{\text{T}}{x}_{n}+{b}_{2}\right)-{y}_{m}\le {\epsilon }_{2m}+{\xi }_{2m}\hfill \\ {y}_{m}-\frac{1}{|{B}_{m}^{t}|}\underset{n\in {B}_{m}^{t}}{\sum }\left({w}_{2}^{\text{T}}{x}_{n}+{b}_{2}\right)\le {\epsilon }_{2m}+{\xi }_{2m}^{*}\hfill \end{array}$

${\xi }_{1i},{\xi }_{1i}^{*}\ge 0\,\left(i=1,\cdots ,{t}_{1}\right)$

${\xi }_{2m},{\xi }_{2m}^{*}\ge 0\,\left(m=1,\cdots ,{t}_{2}\right)$
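To make the roles of the terms in Eq. (5) concrete, the sketch below evaluates the objective for given parameters, using for each bag the smallest slack that satisfies the constraints (so $\xi + \xi^{*}$ becomes the $\epsilon$-insensitive loss on the bag's mean output). It assumes the task weights decompose as $w_1 = w_0 + v_1$ and $w_2 = w_0 + v_2$, which is the usual regularized multi-task form and is consistent with Eqs. (7)-(9), but is stated here as an assumption. It also uses a single `eps` for all bags as a simplification; all names and toy values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bag_output(w, b, bag):
    """Average of w^T x + b over the instances of one bag."""
    return sum(dot(w, x) + b for x in bag) / len(bag)

def task_loss(w, b, bags, targets, eps):
    """Sum over bags of xi + xi*, i.e. max(0, |mean output - y| - eps)."""
    return sum(max(0.0, abs(bag_output(w, b, bag) - y) - eps)
               for bag, y in zip(bags, targets))

def primal_objective(w0, v1, v2, b1, b2, src, tgt, lam1, lam2, C1, C2, eps):
    """Value of Eq. (5) with minimal slacks; src/tgt are (bags, targets) pairs."""
    src_bags, src_y = src
    tgt_bags, tgt_y = tgt
    # Assumption: task-specific weights are the shared part plus an offset.
    w1 = [a + c for a, c in zip(w0, v1)]
    w2 = [a + c for a, c in zip(w0, v2)]
    reg = 0.5 * dot(w0, w0) + 0.5 * lam1 * dot(v1, v1) + 0.5 * lam2 * dot(v2, v2)
    return (reg
            + C1 * task_loss(w1, b1, src_bags, src_y, eps)
            + C2 * task_loss(w2, b2, tgt_bags, tgt_y, eps))

# Toy source and target tasks (hypothetical values).
src = ([[[1.0, 0.0]], [[0.0, 1.0]]], [1.0, -1.0])
tgt = ([[[1.0, 1.0], [3.0, 1.0]]], [0.5])
value = primal_objective([1.0, 0.0], [0.0, -1.0], [0.0, 0.0], 0.0, 0.0,
                         src, tgt, lam1=2.0, lam2=2.0, C1=1.0, C2=1.0, eps=0.1)
```

In this toy setting both source bags are fit exactly, so the value consists of the regularization terms plus the loss on the single target bag.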

Figure 2. Transfer knowledge from the source task to the target task

3.2. Dual Problem

$\begin{array}{l}\frac{1+{\lambda }_{1}}{2{\lambda }_{1}}\underset{i,j=1}{\overset{{t}_{1}}{\sum }}\frac{\left({\alpha }_{1i}^{*}-{\alpha }_{1i}\right)\left({\alpha }_{1j}^{*}-{\alpha }_{1j}\right)}{|{B}_{i}^{s}||{B}_{j}^{s}|}K\left({x}_{i},{x}_{j}\right)\\ \text{ }+\frac{1+{\lambda }_{2}}{2{\lambda }_{2}}\underset{m,n=1}{\overset{{t}_{2}}{\sum }}\frac{\left({\alpha }_{2m}^{*}-{\alpha }_{2m}\right)\left({\alpha }_{2n}^{*}-{\alpha }_{2n}\right)}{|{B}_{m}^{t}||{B}_{n}^{t}|}K\left({x}_{m},{x}_{n}\right)\\ \text{ }+\underset{i=1}{\overset{{t}_{1}}{\sum }}\underset{m=1}{\overset{{t}_{2}}{\sum }}\frac{\left({\alpha }_{1i}^{*}-{\alpha }_{1i}\right)\left({\alpha }_{2m}^{*}-{\alpha }_{2m}\right)}{|{B}_{i}^{s}||{B}_{m}^{t}|}K\left({x}_{i},{x}_{m}\right)\\ \text{ }-\underset{i=1}{\overset{{t}_{1}}{\sum }}\left({y}_{i}\left({\alpha }_{1i}^{*}-{\alpha }_{1i}\right)-{\epsilon }_{1i}\left({\alpha }_{1i}^{*}+{\alpha }_{1i}\right)\right)\\ \text{ }-\underset{m=1}{\overset{{t}_{2}}{\sum }}\left({y}_{m}\left({\alpha }_{2m}^{*}-{\alpha }_{2m}\right)-{\epsilon }_{2m}\left({\alpha }_{2m}^{*}+{\alpha }_{2m}\right)\right)\end{array}$ (6)

$\underset{i=1}{\overset{{t}_{1}}{\sum }}\left({\alpha }_{1i}-{\alpha }_{1i}^{*}\right)+\underset{m=1}{\overset{{t}_{2}}{\sum }}\left({\alpha }_{2m}-{\alpha }_{2m}^{*}\right)=0$

${\forall }_{i=1}^{{t}_{1}}:0\le {\alpha }_{1i},{\alpha }_{1i}^{*}\le {C}_{1}$

${\forall }_{m=1}^{{t}_{2}}:0\le {\alpha }_{2m},{\alpha }_{2m}^{*}\le {C}_{2}$
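The quadratic part of Eq. (6) can be read as $\frac{1}{2}{\beta }^{\text{T}}Q\beta$, where $\beta$ stacks the differences ${\alpha }^{*}-\alpha$ of the source bags followed by those of the target bags. A minimal sketch of assembling $Q$ is given below. It assumes a linear kernel evaluated between bag means, which is one reading of the $K(\cdot,\cdot)/(|B||B|)$ terms; the function names and toy bags are hypothetical, and no QP solver is included.

```python
def bag_mean(bag):
    """Component-wise mean of the instances in a bag."""
    return [sum(x[d] for x in bag) / len(bag) for d in range(len(bag[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_quadratic_matrix(src_bags, tgt_bags, lam1, lam2):
    """Block matrix Q such that the quadratic part of Eq. (6) is 0.5 * beta^T Q beta."""
    means = [bag_mean(b) for b in src_bags] + [bag_mean(b) for b in tgt_bags]
    t1 = len(src_bags)
    size = len(means)
    Q = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            k = dot(means[r], means[c])  # assumption: linear kernel on bag means
            if r < t1 and c < t1:
                scale = (1.0 + lam1) / lam1   # source-source block
            elif r >= t1 and c >= t1:
                scale = (1.0 + lam2) / lam2   # target-target block
            else:
                scale = 1.0                   # cross-task blocks
            Q[r][c] = scale * k
    return Q

# Toy bags (hypothetical values): one source bag, two target bags.
src_bags = [[[1.0, 0.0], [3.0, 0.0]]]
tgt_bags = [[[0.0, 1.0]], [[1.0, 1.0]]]
Q = dual_quadratic_matrix(src_bags, tgt_bags, lam1=1.0, lam2=0.5)
```

The cross-task blocks are what couple the two tasks: with them removed, the problem would split into two independent bag-level regressions.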

${w}_{0}=\underset{i=1}{\overset{{t}_{1}}{\sum }}\left({\alpha }_{1i}^{*}-{\alpha }_{1i}\right)\frac{1}{|{B}_{i}^{s}|}\underset{j\in {B}_{i}^{s}}{\sum }{x}_{j}+\underset{m=1}{\overset{{t}_{2}}{\sum }}\left({\alpha }_{2m}^{*}-{\alpha }_{2m}\right)\frac{1}{|{B}_{m}^{t}|}\underset{n\in {B}_{m}^{t}}{\sum }{x}_{n}$ (7)

${v}_{1}=\frac{1}{{\lambda }_{1}}\underset{i=1}{\overset{{t}_{1}}{\sum }}\left({\alpha }_{1i}^{*}-{\alpha }_{1i}\right)\frac{1}{|{B}_{i}^{s}|}\underset{j\in {B}_{i}^{s}}{\sum }{x}_{j}$ (8)

${v}_{2}=\frac{1}{{\lambda }_{2}}\underset{m=1}{\overset{{t}_{2}}{\sum }}\left({\alpha }_{2m}^{*}-{\alpha }_{2m}\right)\frac{1}{|{B}_{m}^{t}|}\underset{n\in {B}_{m}^{t}}{\sum }{x}_{n}$ (9)
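Given dual coefficients, Eqs. (7)-(9) recover the shared and task-specific weight vectors as weighted sums of bag means. The sketch below illustrates this step only. The coefficient values are hypothetical placeholders rather than the solution of an actual dual problem, the names are introduced here for illustration, and the final combination $w_1 = w_0 + v_1$, $w_2 = w_0 + v_2$ is an assumption consistent with the regularizer in Eq. (5).

```python
def bag_mean(bag):
    """Component-wise mean of the instances in a bag."""
    return [sum(x[d] for x in bag) / len(bag) for d in range(len(bag[0]))]

def weighted_sum(coeffs, vectors, dim):
    """Sum of coeffs[k] * vectors[k] over k."""
    return [sum(c * v[d] for c, v in zip(coeffs, vectors)) for d in range(dim)]

def recover_weights(beta1, beta2, src_bags, tgt_bags, lam1, lam2):
    """beta1[i] = alpha*_1i - alpha_1i and beta2[m] = alpha*_2m - alpha_2m."""
    dim = len(src_bags[0][0])
    s1 = weighted_sum(beta1, [bag_mean(b) for b in src_bags], dim)
    s2 = weighted_sum(beta2, [bag_mean(b) for b in tgt_bags], dim)
    w0 = [a + c for a, c in zip(s1, s2)]   # Eq. (7)
    v1 = [a / lam1 for a in s1]            # Eq. (8)
    v2 = [a / lam2 for a in s2]            # Eq. (9)
    # Assumption: each task's weight vector is the shared part plus its offset.
    w1 = [a + c for a, c in zip(w0, v1)]
    w2 = [a + c for a, c in zip(w0, v2)]
    return w0, v1, v2, w1, w2

# Toy bags and placeholder dual differences (hypothetical values).
src_bags = [[[1.0, 0.0], [3.0, 0.0]]]
tgt_bags = [[[0.0, 1.0]], [[1.0, 1.0]]]
w0, v1, v2, w1, w2 = recover_weights([0.5], [1.0, -1.0],
                                     src_bags, tgt_bags, lam1=2.0, lam2=0.5)
```

Smaller values of $\lambda_1$ or $\lambda_2$ enlarge the corresponding offset, letting that task deviate more from the shared vector $w_0$.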

Table 1. LPI-TL Algorithm

3.3. Time Complexity Analysis

4. Experiments and Analysis

4.1. Experimental Data

Table 2. The list of data sets

4.2. Experimental Setup

Inv-Cal: $C\in \left[{2}^{-2},{2}^{5}\right],\epsilon \in \left[0.01,0.1\right]$

Alter-SVM: $C\in \left[{2}^{-2},{2}^{5}\right],{C}_{p}\in \left[{2}^{-2},{2}^{7}\right]$

p-NPSVM: ${C}_{i}\in \left[{2}^{-5},{2}^{5}\right]\left(i=1,2,3,4\right),{C}_{p}\in \left\{0.1,1,10\right\}$

LPI-TL: ${C}_{i}\in \left[{2}^{-2},{2}^{7}\right]\left(i=1,2\right),\epsilon \in \left[0,1\right]$
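As an illustration of how the LPI-TL search range could be enumerated, the sketch below builds a candidate grid over $C_1$, $C_2$, and $\epsilon$. Only the interval endpoints come from the settings listed here; sampling $C_i$ at integer powers of two and $\epsilon$ in steps of 0.1 are assumptions made for this example, as are the variable names.

```python
from itertools import product

def powers_of_two(lo_exp, hi_exp):
    """Values 2^lo_exp, ..., 2^hi_exp (assumption: integer exponents only)."""
    return [2.0 ** e for e in range(lo_exp, hi_exp + 1)]

C_grid = powers_of_two(-2, 7)             # C_1, C_2 in [2^-2, 2^7]
eps_grid = [i / 10 for i in range(11)]    # epsilon in [0, 1]; step 0.1 is an assumption

# Every (C_1, C_2, epsilon) combination to be scored, e.g. by cross-validation.
lpi_tl_grid = [{"C1": c1, "C2": c2, "eps": e}
               for c1, c2, e in product(C_grid, C_grid, eps_grid)]
```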

4.3. Analysis of Experimental Results

Table 3. Experimental accuracy and standard deviation statistics

Figure 3. The mean accuracy

Table 4. Wilcoxon signed-rank test

Table 5. Performance comparison of each algorithm

5. Conclusion

1 http://www.iesl.cs.umass.edu/datasets.html

2 http://qwone.com/~jason/20Newsgroups/

