PU learning
Positive-unlabeled (PU) learning: learning when only positive samples are labeled. Applications include:
- Retrieval(检索)
- Inlier-based outlier detection.
- One-vs-rest classification, where the negative class is too diverse to label.
There are currently two families of solutions:
- Heuristically identify reliable negative samples in the unlabeled set and train a binary classifier on them. The problem with this approach is that classification performance depends heavily on prior knowledge.
- Treat all unlabeled samples as negatives and train a classifier. Since the unlabeled set actually contains positives, the incorrect label assignments cause classification errors.
Theoretical analysis
If the class prior of the unlabeled data is known, the PU learning problem can be solved by cost-sensitive learning; in principle, PU classification can be done with methods such as a weighted SVM.
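As a minimal sketch of the cost-sensitive idea (assuming the prior $\pi$ is known), the following numpy example uses a weighted logistic regression in place of a weighted SVM, weighting positive errors by $\pi$ and negative errors by $1-\pi$; the data and all parameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: positives ~ N(+2, 1), negatives ~ N(-2, 1).
X_pos = rng.normal(2.0, 1.0, size=100)
X_neg = rng.normal(-2.0, 1.0, size=100)
X = np.concatenate([X_pos, X_neg])
y = np.concatenate([np.ones(100), -np.ones(100)])

pi = 0.6  # assumed (known) class prior of the positive class

# Per-sample weights: pi for positives, (1 - pi) for negatives,
# matching R(f) = pi*R_1(f) + (1-pi)*R_{-1}(f).
w = np.where(y == 1, pi, 1.0 - pi)

# Weighted logistic regression by gradient descent, linear score g(x) = a*x + b,
# minimizing mean of w * log(1 + exp(-y * g(x))).
a, b = 0.0, 0.0
for _ in range(2000):
    margin = y * (a * X + b)
    grad_common = -y * w / (1.0 + np.exp(margin))  # d/dg of w*log(1+exp(-y*g))
    a -= 0.1 * np.mean(grad_common * X)
    b -= 0.1 * np.mean(grad_common)

pred = np.sign(a * X + b)
acc = np.mean(pred == y)
print(acc)  # well-separated toy data, so accuracy should be high
```

A weighted SVM (e.g. via per-class weights on the hinge loss) follows the same pattern; only the surrogate loss changes.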
Concretely, the unlabeled data follows the mixture

$$P_X = \pi P_1 + (1-\pi)P_{-1}$$
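A quick simulation of this mixture (with a hypothetical prior $\pi = 0.3$ and made-up Gaussian class-conditionals) confirms that the positive fraction of the unlabeled pool matches $\pi$:

```python
import numpy as np

rng = np.random.default_rng(1)
pi = 0.3  # hypothetical class prior

# Draw unlabeled data from P_X = pi*P_1 + (1-pi)*P_{-1}:
# first pick the latent label, then sample from that component.
n = 100_000
latent = rng.random(n) < pi  # True -> positive component P_1
x = np.where(latent, rng.normal(2.0, 1.0, n), rng.normal(-2.0, 1.0, n))

print(latent.mean())  # empirical positive fraction, close to pi = 0.3
```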
where $\pi$ is the unknown class prior.
The expected weighted misclassification rate in cost-sensitive classification is:
$$R(f) := \pi R_1(f) + (1-\pi)R_{-1}(f) = \pi P_1(f(X) \ne 1) + (1-\pi)P_{-1}(f(X) \ne -1) \quad (1)$$
where $R$ denotes the misclassification rate. For PU learning the risk takes the same form, but $R_{-1}(f)$ must be obtained by rewriting $R_X(f)$:
$$R_X(f) = P_X(f(X) = 1) = \pi P_1(f(X) = 1) + (1-\pi)P_{-1}(f(X) = 1),$$
$$\text{hence}\quad (1-\pi)R_{-1}(f) = R_X(f) - \pi P_1(f(X) = 1) \quad (2)$$
Here a sample that should be $-1$ but is classified as $1$ counts as a misclassification. Substituting (2) into (1) gives:
$$\begin{aligned} R(f) &= \pi R_1(f) + (1-\pi)R_{-1}(f)\\ &= \pi R_1(f) + R_X(f) - \pi P_1(f(X)=1)\\ &= \pi R_1(f) + R_X(f) - \pi(1-R_1(f))\\ &= 2\pi R_1(f) + R_X(f) - \pi \end{aligned} \quad (3)$$
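Identity (3) can be sanity-checked numerically on a small discrete domain (the distributions and the classifier below are arbitrary):

```python
import numpy as np

# Discrete sanity check of identity (3) on a 4-point domain.
P1  = np.array([0.1, 0.2, 0.3, 0.4])   # class-conditional P_1
Pm1 = np.array([0.4, 0.3, 0.2, 0.1])   # class-conditional P_{-1}
pi = 0.35

f = np.array([-1, 1, 1, 1])            # an arbitrary classifier on the domain

R1  = P1[f != 1].sum()                 # R_1(f)   = P_1(f(X) != 1)
Rm1 = Pm1[f != -1].sum()               # R_{-1}(f) = P_{-1}(f(X) != -1)
PX  = pi * P1 + (1 - pi) * Pm1         # mixture P_X
RX  = PX[f == 1].sum()                 # R_X(f)   = P_X(f(X) = 1)

lhs = pi * R1 + (1 - pi) * Rm1         # definition (1)
rhs = 2 * pi * R1 + RX - pi            # rewritten form (3)
print(np.isclose(lhs, rhs))  # True
```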
Since

$$R_1(f) = \mathbb{E}_1[\ell_H(g(X))], \qquad R_X(f) = \pi\,\mathbb{E}_1[\ell_H(-g(X))] + (1-\pi)\,\mathbb{E}_{-1}[\ell_H(-g(X))],$$
substituting into the above yields

$$J_{PU\text{-}H}(g) = \pi\,\mathbb{E}_1[\ell_H(g(X))] + (1-\pi)\,\mathbb{E}_{-1}[\ell_H(-g(X))] + \pi\,\mathbb{E}_1[\ell_H(g(X)) + \ell_H(-g(X))] - \pi$$
In empirical form this reads
$$\hat R_{pu}(g) = 2\pi_p \hat R_p^+(g) + \hat R_u^-(g) - \pi_p$$
where $\hat R_p^+(g) = (1/n_p)\sum_{i=1}^{n_p} \ell(g(x_i^p), +1)$ and $\hat R_u^-(g) = (1/n_u)\sum_{i=1}^{n_u} \ell(g(x_i^u), -1)$.
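A minimal sketch of evaluating this empirical risk, assuming a sigmoid surrogate loss and made-up model scores (the score arrays stand in for a real classifier's outputs):

```python
import numpy as np

rng = np.random.default_rng(2)
pi_p = 0.4  # assumed class prior

# Hypothetical classifier scores g(x) on positive and unlabeled samples.
g_p = rng.normal(1.0, 1.0, size=500)    # scores on n_p positive samples
g_u = rng.normal(0.0, 1.5, size=2000)   # scores on n_u unlabeled samples

def sigmoid_loss(z):
    """ell(g(x), y) written in terms of the margin z = y * g(x)."""
    return 1.0 / (1.0 + np.exp(z))

R_p_plus  = sigmoid_loss(+g_p).mean()   # \hat R_p^+(g): positives labeled +1
R_u_minus = sigmoid_loss(-g_u).mean()   # \hat R_u^-(g): unlabeled labeled -1

R_pu = 2 * pi_p * R_p_plus + R_u_minus - pi_p
print(R_pu)
```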
The first two terms are the ordinary risk terms; the third is an extra penalty term. Because of this extra term, $J_{PU\text{-}H}(g)$ may be impossible to minimize in general; the optimal solution is attainable if and only if $\ell_H(g(X)) + \ell_H(-g(X))$
is constant.
The key is therefore choosing a suitable loss function. Moreover, for PU learning, when estimating the $R_{-1}$ term in (2), $R_X(f) - \pi P_1(f(X) = 1)$ can become negative, so it is replaced by $\max\{0,\; R_X(f) - \pi P_1(f(X) = 1)\}$ to prevent overfitting.
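A small sketch of the resulting non-negative estimator, clipping the estimated negative-class risk at zero (the function name, loss, and data are all hypothetical):

```python
import numpy as np

def nn_pu_risk(g_p, g_u, pi_p, loss):
    """Non-negative PU risk: clip the negative-class risk estimate at zero.

    g_p, g_u: classifier scores g(x) on positive / unlabeled samples.
    loss:     surrogate loss as a function of the margin y * g(x).
    """
    R_p_plus  = loss(+g_p).mean()            # \hat R_p^+(g)
    R_p_minus = loss(-g_p).mean()            # \hat R_p^-(g)
    R_u_minus = loss(-g_u).mean()            # \hat R_u^-(g)
    # Empirical analogue of R_X(f) - pi * P_1(f(X)=1); may go negative.
    neg_part = R_u_minus - pi_p * R_p_minus
    return pi_p * R_p_plus + max(0.0, neg_part)

rng = np.random.default_rng(3)
sigmoid_loss = lambda z: 1.0 / (1.0 + np.exp(z))
r = nn_pu_risk(rng.normal(1, 1, 500), rng.normal(0, 1.5, 2000), 0.4, sigmoid_loss)
print(r)  # the clipped estimator is never negative
```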
Experiments
- Train a random forest with all unlabeled data treated as negative samples.
- Randomly draw as many negatives as there are positives, train a classifier, and predict on the remaining samples; repeat multiple times and average the probabilities.
- PU learning
Other notes
- With nonlinear models, the unlabeled data can be split into several subsets, each about the size of the positive set; train one model per subset directly and average the resulting probabilities. This currently works best for classification without prior knowledge.
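This bagging-style procedure can be sketched as follows, using a plain logistic regression on made-up 1-D data as the base model (any nonlinear model can be substituted):

```python
import numpy as np

rng = np.random.default_rng(4)

# 1-D toy data: positives ~ N(+2, 1); unlabeled is an even mixture.
x_p = rng.normal(2.0, 1.0, size=200)
x_u = np.concatenate([rng.normal(2.0, 1.0, 400), rng.normal(-2.0, 1.0, 400)])
rng.shuffle(x_u)

def fit_logreg(x, y, lr=0.1, steps=1000):
    """Plain logistic regression on a 1-D feature by gradient descent."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        m = y * (a * x + b)
        g = -y / (1.0 + np.exp(m))
        a -= lr * np.mean(g * x)
        b -= lr * np.mean(g)
    return a, b

# Bagging: repeatedly pair all positives with a same-sized random unlabeled
# subset (treated as negative), train, and average the predicted scores.
T = 20
scores = np.zeros_like(x_u)
for _ in range(T):
    idx = rng.choice(len(x_u), size=len(x_p), replace=False)
    x_tr = np.concatenate([x_p, x_u[idx]])
    y_tr = np.concatenate([np.ones(len(x_p)), -np.ones(len(idx))])
    a, b = fit_logreg(x_tr, y_tr)
    scores += 1.0 / (1.0 + np.exp(-(a * x_u + b)))
scores /= T

# Unlabeled points near the positive cluster should score higher on average.
print(scores[x_u > 1].mean() > scores[x_u < -1].mean())  # True
```

Averaging over many random negative subsets dilutes the effect of the hidden positives that each subset mislabels as negative.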
I recently read some material on PU learning and summarized it here; corrections are welcome!
Reference
- du Plessis, M. C., Niu, G. & Sugiyama, M. Analysis of Learning from Positive and Unlabeled Data. Advances in Neural Information Processing Systems 27 703–711 (2014).
- Convex Formulation for Learning from Positive and Unlabeled Data, ICML, 2015.
- Kiryo, R. & Niu, G. Positive-Unlabeled Learning with Non-Negative Risk Estimator. NIPS (2017).