paper reading: Learning Classifiers from Only Positive and Unlabeled Data

LEARNING A TRADITIONAL CLASSIFIER FROM NONTRADITIONAL INPUT

Let x be an example and let y ∈ {0, 1} be a binary label. Let s = 1 if the example x is labeled, and s = 0 if x is unlabeled. Only positive examples are labeled, so y = 1 is certain when s = 1; but when s = 0, either y = 1 or y = 0 may be true.

A nontraditional training set consists of unlabeled examples <x, s=0> and labeled examples <x, s=1>. Because only positive examples are labeled:

p(s=1 | x, y = 0) = 0

Two scenarios:

  1. Single-training-set scenario: training data are drawn randomly from p(x, y, s), but for each tuple <x, y, s> that is drawn, only <x, s> is recorded.
  2. Case-control scenario: two training sets are drawn independently from p(x, y, s); from the first, only the examples x with s = 1 are recorded (the labeled set), while from the second all x are recorded (the unlabeled set).

Goal: learn a function f(x) that approximates p(y = 1 | x) as closely as possible.
Assumption: the labeled positive examples are chosen completely at random from all positive examples. That is, if y = 1, the probability that a positive example is labeled is the same constant, c, regardless of x:

p(s = 1 | x, y = 1) = p(s = 1 | y = 1) = c
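
To make the setup concrete, here is a minimal sketch (not from the paper) that generates synthetic data in the single-training-set scenario; the two-Gaussian class-conditional distributions and the value c = 0.3 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
c = 0.3  # hypothetical labeling frequency c = p(s=1 | y=1)

# Hypothetical p(x, y): balanced classes, one Gaussian feature per class.
y = rng.binomial(1, 0.5, size=n)
x = rng.normal(loc=2.0 * y, scale=1.0)

# SCAR assumption: positives are labeled with constant probability c,
# independently of x; negatives are never labeled, p(s=1 | x, y=0) = 0.
s = np.where(y == 1, rng.binomial(1, c, size=n), 0)

# In the single-training-set scenario, only (x, s) is recorded; y is hidden.
```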

So a training set is a random sample from the distribution p(x, y, s). Such a training set consists of two subsets:

  1. labeled (s=1)
  2. unlabeled (s=0)

A standard training algorithm applied to this nontraditional input, treating s as the label, yields a function g(x) such that g(x) = p(s = 1 | x), at least approximately.
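
Continuing the sketch above, fitting any probabilistic classifier to predict s from x approximates g; logistic regression via scikit-learn is just one convenient choice.

```python
from sklearn.linear_model import LogisticRegression

X = x.reshape(-1, 1)  # single feature as a column matrix

# Train on s, not y: the model estimates g(x) = p(s=1 | x).
clf = LogisticRegression().fit(X, s)
g = clf.predict_proba(X)[:, 1]
```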

Lemma 1:

p(y = 1 | x) = p(s = 1 | x) / p(s = 1 | y = 1)

Proof: since p(s = 1 | x, y = 0) = 0, a labeled example is necessarily positive, so

p(s = 1 | x) = p(y = 1, s = 1 | x)
             = p(y = 1 | x) p(s = 1 | y = 1, x)
             = p(y = 1 | x) p(s = 1 | y = 1),

where the last step uses the selected-completely-at-random assumption. Dividing both sides by p(s = 1 | y = 1) = c gives the result: f(x) = g(x) / c.

Note that f is an increasing function of g. This means that if the classifier is only used to rank examples x according to the chance that they belong to class y = 1, then g can be used directly in place of f.

It is impossible to have g > p(s = 1 | y = 1), since f = g / p(s = 1 | y = 1) can never exceed 1. This is reasonable because the “positive” (labeled) and “negative” (unlabeled) training sets for g are samples from overlapping regions in x space.
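
To see the lemma in action on the sketch above: the paper estimates c = p(s = 1 | y = 1) as the average value of g over labeled examples (its estimator e1, ideally computed on a held-out validation set). Applying that estimate and Lemma 1 recovers f, and also lets us check that g stays below c.

```python
# Estimator e1 from the paper: average g over the labeled (hence positive)
# examples; in practice this should use a held-out validation set.
c_hat = g[s == 1].mean()

# Lemma 1: f(x) = g(x) / c. Clip to [0, 1] against estimation noise.
f = np.clip(g / c_hat, 0.0, 1.0)

# g cannot exceed c = p(s=1 | y=1), so c_hat should roughly upper-bound g,
# and f ranks examples identically to g (division by a positive constant).
print(f"c_hat = {c_hat:.3f}, max g = {g.max():.3f}, assumed c = {c}")
```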

WEIGHTING UNLABELED EXAMPLES
