paper reading: Learning Classifiers from Only Positive and Unlabeled Data

LEARNING A TRADITIONAL CLASSIFIER FROM NONTRADITIONAL INPUT

Let x be an example and let y ∈ {0, 1} be a binary label. Let s = 1 if the example x is labeled, and s = 0 if x is unlabeled. Only positive examples are labeled, so y = 1 is certain when s = 1; but when s = 0, either y = 1 or y = 0 may be true.

A nontraditional training set consists of unlabeled examples <x, s=0> and labeled examples <x, s=1>. Because only positive examples are labeled:

p(s=1 | x, y = 0) = 0

Two scenarios:

  1. Single-training-set scenario: training data are drawn randomly from p(x, y, s), but for each tuple <x, y, s> that is drawn, only <x, s> is recorded.
  2. Case-control scenario: two training sets are drawn independently from p(x, y, s); from the first, only the examples x with s = 1 are recorded (the labeled set), while from the second all x are recorded (the unlabeled set).

Goal: learn a function f(x) that approximates p(y = 1 | x) as closely as possible.
Assumption: the labeled positive examples are chosen completely at random from all positive examples. That is, if y = 1, the probability that a positive example is labeled is the same constant, c, regardless of x:

p(s = 1 | x, y = 1) = p(s = 1 | y = 1) = c
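
To make the setup concrete, here is a minimal sketch (not from the paper) that generates synthetic data in the single-training-set scenario; the two-Gaussian class-conditional distributions and the value c = 0.3 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
c = 0.3  # hypothetical labeling frequency c = p(s=1 | y=1)

# Hypothetical p(x, y): balanced classes, one Gaussian feature per class.
y = rng.binomial(1, 0.5, size=n)
x = rng.normal(loc=2.0 * y, scale=1.0)

# SCAR assumption: positives are labeled with constant probability c,
# independently of x; negatives are never labeled, p(s=1 | x, y=0) = 0.
s = np.where(y == 1, rng.binomial(1, c, size=n), 0)

# In the single-training-set scenario, only (x, s) is recorded; y is hidden.
```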

So a training set is a random sample from the distribution p(x, y, s). Such a training set consists of two subsets:

  1. labeled (s=1)
  2. unlabeled (s=0)

A standard training algorithm applied to this nontraditional input, treating s as the label, yields a function g(x) such that g(x) = p(s = 1 | x), at least approximately.
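
Continuing the sketch above, fitting any probabilistic classifier to predict s from x approximates g; logistic regression via scikit-learn is just one convenient choice.

```python
from sklearn.linear_model import LogisticRegression

X = x.reshape(-1, 1)  # single feature as a column matrix

# Train on s, not y: the model estimates g(x) = p(s=1 | x).
clf = LogisticRegression().fit(X, s)
g = clf.predict_proba(X)[:, 1]
```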

Lemma 1:

p(y = 1 | x) = p(s = 1 | x) / p(s = 1 | y = 1)

Proof: since p(s = 1 | x, y = 0) = 0, a labeled example is necessarily positive, so

p(s = 1 | x) = p(y = 1, s = 1 | x)
             = p(y = 1 | x) p(s = 1 | y = 1, x)
             = p(y = 1 | x) p(s = 1 | y = 1),

where the last step uses the selected-completely-at-random assumption. Dividing both sides by p(s = 1 | y = 1) = c gives the result: f(x) = g(x) / c.

Note that f is an increasing function of g. This means that if the classifier is only used to rank examples x according to the chance that they belong to class y = 1, then g can be used directly in place of f.

It is impossible to have g > p(s = 1 | y = 1), since f = g / p(s = 1 | y = 1) can never exceed 1. This is reasonable because the “positive” (labeled) and “negative” (unlabeled) training sets for g are samples from overlapping regions in x space.
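
To see the lemma in action on the sketch above: the paper estimates c = p(s = 1 | y = 1) as the average value of g over labeled examples (its estimator e1, ideally computed on a held-out validation set). Applying that estimate and Lemma 1 recovers f, and also lets us check that g stays below c.

```python
# Estimator e1 from the paper: average g over the labeled (hence positive)
# examples; in practice this should use a held-out validation set.
c_hat = g[s == 1].mean()

# Lemma 1: f(x) = g(x) / c. Clip to [0, 1] against estimation noise.
f = np.clip(g / c_hat, 0.0, 1.0)

# g cannot exceed c = p(s=1 | y=1), so c_hat should roughly upper-bound g,
# and f ranks examples identically to g (division by a positive constant).
print(f"c_hat = {c_hat:.3f}, max g = {g.max():.3f}, assumed c = {c}")
```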

WEIGHTING UNLABELED EXAMPLES
