II. Learning to Answer Yes/No
1. Perceptron Hypothesis Set
Perceptron ⇔ linear (binary) classifier
$$
\begin{aligned}
h(x) &= \mathrm{sign}\Big(\Big(\sum^{d}_{i=1} w_i x_i\Big) - \text{threshold}\Big)\\
&= \mathrm{sign}\Big(\sum^{d}_{i=1} w_i x_i + \mathop{(-\text{threshold})}\limits_{w_0}\cdot\mathop{(+1)}\limits_{x_0}\Big)\\
&= \mathrm{sign}\Big(\sum^{d}_{i=0} w_i x_i\Big)\\
&= \mathrm{sign}(w^T x)
\end{aligned}
$$
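The compact form $h(x) = \mathrm{sign}(w^Tx)$ is easy to sketch in code. A minimal example (the weight and data values here are made up for illustration), treating $x_0 = +1$ as the coordinate that absorbs the threshold:

```python
import numpy as np

def perceptron_h(w, x):
    """Perceptron hypothesis h(x) = sign(w^T x).

    x is assumed to already contain the constant coordinate x_0 = +1,
    so w_0 plays the role of -threshold.
    """
    return 1 if np.dot(w, x) > 0 else -1   # treat sign(0) as -1 here

# w_0 = -threshold = -2, weights (1, 1) on the two real features
w = np.array([-2.0, 1.0, 1.0])
print(perceptron_h(w, np.array([1.0, 3.0, 2.0])))   # 3 + 2 - 2 = 3 > 0  -> +1
print(perceptron_h(w, np.array([1.0, 0.5, 0.5])))   # 0.5 + 0.5 - 2 < 0 -> -1
```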
2. Perceptron Learning Algorithm (PLA)
$H$: all possible perceptrons
D: training data
for t = 0, 1, ...
    find a mistake of w_t, i.e. an example (x_{n(t)}, y_{n(t)}) with sign(w_t^T x_{n(t)}) ≠ y_{n(t)}
    correct the mistake: w_{t+1} := w_t + y_{n(t)} x_{n(t)}
until no more mistakes
return the last w (called w_{PLA}) as g
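The loop above translates almost directly into code. A minimal sketch (the variable names and the tiny toy dataset are my own, not from the course):

```python
import numpy as np

def pla(X, y, max_iter=1000):
    """Perceptron Learning Algorithm on linearly separable data.

    X: (N, d+1) array whose first column is the constant +1 (x_0).
    y: (N,) array of labels in {+1, -1}.
    Returns the final weight vector w (w_PLA), used as g.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        preds = np.where(X @ w > 0, 1, -1)
        mistakes = np.flatnonzero(preds != y)
        if mistakes.size == 0:           # halt: no mistake left
            return w
        n = mistakes[0]                  # pick one mistake (cyclic-style)
        w = w + y[n] * X[n]              # correct it: w_{t+1} = w_t + y_n x_n
    return w

# Toy separable data: label is +1 only when x1 + x2 > 1.5 (an AND-like rule)
X = np.array([[1., 0., 0.], [1., 1., 0.], [1., 0., 1.], [1., 1., 1.]])
y = np.array([-1, -1, -1, 1])
w = pla(X, y)
assert all(np.where(X @ w > 0, 1, -1) == y)   # PLA halts with zero mistakes
```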
This screenshot explains why we use $w_{t+1} := w_t + y_{n(t)}x_{n(t)}$ to update. Tip: the inner product of two vectors with an obtuse angle between them is negative.
(Screenshot from the course slides.)
The screenshot shows a complete process of classification.
(The original figure is from the course slides; the image below is from a blog.)
Cyclic PLA: check the examples for mistakes in a fixed cycle (or a randomly permuted one), so it goes over every example.
3. Guarantee of PLA

If D is linearly separable, PLA always halts, but it can be slow. The upper bound on the number of iterations is calculable.

I don't fully understand the math behind the upper bound.
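For reference, the standard statement of that bound (Novikoff's argument, which is what the slides derive): for any perfect separator $w_f$, define

$$
\rho = \min_n \frac{y_n\, w_f^T x_n}{\|w_f\|}, \qquad R^2 = \max_n \|x_n\|^2,
$$

then the number of updates $T$ satisfies $T \le R^2/\rho^2$. The intuition is that each update grows the alignment $w_f^T w_t$ by at least $\rho\|w_f\|$, while $\|w_t\|^2$ grows by at most $R^2$, so the angle between $w_t$ and $w_f$ keeps shrinking and must hit zero mistakes within that many steps.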
(Screenshot from the course slides.)
4. Non-Separable Data
i.e. data with noise.

We can still look for the best line while tolerating some noise.
$$
w_g = \mathop{\mathrm{argmin}}\limits_{w} \sum^{N}_{n=1}\big[\, y_n \ne \mathrm{sign}(w^T x_n) \,\big]
$$
It is an NP-hard problem!
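Evaluating that sum for one candidate $w$ is cheap; what is NP-hard is minimizing it over all possible $w$. A quick sketch of the evaluation (the toy numbers here are mine):

```python
import numpy as np

def zero_one_errors(w, X, y):
    """Count the n with y_n != sign(w^T x_n) -- the sum inside the argmin."""
    preds = np.where(X @ w > 0, 1, -1)
    return int(np.sum(preds != y))

X = np.array([[1., 2.], [1., -1.], [1., 0.5]])
y = np.array([1, -1, -1])
print(zero_one_errors(np.array([0., 1.]), X, y))  # the "x1 > 0" rule misclassifies the third point -> 1
```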
Pocket Algorithm
The pocket algorithm records the best weights seen so far ("in its pocket") and halts after a certain number of iterations (or a time budget, up to you). It is a tiny modification of PLA, but it runs slower because of the comparison between the current $w_{t+1}$ and $w_{pocket}$ in each loop.
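A runnable sketch of the idea (the noisy toy dataset and names are my own; the mistake is picked at random, as in random-cycle PLA):

```python
import numpy as np

def pocket(X, y, max_iter=100, seed=0):
    """Pocket algorithm: PLA updates, but keep the best w seen so far.

    Halts after max_iter update steps and returns w_pocket, the weights
    with the fewest training mistakes encountered along the way.
    """
    rng = np.random.default_rng(seed)

    def n_mistakes(w):
        return int(np.sum(np.where(X @ w > 0, 1, -1) != y))

    w = np.zeros(X.shape[1])
    w_pocket, best = w.copy(), n_mistakes(w)
    for _ in range(max_iter):
        mistakes = np.flatnonzero(np.where(X @ w > 0, 1, -1) != y)
        if mistakes.size == 0:
            return w                      # perfectly separated: w itself is best
        n = rng.choice(mistakes)          # pick a random mistake
        w = w + y[n] * X[n]               # ordinary PLA update
        m = n_mistakes(w)                 # the extra comparison that slows it down
        if m < best:
            w_pocket, best = w.copy(), m
    return w_pocket

# Non-separable 1-D toy data (labels cannot be split by one threshold)
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([-1, 1, -1, 1])
w_best = pocket(X, y, max_iter=50)
errs = int(np.sum(np.where(X @ w_best > 0, 1, -1) != y))
assert errs <= 2   # never worse than the starting weights it keeps in the pocket
```

On this data plain PLA would cycle forever, which is exactly why the pocket keeps a fallback.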
(Screenshot from the course slides.)