##SVM##
Like logistic regression, the SVM learns a decision boundary; in some situations it is simply more effective than logistic regression.
###1. Introduction: Logistic Regression###
$$h_{\theta}(x) = \frac{1}{1+\exp(-\theta^T x)}$$
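A minimal sketch of this hypothesis in plain Python (the names `sigmoid` and `h` are my own, not from the notes):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = sigmoid(theta^T x), with theta and x as plain lists."""
    z = sum(t * xj for t, xj in zip(theta, x))
    return sigmoid(z)
```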
For this hypothesis:
- if $y = 1$, we want $h_{\theta}(x) \approx 1$, which means $\theta^T x \gg 0$
- if $y = 0$, we want $h_{\theta}(x) \approx 0$, which means $\theta^T x \ll 0$
The per-example cost function is (writing $z = \theta^T x$):
$$\mathrm{cost} = -y\log\frac{1}{1+\exp(-z)} - (1-y)\log\left(1-\frac{1}{1+\exp(-z)}\right)$$
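This cost can be evaluated directly; a sketch (function name `logistic_cost` is mine):

```python
import math

def logistic_cost(y, z):
    """Per-example cost: -y*log(g(z)) - (1-y)*log(1-g(z)), where g is the sigmoid."""
    g = 1.0 / (1.0 + math.exp(-z))
    return -y * math.log(g) - (1 - y) * math.log(1 - g)
```

Note that a confident correct prediction (e.g. $y=1$ with large positive $z$) yields a cost near zero, while a confident wrong one yields a large cost.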
Minimizing this cost over the training set, with L2 regularization, gives the optimization problem:
$$\min_{\theta} \frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\left(-\log h_\theta (x^{(i)})\right)+(1-y^{(i)})\left(-\log (1-h_\theta (x^{(i)}))\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$
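A sketch of this regularized objective in Python (name `logreg_objective` is mine; following the formula, the penalty sum starts at $j=1$, so `theta[0]` is treated as the intercept and excluded):

```python
import math

def logreg_objective(theta, X, y, lam):
    """Regularized logistic-regression objective:
    (1/m) * sum of per-example costs + (lam/(2m)) * sum_{j>=1} theta_j^2."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(t * xj for t, xj in zip(theta, xi))
        g = 1.0 / (1.0 + math.exp(-z))
        total += -yi * math.log(g) - (1 - yi) * math.log(1 - g)
    reg = (lam / (2.0 * m)) * sum(t * t for t in theta[1:])
    return total / m + reg
```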
###2. SVM###
In the SVM, the $\frac{1}{m}$ factor is dropped (purely for convenience; scaling by a positive constant does not change the minimizer), and we define:
$$\mathrm{cost}_1(\theta^Tx^{(i)}) = -\log h_{\theta}(x^{(i)})$$
$$\mathrm{cost}_0(\theta^Tx^{(i)}) = -\log (1-h_{\theta}(x^{(i)}))$$
The optimization objective then becomes:
$$\min_{\theta} \sum_{i=1}^{m} \left[y^{(i)}\mathrm{cost}_1(\theta^Tx^{(i)})+(1-y^{(i)})\mathrm{cost}_0(\theta^Tx^{(i)})\right] + \frac{\lambda}{2}\sum_{j=1}^n \theta_j^2$$
In logistic regression the objective has the form $A + \lambda B$: the larger $\lambda$ is, the more weight the regularization term $B$ carries, so increasing $\lambda$ is how we strengthen $B$'s influence on the result. In the SVM the convention is $CA + B$: the smaller $C$ is, the more weight $B$ carries relative to $A$, which achieves the same effect. So $C$ can be thought of as $\frac{1}{\lambda}$, and the objective can be rewritten as:
$$\min_{\theta}\ C\sum_{i=1}^{m} \left[y^{(i)}\mathrm{cost}_1(\theta^Tx^{(i)})+(1-y^{(i)})\mathrm{cost}_0(\theta^Tx^{(i)})\right] + \frac{1}{2}\sum_{j=1}^n \theta_j^2$$
This is the SVM optimization objective.
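The objective can be evaluated numerically; a sketch, assuming the usual piecewise-linear (hinge) surrogates for $\mathrm{cost}_1$ and $\mathrm{cost}_0$ (the notes only describe where these costs vanish, so the exact linear form here is an assumption):

```python
def svm_objective(theta, X, y, C):
    """SVM objective: C * sum of per-example costs + (1/2) * sum_j theta_j^2.

    Hinge surrogates assumed:
      cost_1(z) = max(0, 1 - z)   # applied when y = 1
      cost_0(z) = max(0, 1 + z)   # applied when y = 0
    """
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(t * xj for t, xj in zip(theta, xi))
        total += max(0.0, 1.0 - z) if yi == 1 else max(0.0, 1.0 + z)
    return C * total + 0.5 * sum(t * t for t in theta)
```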
Let $z = \theta^Tx^{(i)}$:
- if $y=1$, we want $\mathrm{cost}_1(z) = 0$, which requires $z \geq 1$
- if $y=0$, we want $\mathrm{cost}_0(z) = 0$, which requires $z \leq -1$
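These two surrogate costs can be sketched concretely; the hinge form below is the standard concrete choice satisfying the zero regions just described (the notes do not pin down the exact shape):

```python
def cost_1(z):
    """Surrogate for the y=1 branch: zero for z >= 1, linear penalty below."""
    return max(0.0, 1.0 - z)

def cost_0(z):
    """Surrogate for the y=0 branch: zero for z <= -1, linear penalty above."""
    return max(0.0, 1.0 + z)
```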
If $C$ is very large, the optimizer is pushed to make the term $y^{(i)}\mathrm{cost}_1(\theta^Tx^{(i)})+(1-y^{(i)})\mathrm{cost}_0(\theta^Tx^{(i)})$ exactly zero for every example, i.e.:
$$y^{(i)} = 1: \quad \theta^Tx^{(i)} \geq 1$$
$$y^{(i)} = 0: \quad \theta^Tx^{(i)} \leq -1$$
The problem then reduces to:
$$\min_{\theta}\ C \cdot 0 + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$
$$\text{s.t.}\quad \theta^Tx^{(i)} \geq 1 \text{ if } y^{(i)}=1; \qquad \theta^Tx^{(i)} \leq -1 \text{ if } y^{(i)}=0$$
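Checking whether a candidate $\theta$ satisfies these large-$C$ constraints is straightforward; a sketch (function name `satisfies_constraints` is mine):

```python
def satisfies_constraints(theta, X, y):
    """Check the large-C constraints: theta^T x >= 1 when y=1, <= -1 when y=0."""
    for xi, yi in zip(X, y):
        z = sum(t * xj for t, xj in zip(theta, xi))
        if yi == 1 and z < 1.0:
            return False
        if yi == 0 and z > -1.0:
            return False
    return True
```

Among all feasible $\theta$, the objective picks the one with the smallest $\frac{1}{2}\sum_j \theta_j^2$, which is what produces the large-margin boundary discussed next.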
###3. Decision Boundary###