Distance from a point to the decision boundary
As shown in the figure:
$y = w \cdot x + b = 0$
$y_1 = w \cdot x_1 + b = 1$ ①
$y_2 = w \cdot x_2 + b = -1$ ②
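The distance $d$ between the two margin boundaries on which these points lie should be as large as possible.
$d = \|x_1 - x_2\| \cos\theta$, where $\theta$ is the angle between $x_1 - x_2$ and $w$.
Subtracting ② from ① gives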
$w \cdot (x_1 - x_2) = 2$
=> $\|w\| \cdot \|x_1 - x_2\| \cos\theta = 2$
=> $\|w\| \cdot d = 2$
$d = \frac{2}{\|w\|}$ (when $y_i = 1$, $w \cdot x_i + b \ge 1$; when $y_i = -1$, $w \cdot x_i + b \le -1$)
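The projection argument can be checked numerically. The sketch below is only an illustration: the hyperplane `w`, `b` and the points `x1`, `x2` are made-up values, and `x1` is deliberately shifted along the hyperplane so that $x_1 - x_2$ is not parallel to $w$.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical hyperplane w.x + b = 0; the numbers are for illustration only.
w = [3.0, 4.0]
b = -1.0
norm_w = math.sqrt(dot(w, w))      # ||w|| = 5
unit = [wi / norm_w for wi in w]   # unit normal of the hyperplane
tangent = [-unit[1], unit[0]]      # direction lying inside the hyperplane

# x0 lies on the hyperplane; x1 and x2 lie on the two margin boundaries.
x0 = [-b * wi / norm_w**2 for wi in w]
x1 = [x0[k] + w[k] / norm_w**2 + 2.0 * tangent[k] for k in range(2)]  # w.x1 + b = 1
x2 = [x0[k] - w[k] / norm_w**2 for k in range(2)]                     # w.x2 + b = -1

# d = ||x1 - x2|| cos(theta) is the projection of x1 - x2 onto the unit normal.
diff = [x1[k] - x2[k] for k in range(2)]
d = dot(diff, unit)
margin = 2.0 / norm_w

print(d, margin)
```

Up to floating-point rounding, both printed values equal $2 / \|w\| = 2/5$, even though $\|x_1 - x_2\|$ itself is larger.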
$\begin{cases} \max \dfrac{2}{\|w\|} \\ y_i(w \cdot x_i + b) \ge 1,\quad i = 1, 2, \ldots, N \end{cases}$
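Rewriting as a minimization problem
Maximizing $2 / \|w\|$ is equivalent to minimizing $\|w\|^2 / 2$. Moving $w$ into the numerator makes the derivation easier.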
$\begin{cases} \min \dfrac{\|w\|^2}{2} \\ y_i(w \cdot x_i + b) \ge 1,\quad i = 1, 2, \ldots, N \end{cases}$
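To make the constrained problem concrete, here is a minimal sketch that evaluates the constraint and the objective for a few candidate solutions. The helper names `is_feasible` and `primal_objective`, the toy data, and the candidate values are all assumptions made for illustration; nothing here solves the problem, it only checks candidates.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def is_feasible(w, b, X, y, tol=1e-12):
    """True if y_i (w.x_i + b) >= 1 holds for every sample."""
    return all(yi * (dot(w, xi) + b) >= 1.0 - tol for xi, yi in zip(X, y))

def primal_objective(w):
    """The quantity being minimized: ||w||^2 / 2."""
    return 0.5 * dot(w, w)

# Toy data (an assumption for illustration): one point per class.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]

# Hypothetical candidates (w, b); only feasible ones are valid solutions.
candidates = [([1.0, 0.0], 0.0), ([2.0, 0.0], 0.0), ([0.5, 0.0], 0.0)]
for w, b in candidates:
    print(w, b, is_feasible(w, b, X, y), primal_objective(w))
```

Among these candidates, the first two satisfy every constraint, and the first has the smaller objective, so it corresponds to the wider margin. The third has a smaller objective still but violates the constraints.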
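Lagrange multipliers and KKT
The problem above is a typical constrained optimization problem. The classical method of Lagrange multipliers handles equality constraints; the inequality constraints here are handled with the KKT conditions.
KKT theory: to be continued.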
$L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum\limits_{i=1}^n a_i\left(y_i(w^T x_i + b) - 1\right)$ ③
Set the derivatives with respect to $w$ and $b$ to zero:
$\frac{\partial L}{\partial w} = 0$
=> $\sum\limits_{i=1}^n a_i y_i x_i = w$ ④
$\frac{\partial L}{\partial b} = 0$
=> $\sum\limits_{i=1}^n a_i y_i = 0$ ⑤
$\|w\|^2 = w^T w$
Substituting ④ and ⑤ into ③ gives
$L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum\limits_{i=1}^n a_i\left(y_i(w^T x_i + b) - 1\right)$
$= \frac{1}{2}\|w\|^2 - \sum\limits_{i=1}^n a_i y_i w^T x_i - \sum\limits_{i=1}^n a_i y_i b + \sum\limits_{i=1}^n a_i$
$= \frac{1}{2} w^T \sum\limits_{i=1}^n a_i y_i x_i - w^T \sum\limits_{i=1}^n a_i y_i x_i - 0 + \sum\limits_{i=1}^n a_i$
$= \sum\limits_{i=1}^n a_i - \frac{1}{2} w^T \sum\limits_{i=1}^n a_i y_i x_i$
$= \sum\limits_{i=1}^n a_i - \frac{1}{2} \left(\sum\limits_{i=1}^n a_i y_i x_i\right)^T \sum\limits_{i=1}^n a_i y_i x_i$
That is:
$L(w, b, a) = \sum\limits_{i=1}^n a_i - \frac{1}{2} \sum\limits_{i,j=1}^n a_i a_j y_i y_j x_i^T x_j$
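The substitution can be verified numerically. The following sketch uses made-up data and multipliers (chosen so that $a_i \ge 0$ and $\sum_i a_i y_i = 0$), builds $w$ from ④, and compares the Lagrangian ③ with the final double-sum expression. The variable names and values are assumptions for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Toy data and multipliers (assumptions for illustration).
# The multipliers satisfy a_i >= 0 and sum_i a_i y_i = 0, i.e. condition ⑤.
X = [[2.0, 1.0], [0.0, 3.0], [-1.0, -2.0]]
y = [1.0, 1.0, -1.0]
a = [0.25, 0.25, 0.5]
b = 0.7  # arbitrary: b drops out once condition ⑤ holds

n, dim = len(X), len(X[0])

# Condition ④: w = sum_i a_i y_i x_i
w = [sum(a[i] * y[i] * X[i][k] for i in range(n)) for k in range(dim)]

# Lagrangian ③ evaluated at this w.
lagrangian = 0.5 * dot(w, w) - sum(
    a[i] * (y[i] * (dot(w, X[i]) + b) - 1.0) for i in range(n)
)

# Final form: sum_i a_i - 1/2 sum_{i,j} a_i a_j y_i y_j x_i^T x_j
dual = sum(a) - 0.5 * sum(
    a[i] * a[j] * y[i] * y[j] * dot(X[i], X[j])
    for i in range(n)
    for j in range(n)
)

print(lagrangian, dual)
```

The two printed values agree up to rounding, and changing `b` leaves both unchanged. The multipliers here are not optimal; the check only confirms the algebraic identity.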
Converting to the dual problem
$\max\limits_a \sum\limits_{i=1}^n a_i - \frac{1}{2} \sum\limits_{i,j=1}^n a_i a_j y_i y_j x_i^T x_j$
s.t. $a_i \ge 0,\ i = 1, \ldots, n$
$\sum\limits_{i=1}^n a_i y_i = 0$
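As a sanity check of the dual, consider a hypothetical two-point data set that is not part of the original derivation: $x_1 = (1, 0)$ with $y_1 = 1$, and $x_2 = (-1, 0)$ with $y_2 = -1$. Worked by hand:

$a_1 y_1 + a_2 y_2 = 0 \Rightarrow a_1 = a_2 = a$
$\sum\limits_{i=1}^2 a_i - \frac{1}{2} \sum\limits_{i,j=1}^2 a_i a_j y_i y_j x_i^T x_j = 2a - \frac{1}{2}(a^2 + 2a^2 + a^2) = 2a - 2a^2$
$\frac{d}{da}(2a - 2a^2) = 2 - 4a = 0 \Rightarrow a = \frac{1}{2}$
$w = \sum\limits_{i=1}^2 a_i y_i x_i = \frac{1}{2}(1, 0) - \frac{1}{2}(-1, 0) = (1, 0)$
$y_1(w^T x_1 + b) = 1 \Rightarrow 1 + b = 1 \Rightarrow b = 0$
$d = \frac{2}{\|w\|} = 2$

The resulting margin equals the distance between the two points, as expected when each class has a single point and both are support vectors.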
Solving with SMO
To be continued.
Kernel functions
The derivation above covers the linearly separable case, $y = w \cdot x + b = 0$.
Most data is not linearly separable. A feature map $\phi$ sends the data into a higher-dimensional space, and the kernel function supplies the inner products $\phi(x_i)^T \phi(x_j)$ in that space.
$w^T \phi(x) + b = 0$
This gives the dual
$\max\limits_a \sum\limits_{i=1}^n a_i - \frac{1}{2} \sum\limits_{i,j=1}^n a_i a_j y_i y_j \phi(x_i)^T \phi(x_j)$
s.t. $a_i \ge 0,\ i = 1, \ldots, n$
$\sum\limits_{i=1}^n a_i y_i = 0$
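The dual only ever needs the inner product $\phi(x_i)^T \phi(x_j)$, never $\phi$ itself. The sketch below illustrates this with one specific choice that is an assumption here, not something the derivation above prescribes: the degree-2 homogeneous polynomial kernel $k(x, z) = (x^T z)^2$ in two dimensions, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The function names `phi` and `poly_kernel` are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def poly_kernel(x, z):
    """k(x, z) = (x.z)^2, computed without ever building phi(x)."""
    return dot(x, z) ** 2

# Two arbitrary sample points (illustrative values).
x = [1.0, 2.0]
z = [3.0, -1.0]

explicit = dot(phi(x), phi(z))   # inner product in the 3-D feature space
implicit = poly_kernel(x, z)     # same quantity from the 2-D inputs

print(explicit, implicit)
```

Both values agree up to rounding. In the dual, replacing every $\phi(x_i)^T \phi(x_j)$ with a call such as `poly_kernel(X[i], X[j])` would avoid constructing the higher-dimensional vectors at all.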