Machine Learning: Whiteboard Derivation P6_2
Solving the SVM Model: The Dual Problem
Primal problem
Constrained form:

$$
\begin{cases}
\min_{w,b} \frac{1}{2}w^Tw \\
\text{s.t.} \;\; y_i(w^Tx_i+b) \geq 1, \quad \forall i=1,2,\dots,N
\end{cases}
$$
Lagrangian:

$$
L(w,b,\lambda)=\frac{1}{2}w^Tw+\sum_{i=1}^N \lambda_{i}\left(1-y_i(w^Tx_i+b)\right)
$$
$$
\lambda \geq 0, \qquad 1-y_i(w^Tx_i+b)\leq 0
$$
Unconstrained form (no constraint on $w, b$):

$$
\begin{cases}
\min_{w,b} \max_{\lambda} L(w,b,\lambda) \\
\text{s.t.} \;\; \lambda \geq 0
\end{cases}
$$
The two forms are equivalent. If some constraint is violated, the inner maximization sends the corresponding $\lambda_i \to \infty$; if every constraint holds, the maximum is attained at $\lambda_i = 0$:

$$
\begin{cases}
\text{if } 1-y_i(w^Tx_i+b)>0: & \max_{\lambda} L(w,b,\lambda)= \frac{1}{2}w^Tw + \infty=\infty \\
\text{if } 1-y_i(w^Tx_i+b) \leq 0: & \max_{\lambda} L(w,b,\lambda) = \frac{1}{2}w^Tw+0= \frac{1}{2}w^Tw, \quad \min_{w,b} \max_{\lambda} L(w,b,\lambda)= \min_{w,b} \frac{1}{2}w^Tw
\end{cases}
$$
$$
\Rightarrow \quad \min_{w,b} \max_{\lambda} L(w,b,\lambda)=\min_{w,b}\left(\infty, \frac{1}{2}w^Tw\right)=\min_{w,b}\frac{1}{2}w^Tw
$$
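The case analysis above can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-point toy dataset and using the same multiplier value for every $\lambda_i$; the function name `lagrangian` is illustrative and does not come from the original notes. It evaluates $L(w,b,\lambda)$ for growing $\lambda$ at one feasible and one infeasible $w$.

```python
def lagrangian(w, b, lam, X, y):
    """L(w, b, lambda) = 1/2 w^T w + sum_i lambda_i * (1 - y_i (w^T x_i + b))."""
    half_norm = 0.5 * sum(w_k * w_k for w_k in w)
    penalty = 0.0
    for lam_i, x_i, y_i in zip(lam, X, y):
        score = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        penalty += lam_i * (1.0 - y_i * score)
    return half_norm + penalty


# Hypothetical toy data: one point per class on the first axis.
X = [[2.0, 0.0], [-2.0, 0.0]]
y = [1, -1]
b = 0.0
multipliers = (0.0, 1.0, 10.0, 100.0)  # same value used for every lambda_i

# Feasible w: both margins equal 2, so 1 - y_i(w^T x_i + b) = -1 <= 0.
feasible_values = [lagrangian([1.0, 0.0], b, [t, t], X, y) for t in multipliers]

# Infeasible w: both margins equal 0.2, so 1 - y_i(w^T x_i + b) = 0.8 > 0.
infeasible_values = [lagrangian([0.1, 0.0], b, [t, t], X, y) for t in multipliers]

print(feasible_values)    # largest at lambda = 0, where L = 1/2 w^T w
print(infeasible_values)  # keeps growing as lambda grows
```

For the feasible $w$ the largest value occurs at $\lambda = 0$ and equals $\frac{1}{2}w^Tw$, while for the infeasible $w$ the value increases with $\lambda$, which is the behavior the two cases describe.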
Dual problem
Under strong duality, the min and max can be swapped:

$$
\begin{cases}
\max_{\lambda} \min_{w,b} L(w,b,\lambda) \\
\text{s.t.} \;\; \lambda \geq 0
\end{cases}
$$
Weak duality:

$$
\min \max L \geq \max \min L
$$
Strong duality:

$$
\min \max L = \max \min L
$$
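The SVM primal is a convex quadratic optimization problem (quadratic objective, linear constraints), so strong duality holds.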
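Weak duality does not depend on the SVM setting: for any function of two arguments, minimizing after maximizing is never smaller than maximizing after minimizing. The sketch below illustrates this on two small hypothetical payoff tables (rows are the minimizer's choices, columns the maximizer's). It is a finite analogy, not the SVM problem itself.

```python
def min_max(table):
    """Minimizer commits to a row, then the maximizer picks the worst column."""
    return min(max(row) for row in table)


def max_min(table):
    """Maximizer commits to a column, then the minimizer picks the best row."""
    n_cols = len(table[0])
    return max(min(row[c] for row in table) for c in range(n_cols))


gap_table = [[1, 4], [3, 2]]     # no saddle point: strict inequality
saddle_table = [[1, 2], [3, 4]]  # saddle point at row 0, column 1: equality

gap_result = (min_max(gap_table), max_min(gap_table))
saddle_result = (min_max(saddle_table), max_min(saddle_table))
print(gap_result, saddle_result)
```

The first table shows a duality gap (min max is strictly larger), and the second shows the equality case that corresponds to strong duality.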
First solve the inner minimization $\min_{w,b} L(w,b,\lambda)$ by setting the partial derivatives to zero.
$$
\begin{aligned}
\frac{\partial L}{\partial b}
&=\frac{\partial}{\partial b}\left[\frac{1}{2}w^Tw+\sum_{i=1}^N \lambda_i - \sum_{i=1}^N \lambda_i y_i(w^Tx_i+b)\right] \\
&=\frac{\partial}{\partial b}\left[ - \sum_{i=1}^N \lambda_i y_i b\right] \\
&=- \sum_{i=1}^N \lambda_i y_i = 0
\end{aligned}
$$
Substituting $\sum_{i=1}^N \lambda_i y_i = 0$ into $L(w,b,\lambda)$:
$$
\begin{aligned}
L(w,b,\lambda)
&=\frac{1}{2}w^Tw+\sum_{i=1}^N \lambda_{i}\left(1-y_i(w^Tx_i+b)\right) \\
&=\frac{1}{2}w^Tw+\sum_{i=1}^N\lambda_{i} -\sum_{i=1}^N\lambda_{i}y_iw^Tx_i -\sum_{i=1}^N\lambda_{i} y_i b\\
&=\frac{1}{2}w^Tw+\sum_{i=1}^N\lambda_{i} -\sum_{i=1}^N\lambda_{i}y_iw^Tx_i
\end{aligned}
$$
Since $\frac{\partial}{\partial w}\left(\frac{1}{2}w^Tw\right) = w$:

$$
\begin{aligned}
&\frac{\partial L}{\partial w}=w-\sum_{i=1}^N\lambda_{i}y_ix_i=0 \\
&\Rightarrow w=\sum_{i=1}^N\lambda_{i}y_ix_i
\end{aligned}
$$
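The two stationarity conditions can be sanity-checked with finite differences. The sketch below assumes a hypothetical three-point dataset and a hand-picked $\lambda$ satisfying $\sum_i \lambda_i y_i = 0$; the helper names are illustrative. It differentiates $L$ numerically rather than using the closed-form gradient, so the check is not circular.

```python
def lagrangian(w, b, lam, X, y):
    """L(w, b, lambda) = 1/2 w^T w + sum_i lambda_i * (1 - y_i (w^T x_i + b))."""
    half_norm = 0.5 * sum(w_k * w_k for w_k in w)
    penalty = 0.0
    for lam_i, x_i, y_i in zip(lam, X, y):
        score = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        penalty += lam_i * (1.0 - y_i * score)
    return half_norm + penalty


def numeric_grad_w(w, b, lam, X, y, eps=1e-6):
    """Central-difference estimate of dL/dw, one coordinate at a time."""
    grad = []
    for k in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[k] += eps
        w_minus[k] -= eps
        diff = lagrangian(w_plus, b, lam, X, y) - lagrangian(w_minus, b, lam, X, y)
        grad.append(diff / (2 * eps))
    return grad


def numeric_grad_b(w, b, lam, X, y, eps=1e-6):
    """Central-difference estimate of dL/db."""
    diff = lagrangian(w, b + eps, lam, X, y) - lagrangian(w, b - eps, lam, X, y)
    return diff / (2 * eps)


# Hypothetical data and multipliers, chosen so that sum_i lambda_i y_i = 0.
X = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.0]]
y = [1, 1, -1]
lam = [0.25, 0.25, 0.5]

# Candidate stationary point: w = sum_i lambda_i y_i x_i.
w_star = [sum(l * y_i * x_i[k] for l, x_i, y_i in zip(lam, X, y)) for k in range(2)]

grad_at_w_star = numeric_grad_w(w_star, 0.0, lam, X, y)
grad_at_origin = numeric_grad_w([0.0, 0.0], 0.0, lam, X, y)
grad_b = numeric_grad_b(w_star, 0.0, lam, X, y)
print(w_star, grad_at_w_star, grad_at_origin, grad_b)
```

The gradient with respect to $w$ vanishes (up to rounding) at $w = \sum_i \lambda_i y_i x_i$ but not at the origin, and the derivative with respect to $b$ vanishes because $\sum_i \lambda_i y_i = 0$.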
Substituting $w=\sum_{i=1}^N\lambda_{i}y_ix_i$ back into $L$, and using $x_j^Tx_i = x_i^Tx_j$:

$$
\begin{aligned}
L(w,b,\lambda)
&= \frac{1}{2}\left(\sum_{i=1}^N\lambda_{i}y_ix_i\right)^T\left(\sum_{j=1}^N\lambda_{j}y_jx_j\right) - \sum_{i=1}^N\lambda_{i}y_i\left(\sum_{j=1}^N\lambda_{j}y_jx_j\right)^Tx_i+\sum_{i=1}^N\lambda_i \\
&= \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_i y_j x_i^T x_j - \sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_i y_j x_j^T x_i+\sum_{i=1}^N\lambda_i \\
&= -\frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_i y_j x_i^T x_j+\sum_{i=1}^N\lambda_i
\end{aligned}
$$
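The substitution can be verified numerically: for any $\lambda$ with $\sum_i \lambda_i y_i = 0$, evaluating $L$ at $w=\sum_i \lambda_i y_i x_i$ should give the double-sum expression, regardless of $b$. The sketch below assumes a hypothetical three-point dataset; `dual_objective` is an illustrative name for the final line of the derivation.

```python
def dot(u, v):
    return sum(u_k * v_k for u_k, v_k in zip(u, v))


def lagrangian(w, b, lam, X, y):
    """L(w, b, lambda) = 1/2 w^T w + sum_i lambda_i * (1 - y_i (w^T x_i + b))."""
    penalty = sum(l * (1.0 - y_i * (dot(w, x_i) + b)) for l, x_i, y_i in zip(lam, X, y))
    return 0.5 * dot(w, w) + penalty


def dual_objective(lam, X, y):
    """sum_i lambda_i - 1/2 sum_i sum_j lambda_i lambda_j y_i y_j x_i^T x_j."""
    quad = 0.0
    for i in range(len(lam)):
        for j in range(len(lam)):
            quad += lam[i] * lam[j] * y[i] * y[j] * dot(X[i], X[j])
    return sum(lam) - 0.5 * quad


# Hypothetical data and multipliers, chosen so that sum_i lambda_i y_i = 0.
X = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.0]]
y = [1, 1, -1]
lam = [0.25, 0.25, 0.5]

w_star = [sum(l * y_i * x_i[k] for l, x_i, y_i in zip(lam, X, y)) for k in range(2)]

dual_value = dual_objective(lam, X, y)
# b drops out because sum_i lambda_i y_i = 0, so every b gives the same value.
lagrangian_values = [lagrangian(w_star, b, lam, X, y) for b in (-3.0, 0.0, 7.5)]
print(dual_value, lagrangian_values)
```

All three Lagrangian values coincide with the dual objective, which confirms both the algebra and the fact that $b$ no longer appears after the substitution.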
The dual problem is therefore equivalent to the following, where the equality constraint comes from $\frac{\partial L}{\partial b}=0$:

$$
\begin{cases}
\max_{\lambda} \; -\frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_i y_j x_i^T x_j+\sum_{i=1}^N\lambda_i \\
\text{s.t.} \;\; \lambda_i \geq 0, \quad \sum_{i=1}^N \lambda_i y_i = 0
\end{cases}
$$
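To see the dual problem at work, consider a hypothetical dataset with one point per class. The condition $\sum_i \lambda_i y_i = 0$ obtained from $\frac{\partial L}{\partial b}=0$ forces $\lambda_1=\lambda_2=t$, so the dual reduces to a one-dimensional maximization over $t \geq 0$. The sketch below solves it with a plain grid search (an illustrative choice, not a practical solver) and compares the result with the primal objective. It assumes $b = 0$, which holds here only because the toy data are symmetric about the origin.

```python
def dot(u, v):
    return sum(u_k * v_k for u_k, v_k in zip(u, v))


def dual_objective(lam, X, y):
    """sum_i lambda_i - 1/2 sum_i sum_j lambda_i lambda_j y_i y_j x_i^T x_j."""
    quad = 0.0
    for i in range(len(lam)):
        for j in range(len(lam)):
            quad += lam[i] * lam[j] * y[i] * y[j] * dot(X[i], X[j])
    return sum(lam) - 0.5 * quad


# Hypothetical toy data: two points, symmetric about the origin.
X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]

# lambda_1 = lambda_2 = t keeps sum_i lambda_i y_i = 0; search t in [0, 1].
grid = [k / 1000 for k in range(1001)]
best_t = max(grid, key=lambda t: dual_objective([t, t], X, y))
dual_value = dual_objective([best_t, best_t], X, y)

# Recover w = sum_i lambda_i y_i x_i and evaluate the primal objective.
w = [best_t * (y[0] * X[0][k] + y[1] * X[1][k]) for k in range(2)]
primal_value = 0.5 * dot(w, w)

# With b = 0 (assumed from symmetry), check the constraints y_i (w^T x_i + b) >= 1.
margins = [y_i * dot(w, x_i) for x_i, y_i in zip(X, y)]
print(best_t, w, dual_value, primal_value, margins)
```

On this dataset the dual optimum and the primal objective at the recovered $w$ agree, consistent with strong duality, and both constraints hold with equality.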