SVM (support vector machine)
This article focuses on the principle of SMO and on nonlinear classifiers.
SMO
Platt's paper: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines.
Nonlinear classifier
One idea is to map the low-dimensional data into a higher-dimensional space and look for a separating hyperplane there.
The inner product in the cost function then changes from $\overrightarrow{x_i} \cdot \overrightarrow{x_j}$ to $\left\langle \phi(x^{(i)}), \phi(x^{(j)}) \right\rangle$. For $k$-dimensional inputs, computing this directly costs $O(k^2)$ time (the data must first be mapped into the high-dimensional space), which is hard to accept. A workable trick is to use a kernel function: the high-dimensional inner product $\overrightarrow{h(x_i)} \cdot \overrightarrow{h(x_j)}$ is obtained by transforming the low-dimensional inner product, so the time complexity is $O(k)$.
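As a minimal sketch of the trick, assume the homogeneous degree-2 polynomial kernel $K(x, y) = (x^T y)^2$, whose explicit feature map lists all pairwise products $x_a x_b$. The names `phi` and `poly2_kernel` are illustrative only and do not come from the original text.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map: all k*k pairwise products x_a * x_b."""
    return np.outer(x, x).ravel()

def poly2_kernel(x, y):
    """Same inner product, computed from the k-dimensional dot product only."""
    return float(np.dot(x, y)) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = rng.normal(size=5)

explicit = float(np.dot(phi(x), phi(y)))  # works on k^2 = 25 features
via_kernel = poly2_kernel(x, y)           # works on k = 5 features
print(np.isclose(explicit, via_kernel))
```

Both values agree because $\sum_{a,b} x_a x_b y_a y_b = (\sum_a x_a y_a)^2$, so the high-dimensional vectors never have to be built.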
Commonly used kernels:
Kernel | Expression |
---|---|
Linear | $K(x, y) = x^T y + c$ |
Polynomial | $K(x, y) = (a x^T y + c)^d, \ (a, c \geqslant 0)$ |
Radial Basis | $K(x, y) = \exp(-\gamma \|x - y\|^2), \ (\gamma \geqslant 0)$ |
Gaussian | $K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$ |
Valid kernel: the kernel matrix must be a symmetric positive semidefinite matrix.
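The four kernels and the validity condition can be sketched in a few lines. This is an illustration under assumptions: the default parameter values, the helper names `gram_matrix` and `is_valid_gram`, and the numerical tolerance are all hypothetical choices, not part of the original text.

```python
import numpy as np

def linear_kernel(x, y, c=0.0):
    return x @ y + c

def polynomial_kernel(x, y, a=1.0, c=1.0, d=3):
    return (a * (x @ y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def gram_matrix(kernel, X):
    """Kernel matrix K[i, j] = kernel(X[i], X[j]) for the rows of X."""
    n = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def is_valid_gram(K, tol=1e-8):
    """Check symmetry and positive semidefiniteness up to a relative tolerance."""
    if not np.allclose(K, K.T):
        return False
    eigvals = np.linalg.eigvalsh(K)
    return bool(eigvals.min() >= -tol * max(1.0, np.abs(eigvals).max()))

X = np.random.default_rng(0).normal(size=(6, 3))
for k in (linear_kernel, polynomial_kernel, rbf_kernel, gaussian_kernel):
    print(k.__name__, is_valid_gram(gram_matrix(k, X)))
```

The Gaussian form is the radial basis form with $\gamma = \frac{1}{2\sigma^2}$, so the two rows of the table describe the same family.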
SMO derivation (concrete)
We have $\sum \alpha_i y_i = 0$, so we have to change $\alpha_i, \alpha_j$ simultaneously. Assume we choose $\alpha_1, \alpha_2$; then $\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i = 3} \alpha_i y_i = \zeta$.
Target:

$$\min\ L = \frac{1}{2} \sum_{i, j} \alpha_i \alpha_j y_i y_j \left\langle \overrightarrow{x_i}, \overrightarrow{x_j} \right\rangle - \sum \alpha_i$$
$$
\begin{aligned}
L = & \frac{1}{2} \alpha_1^2 K_{11} + \frac{1}{2} \alpha_2^2 K_{22} + \alpha_1 \alpha_2 y_1 y_2 K_{12} + \alpha_1 y_1 \sum_{i = 3} \alpha_i y_i K_{i1} + \alpha_2 y_2 \sum_{i = 3} \alpha_i y_i K_{i2} - (\alpha_1 + \alpha_2) + const \\
\alpha_1 = & y_1 \zeta - y_1 y_2 \alpha_2 \\
\frac{\partial \alpha_1}{\partial \alpha_2} = & -y_1 y_2 \\
\frac{\partial L}{\partial \alpha_2} = & \alpha_1 K_{11} \frac{\partial \alpha_1}{\partial \alpha_2} + \alpha_2 K_{22} + y_1 y_2 K_{12} \frac{\partial (\alpha_1 \alpha_2)}{\partial \alpha_2} + \frac{\partial \alpha_1}{\partial \alpha_2} y_1 \sum_{i = 3} \alpha_i y_i K_{i1} + y_2 \sum_{i = 3} \alpha_i y_i K_{i2} - 1 - \frac{\partial \alpha_1}{\partial \alpha_2} \\
= & -y_1 y_2 \alpha_1 K_{11} + \alpha_2 K_{22} + y_1 y_2 K_{12} (\alpha_1 - y_1 y_2 \alpha_2) - y_2 \sum_{i = 3} \alpha_i y_i K_{i1} + y_2 \sum_{i = 3} \alpha_i y_i K_{i2} + y_1 y_2 - 1 \\
= & (K_{11} + K_{22} - 2K_{12}) \alpha_2 - y_2 K_{11} \zeta + y_2 K_{12} \zeta + y_1 y_2 - 1 - y_2 \sum_{i = 3} \alpha_i y_i (K_{i1} - K_{i2})
\end{aligned}
$$
Let $\frac{\partial L}{\partial \alpha_2} = 0$ and write $\alpha_2^*$ for the solution. On the right-hand side, $\alpha_i$ denotes the value before the update, sums without a lower limit run over all $i$, and the second line substitutes $\zeta = \alpha_1 y_1 + \alpha_2 y_2$:

$$
\begin{aligned}
(K_{11} + K_{22} - 2K_{12}) \alpha_2^* = & y_2 \left((K_{11} - K_{12})\zeta + y_2 - y_1 + \sum_{i = 3} \alpha_i y_i (K_{i1} - K_{i2})\right) \\
= & y_2 \left(\sum \alpha_i y_i K_{i1} - \sum \alpha_i y_i K_{i2} + y_2 - y_1 + \alpha_2 y_2 (K_{11} + K_{22} - 2K_{12})\right) \\
= & (K_{11} + K_{22} - 2K_{12}) \alpha_2 + y_2 \left(\left(\sum \alpha_i y_i K_{i1} - y_1\right) - \left(\sum \alpha_i y_i K_{i2} - y_2\right)\right)
\end{aligned}
$$
Let

$$E_i = \sum_{j} \alpha_j y_j K_{ij} + b - y_i, \qquad \eta = K_{11} + K_{22} - 2K_{12}$$

The bias $b$ cancels in $E_1 - E_2$, so it does not affect the update:

$$\alpha_2^* = \alpha_2 + \frac{y_2 (E_1 - E_2)}{\eta}$$
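The closed-form step can be sanity-checked numerically. The sketch below assumes a small random data set, a linear kernel, and arbitrary starting multipliers (all hypothetical), then confirms that the objective restricted to the line $\alpha_1 y_1 + \alpha_2 y_2 = \zeta$ has zero slope at $\alpha_2^*$. Indices 0 and 1 in the code play the roles of 1 and 2 in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
alpha = rng.uniform(0.0, 1.0, size=6)  # arbitrary current multipliers
b = 0.3                                # arbitrary: cancels in E_1 - E_2
K = X @ X.T                            # linear-kernel Gram matrix

def dual_objective(a):
    """L = 1/2 * sum_ij a_i a_j y_i y_j K_ij - sum_i a_i."""
    v = a * y
    return 0.5 * v @ K @ v - a.sum()

def objective_on_line(t):
    """L as a function of alpha_2 = t, with alpha_1 tied by the constraint."""
    zeta = alpha[0] * y[0] + alpha[1] * y[1]
    a = alpha.copy()
    a[1] = t
    a[0] = y[0] * (zeta - y[1] * t)
    return dual_objective(a)

E = K @ (alpha * y) + b - y            # E_i = sum_j alpha_j y_j K_ij + b - y_i
eta = K[0, 0] + K[1, 1] - 2.0 * K[0, 1]
alpha2_star = alpha[1] + y[1] * (E[0] - E[1]) / eta

h = 1e-4
slope = (objective_on_line(alpha2_star + h) - objective_on_line(alpha2_star - h)) / (2 * h)
print(abs(slope) < 1e-6)
```

Since the restricted objective is a parabola in $\alpha_2$ with curvature $\eta$, the stationary point is a minimum whenever $\eta > 0$.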
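$\alpha_2^*$ must also lie in the interval $[L, H]$ (here $L$ and $H$ are bounds, not the objective $L$). The bounds follow from $0 \leqslant \alpha_1, \alpha_2 \leqslant C$ together with $\alpha_1 y_1 + \alpha_2 y_2 = \zeta$: if $y_1 \neq y_2$, then $L = \max(0, \alpha_2 - \alpha_1)$ and $H = \min(C, C + \alpha_2 - \alpha_1)$; if $y_1 = y_2$, then $L = \max(0, \alpha_1 + \alpha_2 - C)$ and $H = \min(C, \alpha_1 + \alpha_2)$.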
$$\alpha_2^{new} = \left\{\begin{matrix} H, & (H < \alpha_2^*) \\ \alpha_2^*, & (L \leqslant \alpha_2^* \leqslant H) \\ L, & (\alpha_2^* < L) \end{matrix}\right.$$
$$\alpha_1^{new} = y_1(\zeta - y_2 \alpha_2^{new}) = y_1(y_1 \alpha_1 + y_2 \alpha_2 - y_2 \alpha_2^{new}) = \alpha_1 + y_1 y_2 (\alpha_2 - \alpha_2^{new})$$
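Put together, one update of a chosen pair might look like the sketch below. The function name `smo_pair_update` is hypothetical, the clipping bounds are the standard ones from Platt's paper, and the degenerate case $\eta \leqslant 0$ is simply skipped here as a simplification. The bias update is not covered.

```python
import numpy as np

def smo_pair_update(alpha, y, K, b, i, j, C):
    """One SMO step on the pair (i, j): i plays the role of index 1, j of index 2."""
    E = K @ (alpha * y) + b - y                  # E_k = sum_m alpha_m y_m K_km + b - y_k
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
    if eta <= 0:
        return alpha.copy()                      # assumption: skip degenerate pairs
    if y[i] != y[j]:
        lo = max(0.0, alpha[j] - alpha[i])
        hi = min(C, C + alpha[j] - alpha[i])
    else:
        lo = max(0.0, alpha[i] + alpha[j] - C)
        hi = min(C, alpha[i] + alpha[j])
    a_j_star = alpha[j] + y[j] * (E[i] - E[j]) / eta   # unclipped minimizer
    a_j_new = float(np.clip(a_j_star, lo, hi))         # clip to [L, H]
    a_i_new = alpha[i] + y[i] * y[j] * (alpha[j] - a_j_new)
    new_alpha = alpha.copy()
    new_alpha[i], new_alpha[j] = a_i_new, a_j_new
    return new_alpha

# Toy usage with made-up data and a linear kernel.
X = np.array([[1.0, 2.0], [2.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
K = X @ X.T
alpha = smo_pair_update(np.zeros(4), y, K, b=0.0, i=0, j=1, C=1.0)
print(alpha, alpha @ y)
```

The step changes only two multipliers, and the update of $\alpha_1$ keeps $\sum \alpha_i y_i$ unchanged.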
Choosing the pair for each iteration
$$u = \sum_j y_j \alpha_j K(\overrightarrow{x_j}, \overrightarrow{x}) - b$$
KKT conditions of the QP problem:

$$
\begin{aligned}
\alpha_i = 0 & \Leftrightarrow y_i u_i \geqslant 1 \\
0 < \alpha_i < C & \Leftrightarrow y_i u_i = 1 \\
\alpha_i = C & \Leftrightarrow y_i u_i \leqslant 1
\end{aligned}
$$
First choose a point that violates the KKT conditions, then choose the second point so that $|E_2 - E_1|$ is as large as possible.
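A simplified sketch of this selection rule follows. It assumes a full scan over the training points and a tolerance `tol` on the KKT checks; the names `violates_kkt` and `select_pair` are hypothetical, and Platt's additional heuristics (such as preferring non-bound points) are left out. Here $E_i = u_i - y_i$ with $u$ as defined above.

```python
import numpy as np

def violates_kkt(alpha_i, y_i, u_i, C, tol=1e-3):
    """Return True if (alpha_i, y_i * u_i) breaks its KKT condition by more than tol."""
    r = y_i * u_i - 1.0
    if alpha_i <= tol:            # alpha = 0      requires y*u >= 1
        return r < -tol
    if alpha_i >= C - tol:        # alpha = C      requires y*u <= 1
        return r > tol
    return abs(r) > tol           # 0 < alpha < C  requires y*u == 1

def select_pair(alpha, y, K, b, C, tol=1e-3):
    """Pick a KKT violator first, then the partner with the largest |E| gap."""
    u = K @ (alpha * y) - b       # u_i = sum_j y_j alpha_j K_ij - b
    E = u - y
    for first in range(len(alpha)):
        if violates_kkt(alpha[first], y[first], u[first], C, tol):
            gaps = np.abs(E - E[first])
            second = int(np.argmax(gaps))
            if gaps[second] > 0:
                return first, second
    return None                   # no violator found: KKT holds within tol

X = np.array([[1.0, 2.0], [2.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
print(select_pair(np.zeros(4), y, X @ X.T, b=0.0, C=1.0))
```

A large $|E_2 - E_1|$ gives a large unclipped step, since the change in $\alpha_2$ is proportional to $E_1 - E_2$.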