SVM Principles, Part 2

SVM (support vector machine)

This post focuses on the principles of SMO and on nonlinear classifiers.

SMO

Platt's paper: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines.

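Nonlinear classifiers

One approach is to map the low-dimensional data into a higher-dimensional space and look for a separating hyperplane there.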

The inner product $\overrightarrow{x_i} \cdot \overrightarrow{x_j}$ in the cost function then becomes $\left\langle \phi(x^{(i)}), \phi(x^{(j)}) \right\rangle$. For $k$-dimensional inputs, evaluating it explicitly takes $O(k^2)$ time (for instance with a quadratic feature map), because each point must first be mapped into the high-dimensional space; this is hard to accept. A workable trick is to use a kernel function: the high-dimensional inner product $\overrightarrow{h(x_i)} \cdot \overrightarrow{h(x_j)}$ is obtained by computing the low-dimensional inner product and then transforming the result, which brings the time complexity down to $O(k)$.
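The saving can be checked numerically. The sketch below assumes a degree-2 feature map made of all pairwise products (the names `phi_quadratic` and `quadratic_kernel` are illustrative, not from the original post) and compares the explicit high-dimensional inner product with the kernel computed in the original space.

```python
from itertools import product


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi_quadratic(x):
    """Explicit degree-2 feature map: all k*k pairwise products x_a * x_b."""
    return [a * b for a, b in product(x, repeat=2)]


def quadratic_kernel(x, y):
    """The same inner product, computed without leaving the k-dimensional space."""
    return dot(x, y) ** 2


x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

explicit = dot(phi_quadratic(x), phi_quadratic(y))  # O(k^2) multiplications
implicit = quadratic_kernel(x, y)                   # O(k) multiplications
```

Both values agree, but the explicit route builds two vectors of length $k^2$ first, while the kernel route needs a single pass over the $k$ coordinates.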

Common kernels

| Kernel | Expression |
| --- | --- |
| Linear | $K(x, y) = x^T y + c$ |
| Polynomial | $K(x, y) = (a x^T y + c)^d,\ (a, c \geqslant 0)$ |
| Radial basis | $K(x, y) = \exp(-\gamma \lVert x - y \rVert^2),\ (\gamma \geqslant 0)$ |
| Gaussian | $K(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right)$ |
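The four kernels listed above can be written directly from their formulas. This is a minimal pure-Python sketch; the function names and default parameter values are assumptions chosen for illustration.

```python
import math


def dot(u, v):
    """Inner product x^T y."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def linear_kernel(x, y, c=0.0):
    return dot(x, y) + c


def polynomial_kernel(x, y, a=1.0, c=1.0, d=2):
    return (a * dot(x, y) + c) ** d


def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sq_dist(x, y))


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))
```

Note that the Gaussian kernel is the radial basis kernel with $\gamma = 1 / (2\sigma^2)$, so the last two rows describe the same family under different parameterizations.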

Valid kernel: the kernel matrix must be symmetric and positive semidefinite.
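As a rough sanity check of this condition, one can build the kernel matrix on a few points and test symmetry plus the sign of $z^T K z$ for random vectors $z$. This is only a spot check of a necessary condition, not a proof; the helper names, the sample points, and the tolerances are assumptions.

```python
import math
import random


def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(points, kernel):
    """Kernel matrix with entries K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def is_symmetric(K, tol=1e-12):
    n = len(K)
    return all(abs(K[i][j] - K[j][i]) <= tol for i in range(n) for j in range(n))


def quadratic_form(K, z):
    """z^T K z."""
    n = len(K)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))


def looks_psd(K, trials=200, tol=1e-9, seed=0):
    """Return False if some random z gives z^T K z < 0; True otherwise."""
    rng = random.Random(seed)
    n = len(K)
    for _ in range(trials):
        z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if quadratic_form(K, z) < -tol:
            return False
    return True


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.5, 1.5]]
K = gram_matrix(points, rbf_kernel)
```

A matrix such as `[[0, 1], [1, 0]]` is symmetric but fails the random test, since $z = (1, -1)$ gives a negative quadratic form.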

SMO derivation (concrete)

We have $\sum \alpha_i y_i = 0$, so we have to change $\alpha_i, \alpha_j$ simultaneously. Assume we choose $\alpha_1, \alpha_2$; then $\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i = 3} \alpha_i y_i = \zeta$.

target:
$$
\min\ L = \frac{1}{2} \sum_{i, j} \alpha_i \alpha_j y_i y_j \left\langle \overrightarrow{x_i}, \overrightarrow{x_j} \right\rangle - \sum \alpha_i
$$
$$
\begin{aligned}
L &= \frac{1}{2} \alpha_1^2 K_{11} + \frac{1}{2} \alpha_2^2 K_{22} + \alpha_1 \alpha_2 y_1 y_2 K_{12} + \alpha_1 y_1 \sum_{i = 3} \alpha_i y_i K_{i1} + \alpha_2 y_2 \sum_{i = 3} \alpha_i y_i K_{i2} - (\alpha_1 + \alpha_2) + \mathrm{const} \\
\alpha_1 &= y_1 \zeta - y_1 y_2 \alpha_2 \\
\frac{\partial \alpha_1}{\partial \alpha_2} &= -y_1 y_2 \\
\frac{\partial L}{\partial \alpha_2} &= \alpha_1 K_{11} \frac{\partial \alpha_1}{\partial \alpha_2} + \alpha_2 K_{22} + y_1 y_2 K_{12} \frac{\partial (\alpha_1 \alpha_2)}{\partial \alpha_2} + \frac{\partial \alpha_1}{\partial \alpha_2} y_1 \sum_{i = 3} \alpha_i y_i K_{i1} + y_2 \sum_{i = 3} \alpha_i y_i K_{i2} - 1 - \frac{\partial \alpha_1}{\partial \alpha_2} \\
&= -y_1 y_2 \alpha_1 K_{11} + \alpha_2 K_{22} + y_1 y_2 K_{12} (\alpha_1 - y_1 y_2 \alpha_2) - y_2 \sum_{i = 3} \alpha_i y_i K_{i1} + y_2 \sum_{i = 3} \alpha_i y_i K_{i2} + y_1 y_2 - 1 \\
&= (K_{11} + K_{22} - 2K_{12}) \alpha_2 - y_2 K_{11} \zeta + y_2 K_{12} \zeta + y_1 y_2 - 1 - y_2 \sum_{i = 3} \alpha_i y_i (K_{i1} - K_{i2})
\end{aligned}
$$

Let $\frac{\partial L}{\partial \alpha_2} = 0$; then $(K_{11} + K_{22} - 2K_{12}) \alpha_2 = y_2 \left( (K_{11} - K_{12})\zeta + y_2 - y_1 + \sum_{i = 3} \alpha_i y_i (K_{i1} - K_{i2}) \right)$

Write $\alpha_2^*$ for this stationary point and substitute $\zeta = \alpha_1 y_1 + \alpha_2 y_2$, where the $\alpha_i$ on the right-hand side are the current values and the unrestricted sums run over all $i$:
$$
\begin{aligned}
(K_{11} + K_{22} - 2K_{12}) \alpha_2^* &= y_2 \left( (K_{11} - K_{12})\zeta + y_2 - y_1 + \sum_{i = 3} \alpha_i y_i (K_{i1} - K_{i2}) \right) \\
&= y_2 \left( \sum \alpha_i y_i K_{i1} - \sum \alpha_i y_i K_{i2} + y_2 - y_1 + \alpha_2 y_2 (K_{11} + K_{22} - 2K_{12}) \right) \\
&= (K_{11} + K_{22} - 2K_{12}) \alpha_2 + y_2 \left( \left( \sum \alpha_i y_i K_{i1} - y_1 \right) - \left( \sum \alpha_i y_i K_{i2} - y_2 \right) \right)
\end{aligned}
$$

Let $E_i = \sum_{j} \alpha_j y_j K_{ij} + b - y_i$ and $\eta = K_{11} + K_{22} - 2K_{12}$. The bias $b$ cancels in $E_1 - E_2$, so
$$
\alpha_2^* = \alpha_2 + \frac{y_2 (E_1 - E_2)}{\eta}
$$

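$\alpha_2^*$ must also lie in the interval $[L, H]$ (here $L$ is a lower bound, not the objective), which follows from $0 \leqslant \alpha_1, \alpha_2 \leqslant C$ together with the linear constraint: if $y_1 \neq y_2$, then $L = \max(0, \alpha_2 - \alpha_1)$ and $H = \min(C, C + \alpha_2 - \alpha_1)$; if $y_1 = y_2$, then $L = \max(0, \alpha_1 + \alpha_2 - C)$ and $H = \min(C, \alpha_1 + \alpha_2)$.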

$$
\alpha_2^{new} = \begin{cases} H, & (H < \alpha_2^*) \\ \alpha_2^*, & (L \leqslant \alpha_2^* \leqslant H) \\ L, & (\alpha_2^* < L) \end{cases}
$$
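The clipping step can be sketched as follows. The bounds used here are the standard ones from Platt's paper for the box constraint $0 \leqslant \alpha \leqslant C$; the function names `alpha2_bounds` and `clip` are illustrative assumptions.

```python
def alpha2_bounds(alpha1, alpha2, y1, y2, C):
    """Feasible interval [low, high] for the new alpha2 (bounds as in Platt's paper)."""
    if y1 != y2:
        # alpha2 - alpha1 stays constant along the constraint line
        low = max(0.0, alpha2 - alpha1)
        high = min(C, C + alpha2 - alpha1)
    else:
        # alpha1 + alpha2 stays constant along the constraint line
        low = max(0.0, alpha1 + alpha2 - C)
        high = min(C, alpha1 + alpha2)
    return low, high


def clip(value, low, high):
    """Project the unconstrained optimum back onto [low, high]."""
    return max(low, min(high, value))
```

For example, with `alpha1 = 0.2`, `alpha2 = 0.5`, `C = 1.0` and opposite labels, the interval runs from about 0.3 to 1.0, so an unconstrained optimum of 1.4 would be clipped to 1.0.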

$$
\alpha_1^{new} = y_1 (\zeta - y_2 \alpha_2^{new}) = y_1 (y_1 \alpha_1 + y_2 \alpha_2 - y_2 \alpha_2^{new}) = \alpha_1 + y_1 y_2 (\alpha_2 - \alpha_2^{new})
$$
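Putting the pieces of the derivation together, one analytic step on a chosen pair might look like the sketch below. It assumes a precomputed kernel matrix `K`, the error definition $E_i = \sum_j \alpha_j y_j K_{ij} + b - y_i$, and the clipping bounds from Platt's paper; skipping the step when $\eta \leqslant 0$ is a simplification of this sketch, and the bias update is left out.

```python
def smo_pair_update(alpha, y, K, b, i1, i2, C):
    """One analytic SMO step on the pair (i1, i2); returns a new alpha list."""
    n = len(alpha)

    def error(i):
        # E_i = sum_j alpha_j y_j K_ij + b - y_i
        return sum(alpha[j] * y[j] * K[i][j] for j in range(n)) + b - y[i]

    a1, a2, y1, y2 = alpha[i1], alpha[i2], y[i1], y[i2]
    eta = K[i1][i1] + K[i2][i2] - 2.0 * K[i1][i2]
    if eta <= 0.0:
        return list(alpha)  # degenerate direction, skipped in this sketch

    # Unconstrained optimum along the constraint line
    a2_star = a2 + y2 * (error(i1) - error(i2)) / eta

    # Clip to the feasible interval [low, high]
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    a2_new = max(low, min(high, a2_star))

    # alpha1 moves so that alpha1*y1 + alpha2*y2 stays unchanged
    a1_new = a1 + y1 * y2 * (a2 - a2_new)

    updated = list(alpha)
    updated[i1], updated[i2] = a1_new, a2_new
    return updated


# Toy problem: 1-D points 2, 1, -1 with a linear kernel K_ij = x_i * x_j
xs = [2.0, 1.0, -1.0]
ys = [1, 1, -1]
K = [[xi * xj for xj in xs] for xi in xs]
alpha = smo_pair_update([0.0, 0.0, 0.0], ys, K, b=0.0, i1=1, i2=2, C=1.0)
```

On this toy problem the step moves both selected multipliers from 0 to 0.5, and $\sum \alpha_i y_i$ remains 0, as the equality constraint requires.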

Choosing the pair to update

$$
u = \sum y_j \alpha_j K(\overrightarrow{x_j}, \overrightarrow{x}) - b
$$
KKT conditions of the QP problem, with $u_i$ the output at $\overrightarrow{x_i}$:
$$
\begin{aligned}
\alpha_i = 0 &\Leftrightarrow y_i u_i \geqslant 1 \\
0 < \alpha_i < C &\Leftrightarrow y_i u_i = 1 \\
\alpha_i = C &\Leftrightarrow y_i u_i \leqslant 1
\end{aligned}
$$
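In floating point the three conditions are usually tested within a tolerance. The following sketch shows one way to flag a violating point; the function name and the two tolerances (`tol` for the margin, `eps` for deciding whether a multiplier sits on a bound) are assumptions.

```python
def violates_kkt(alpha_i, y_i, u_i, C, tol=1e-3, eps=1e-8):
    """True if (alpha_i, y_i, u_i) breaks the KKT conditions beyond tol."""
    margin = y_i * u_i
    if alpha_i <= eps:
        # alpha_i = 0 requires y_i * u_i >= 1
        return margin < 1.0 - tol
    if alpha_i >= C - eps:
        # alpha_i = C requires y_i * u_i <= 1
        return margin > 1.0 + tol
    # 0 < alpha_i < C requires y_i * u_i = 1
    return abs(margin - 1.0) > tol
```

A point with `alpha_i = 0` and `y_i * u_i = 0.5` is flagged, because it lies inside the margin while its multiplier says it should not matter.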

First choose a point that violates the KKT conditions, then choose the second point so that $|E_2 - E_1|$ is as large as possible.
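This two-stage heuristic can be sketched as below, assuming the KKT violation flags and the errors $E_i$ have already been computed for every point. The name `choose_pair` and the rule of taking the first violator in index order are illustrative assumptions; other scan orders are possible.

```python
def choose_pair(violations, errors):
    """Pick (i1, i2): first KKT violator, then the index maximizing |E_2 - E_1|.

    Returns None when no point violates the KKT conditions.
    """
    violators = [i for i, bad in enumerate(violations) if bad]
    if not violators:
        return None
    i1 = violators[0]
    candidates = [j for j in range(len(errors)) if j != i1]
    if not candidates:
        return None
    i2 = max(candidates, key=lambda j: abs(errors[j] - errors[i1]))
    return i1, i2


pair = choose_pair([False, True, False, True], [0.2, -1.0, 0.9, 0.1])
```

A large $|E_2 - E_1|$ gives a large numerator in the update for $\alpha_2$, so the chosen pair tends to make a big step.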
