SVM: Notes

Distance from a point to a hyperplane

The distance $d$ from a point $x$ to the hyperplane $w^Tx+b=0$ is derived as follows. Let $x_0$ be any point on the hyperplane, so that $w^Tx_0+b=0$, and let $\theta$ be the angle between $w$ and $x-x_0$. Because $w$ is normal to the hyperplane, $d=\|x-x_0\|\,|\cos\theta|$, and therefore
$$|w^T(x-x_0)|=\|w\|\,\|x-x_0\|\,|\cos\theta|=\|w\|d$$
Also, since $w^Tx_0=-b$,
$$w^T(x-x_0)=w^Tx-w^Tx_0=w^Tx+b$$
Combining the two equations,
$$\begin{aligned} |w^T(x-x_0)| &=|w^Tx+b|=\|w\|d \\ & \Rightarrow d=\frac{1}{\|w\|}|w^Tx+b| \end{aligned}$$
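The distance formula above can be checked numerically. The snippet below is a minimal sketch in NumPy; the function name `distance_to_hyperplane` and the sample values of `w`, `b`, and `x` are illustrative choices, not part of the notes.

```python
import numpy as np

def distance_to_hyperplane(x, w, b):
    """Distance from point x to the hyperplane w^T x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# Illustrative values (assumed for this example only)
w = np.array([3.0, 4.0])
b = -5.0
x = np.array([3.0, 4.0])

d = distance_to_hyperplane(x, w, b)

# Cross-check: project x onto the hyperplane and measure the gap directly
x0 = x - (np.dot(w, x) + b) / np.dot(w, w) * w   # foot of the perpendicular
d_direct = np.linalg.norm(x - x0)
print(d, d_direct)
```

Both values agree because `x0` lies on the hyperplane and `x - x0` is parallel to `w`, which is the case $|\cos\theta|=1$ of the derivation.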

Hard-margin classifier

A hyperplane has two sides, and for points on the other side $w^Tx+b$ is negative. To remove the sign, multiply by the class label $y \in \{-1,1\}$:
$$\gamma_i=y_i(w^Tx_i+b)$$
Comparing with the point-to-hyperplane distance formula, dividing by $\|w\|$ turns $\gamma_i$ into a distance (for a correctly classified point):
$$d_i=\frac{\gamma_i}{\|w\|}=\frac{y_i(w^Tx_i+b)}{\|w\|}$$
To push every point as far from the hyperplane as possible, it suffices to maximize the smallest $d_i$; every other point is then at least that far from the hyperplane. Writing $\gamma=\min_i\gamma_i$:
$$\begin{aligned} \max_{w,b}\ & \frac{\gamma}{\|w\|} \\ s.t.\ & y_i(w^Tx_i+b) \ge \gamma,\ i=1,\dots,n \end{aligned}$$
The $x_i$ at which the minimum $\gamma$ is attained are the support vectors; a support vector machine tries to place the hyperplane as far from the support vectors as possible.
Since $w,b$ can be rescaled by any positive factor without changing the hyperplane, set $\gamma=1$, which gives
$$\begin{aligned} \max_{w,b}\ & \frac{1}{\|w\|} \\ s.t.\ & y_i(w^Tx_i+b) \ge 1,\ i=1,\dots,n \end{aligned}$$
$\frac{1}{\|w\|}$ is maximal when $\|w\|$ is minimal, so the problem can be rewritten as follows (squaring the norm and adding the factor $\frac{1}{2}$ do not change the minimizer and simplify the derivatives later):
$$\begin{aligned} \min_{w,b}\ & \frac{1}{2}w^Tw \\ s.t.\ & y_i(w^Tx_i+b) \ge 1,\ i=1,\dots,n \end{aligned}$$
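The rescaling argument can be illustrated numerically: scaling $(w,b)$ changes the functional margin $\gamma$ but not the geometric margin $\gamma/\|w\|$. The following is a small sketch; the toy data set and the helper names `functional_margins` and `geometric_margin` are assumptions made for the example.

```python
import numpy as np

# Toy linearly separable data (assumed for illustration)
X = np.array([[2.0, 0.0], [3.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def functional_margins(w, b):
    """gamma_i = y_i (w^T x_i + b) for every sample."""
    return y * (X @ w + b)

def geometric_margin(w, b):
    """Smallest signed distance to the hyperplane, min_i gamma_i / ||w||."""
    return functional_margins(w, b).min() / np.linalg.norm(w)

w, b = np.array([1.0, 0.0]), -0.5
gamma = functional_margins(w, b).min()

# Rescale so that the smallest functional margin becomes exactly 1
w_s, b_s = w / gamma, b / gamma

print(geometric_margin(w, b), geometric_margin(w_s, b_s))
print(functional_margins(w_s, b_s).min(), 1.0 / np.linalg.norm(w_s))
```

After rescaling, the geometric margin equals $1/\|w\|$, which is why the objective reduces to a function of $w$ alone.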
To solve this constrained optimization problem, use the method of Lagrange multipliers and build the Lagrangian
$$\mathcal{L}(w,b,\alpha)=\frac{1}{2}w^Tw+\sum_{i=1}^{n}{\alpha_i \left[ 1- y_i(w^Tx_i+b) \right]}$$
The problem then becomes one with no constraints on $w,b$:
$$\begin{aligned} \min_{w,b} & \max_{\alpha}\mathcal{L} \\ s.t.\ & \alpha_i \ge 0 \end{aligned}$$
If some constraint is violated, i.e. $1- y_i(w^Tx_i+b)>0$, then $\max_{\alpha}\mathcal{L}=+\infty$ (let $\alpha_i \to \infty$), which is useless for the outer minimization. If all constraints hold, the best choice makes every term $\alpha_i \left[ 1- y_i(w^Tx_i+b) \right]$ equal to $0$, so $\max_{\alpha}\mathcal{L}=\frac{1}{2}w^Tw$ and the constrained problem is recovered.
Next, convert it to the dual problem
$$\begin{aligned} \max_{\alpha} & \min_{w,b}\mathcal{L} \\ s.t.\ & \alpha_i \ge 0 \end{aligned}$$
Consider the inner minimization first. With $\alpha$ held fixed it is an unconstrained problem in $w,b$, so we can set the derivatives with respect to $w,b$ to zero directly:
$$\begin{aligned} &\frac{\partial \mathcal{L}}{\partial b}=0 \Rightarrow \sum_{i=1}^{n}{\alpha_iy_i}=0\\ &\frac{\partial \mathcal{L}}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n}{\alpha_iy_ix_i} \end{aligned}$$
Substituting these back into the Lagrangian (the term in $b$ vanishes because $\sum_{i=1}^{n}\alpha_iy_i=0$):
$$\begin{aligned} \mathcal{L}&=\frac{1}{2}\Big(\sum_{i=1}^{n}{\alpha_iy_ix_i}\Big)^T\sum_{j=1}^{n}{\alpha_jy_jx_j}-\sum_{i=1}^{n} \alpha_iy_i \left[ \Big(\sum_{j=1}^{n}{\alpha_jy_jx_j}\Big)^Tx_i+b \right] +\sum_{i=1}^{n}\alpha_i\\ &=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j+\sum_{i=1}^{n}\alpha_i\\ &=\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j \end{aligned}$$
The original constrained problem thus becomes
$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j \\ s.t.\ & \sum_{i=1}^{n}{\alpha_iy_i}=0\\ &\ \alpha_i \ge 0 \end{aligned}$$
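The substitution step can be sanity-checked numerically: for any $\alpha \ge 0$ with $\sum_i \alpha_i y_i = 0$, plugging $w=\sum_i \alpha_i y_i x_i$ into the Lagrangian should give the dual objective, whatever the value of $b$. The sketch below uses random data; the function names `lagrangian` and `dual_objective` and all numeric values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.normal(size=(n, d))
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# Pick alpha >= 0 and rescale the negative class so that sum(alpha * y) = 0
alpha = rng.uniform(0.1, 1.0, size=n)
alpha[y == -1] *= alpha[y == 1].sum() / alpha[y == -1].sum()

def lagrangian(w, b, alpha):
    """L(w, b, alpha) = 1/2 w^T w + sum_i alpha_i [1 - y_i (w^T x_i + b)]."""
    return 0.5 * w @ w + np.sum(alpha * (1.0 - y * (X @ w + b)))

def dual_objective(alpha):
    """sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j."""
    Z = y[:, None] * X          # row i is y_i * x_i
    G = Z @ Z.T                 # G_ij = y_i y_j x_i^T x_j
    return alpha.sum() - 0.5 * alpha @ G @ alpha

w = (alpha * y) @ X             # stationarity condition for w
b = 1.234                       # arbitrary: b drops out when sum(alpha * y) = 0

print(lagrangian(w, b, alpha), dual_objective(alpha))
```

The two printed numbers coincide up to floating-point error, and changing `b` leaves the first one unchanged.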
Solving the dual gives $\alpha$ and hence only $w$; to obtain $b$ we need the KKT conditions (the primal problem is convex with affine constraints, so strong duality holds and the KKT conditions characterize the optimum):
$$\left\{ \begin{aligned} &\frac{\partial \mathcal{L}}{\partial w}=0, \frac{\partial \mathcal{L}}{\partial b}=0\\ &\pmb{\alpha_i \left[ 1-y_i(w^Tx_i+b)\right]=0}\\ &1-y_i(w^Tx_i+b) \le 0\\ &\alpha_i \ge 0 \end{aligned} \right.$$
By the complementary slackness condition (in bold), any $x_i$ with $\alpha_i>0$ is a support vector and satisfies $1-y_i(w^Tx_i+b)=0$. Since $y_i^2=1$, this gives $b=y_i-w^Tx_i$. Combining this with the expression for $w$, for any support vector $x_i$:
$$\left\{ \begin{aligned} &w^*=\sum_{i=1}^{n}{\alpha_iy_ix_i}\\ &b^*=y_i-\Big(\sum_{j=1}^{n}{\alpha_jy_jx_j}\Big)^Tx_i \end{aligned} \right.$$
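As a worked example of recovering $w^*$ and $b^*$ from $\alpha$, consider the smallest possible case of one point per class. The equality constraint forces $\alpha_1=\alpha_2=a$, the dual objective reduces to $2a-\frac{1}{2}a^2\|x_1-x_2\|^2$, and its maximizer is $a=2/\|x_1-x_2\|^2$. The data points below are assumed for illustration, and the closed form applies to this two-point case only.

```python
import numpy as np

# One point per class (illustrative data)
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([1.0, -1.0])

# Closed-form dual solution for the two-point case only
a = 2.0 / np.sum((X[0] - X[1]) ** 2)
alpha = np.array([a, a])

# w* = sum_i alpha_i y_i x_i
w = (alpha * y) @ X

# b* = y_i - w^T x_i for support vectors (alpha_i > 0); average for stability
sv = np.flatnonzero(alpha > 1e-12)
b = np.mean(y[sv] - X[sv] @ w)

margins = y * (X @ w + b)
print(w, b, margins)
```

Both points end up with functional margin exactly 1, as complementary slackness requires, and $1/\|w\|$ is half the distance between the two points.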

Soft-margin classifier

When the data are not linearly separable, the hard-margin SVM has no feasible solution, so a loss term is added to the original objective, giving the soft-margin classifier
$$\min_{w,b}\ \frac{1}{2}w^Tw + C\sum_{i=1}^{n}\left(\max \left\{ 0,1-y_i(w^Tx_i+b) \right\}\right)$$
The loss is usually not written in this bracketed form. Let $\xi_i=\max \left\{ 0,1-y_i(w^Tx_i+b) \right\}$; the optimization problem then becomes
$$\begin{aligned} \min_{w,b,\xi}\ &\frac{1}{2}w^Tw + C\sum_{i=1}^{n} \xi_i \\ s.t.\ & y_i(w^Tx_i+b) \ge 1-\xi_i \\ & \xi_i \ge 0 \end{aligned}$$
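The slack variables are simply the per-sample hinge losses, which is easy to see in code. This is a minimal sketch; the function names `slacks` and `soft_margin_objective`, the toy data, and the choice $C=1$ are assumptions for the example.

```python
import numpy as np

def slacks(X, y, w, b):
    """xi_i = max(0, 1 - y_i (w^T x_i + b)), the hinge loss of each sample."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(X, y, w, b, C):
    """1/2 w^T w + C * sum_i xi_i."""
    return 0.5 * w @ w + C * slacks(X, y, w, b).sum()

# Toy data: the last negative sample sits on the wrong side (not separable by w below)
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.5, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0

xi = slacks(X, y, w, b)
obj = soft_margin_objective(X, y, w, b, C=1.0)
print(xi, obj)
```

Samples outside the margin get $\xi_i=0$, a sample inside the margin but correctly classified gets $0<\xi_i<1$, and a misclassified sample gets $\xi_i>1$.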
Introduce Lagrange multipliers and convert to the dual problem
$$\begin{aligned} \max_{\alpha,\beta}\min_{w,b,\xi}\ & \mathcal{L}=\frac{1}{2}w^Tw+C\sum_{i=1}^{n}\xi_i-\sum_{i=1}^{n}{\alpha_i\left[ \xi_i+y_i(w^Tx_i+b)-1 \right]}-\sum_{i=1}^{n}{\beta_i\xi_i}\\ \ & \alpha_i \ge 0\\ \ & \beta_i \ge 0 \end{aligned}$$
The solution also satisfies the KKT conditions
$$\left\{ \begin{aligned} &\frac{\partial \mathcal{L}}{\partial w}=0, \frac{\partial \mathcal{L}}{\partial b}=0, \frac{\partial \mathcal{L}}{\partial \xi}=0\\ &\alpha_i \left[ 1-\xi_i-y_i(w^Tx_i+b)\right]=0\\ &\beta_i\xi_i=0\\ &y_i(w^Tx_i+b)-1+\xi_i \ge 0\\ &\xi_i,\alpha_i,\beta_i \ge 0 \end{aligned} \right.$$
Taking partial derivatives with respect to $w,b,\xi$ gives:
$$\begin{aligned} &\frac{\partial \mathcal{L}}{\partial b}=0 \Rightarrow \sum_{i=1}^{n}{\alpha_iy_i}=0\\ &\frac{\partial \mathcal{L}}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n}{\alpha_iy_ix_i}\\ &\frac{\partial \mathcal{L}}{\partial \xi_i}=0 \Rightarrow C-\alpha_i-\beta_i=0 \Rightarrow \beta_i=C-\alpha_i \end{aligned}$$
Simplify $\mathcal{L}$ using the stationarity results $w=\sum_{i=1}^{n}\alpha_iy_ix_i$, $\sum_{i=1}^{n}\alpha_iy_i=0$ and $\beta_i=C-\alpha_i$:
$$\begin{aligned} \mathcal{L}&=\frac{1}{2}w^Tw-\sum_{i=1}^{n}\alpha_iy_i(w^Tx_i+b)+\sum_{i=1}^{n}\alpha_i+\sum_{i=1}^{n}(C-\alpha_i)\xi_i-\sum_{i=1}^{n}\beta_i\xi_i\\ &=\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j \end{aligned}$$
The optimization problem becomes the following, where the upper bound on $\alpha_i$ comes from $\beta_i=C-\alpha_i\ge 0$:
$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_jy_iy_jx_i^Tx_j \\ s.t.\ & 0\le \alpha_i \le C,\ i=1,\dots,n\\ &\sum_{i=1}^{n}\alpha_iy_i=0 \end{aligned}$$

SMO

The equality constraint $\sum_{i=1}^{n}\alpha_iy_i=0$ must hold at all times, and changing a single $\alpha_i$ on its own would violate it, so at least two $\alpha_i$ have to be updated at a time.
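The notes stop before giving the update itself. As a rough sketch of how a two-variable step preserves the equality constraint, the code below uses the standard analytic pair update from Platt's SMO with a linear kernel; this formula is brought in from that algorithm rather than from the notes, and the function name `smo_pair_update`, the toy data, and $C=10$ are assumptions.

```python
import numpy as np

def smo_pair_update(alpha, X, y, i, j, C):
    """One analytic update of (alpha_i, alpha_j) that keeps sum(alpha * y) fixed.

    Sketch of the standard SMO pair step with a linear kernel (assumed here).
    The bias b is omitted because it cancels in E_i - E_j.
    """
    K = X @ X.T                              # linear kernel matrix
    E = (alpha * y) @ K - y                  # prediction errors, up to the bias
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]  # curvature along the update direction
    if eta <= 0:
        return alpha.copy()                  # degenerate pair: skip in this sketch
    # Box bounds for alpha_j implied by 0 <= alpha <= C and the equality constraint
    if y[i] != y[j]:
        lo, hi = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        lo, hi = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    new = alpha.copy()
    new[j] = np.clip(alpha[j] + y[j] * (E[i] - E[j]) / eta, lo, hi)
    # Move alpha_i in the opposite direction so that sum(alpha * y) is unchanged
    new[i] = alpha[i] + y[i] * y[j] * (alpha[j] - new[j])
    return new

# Two-point toy problem (illustrative): the exact dual optimum is alpha = (0.5, 0.5)
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha0 = np.zeros(2)

alpha1 = smo_pair_update(alpha0, X, y, i=0, j=1, C=10.0)
print(alpha1, np.sum(alpha1 * y))
```

On this toy problem a single step reaches the optimum; in general a full SMO loop repeats such steps over heuristically chosen pairs until the KKT conditions hold within a tolerance.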
