SVM2

3 Linear Support Vector Machines and Soft-Margin Maximization

A dataset in a feature space:

  • $T=\left\{\left(x_{1}, y_{1}\right),\left(x_{2}, y_{2}\right), \cdots,\left(x_{N}, y_{N}\right)\right\}$, where $x_{i} \in \mathcal{X}=\mathbf{R}^{n}$, $y_{i} \in \mathcal{Y}=\{+1,-1\}$, $i=1,2,\cdots,N$; $x_i$ is the $i$-th feature vector, also called an instance, and $y_i$ is the class label of $x_i$

  • $(x_i, y_i)$ is called a sample point; when $y_i=+1$, $x_i$ is called a positive example, and when $y_i=-1$, $x_i$ is called a negative example

  • Assume the dataset is not linearly separable, i.e., the training data contain some outlier points

Linear inseparability:

  • Some points fail to satisfy the constraint $y_{i}\left(w \cdot x_{i}+b\right)-1 \geqslant 0, \quad i=1,2, \cdots, N$

  • Introducing a slack variable $\xi_{i} \geqslant 0$ for each sample point, the constraint becomes $y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1-\xi_{i}$

  • Objective function: $\frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{N} \xi_{i}$, where $C>0$ is the penalty parameter

  • The primal optimization problem is a convex quadratic program:

$$\color{red}\begin{array}{ll}{\min _{w, b, \xi}} & {\frac{1}{2}\|w\|^{2}+C \displaystyle \sum_{i=1}^{N} \xi_{i}} \\ {\text { s.t. }} & {y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1-\xi_{i}, \quad i=1,2, \cdots, N} \\ {} & {\xi_{i} \geqslant 0, \quad i=1,2, \cdots, N}\end{array}$$

  • It can be shown that the solution for $w$ is unique, but **the solution for $b$ may not be unique; it lies in an interval**

The dual algorithm:

The Lagrangian of the primal problem is:

$$L(w, b, \xi, \alpha, \mu) \equiv \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{N} \xi_{i}-\sum_{i=1}^{N} \alpha_{i}\left(y_{i}\left(w \cdot x_{i}+b\right)-1+\xi_{i}\right)-\sum_{i=1}^{N} \mu_{i} \xi_{i}, \quad \alpha_{i} \geqslant 0,\ \mu_{i} \geqslant 0$$
The dual problem is the max-min problem of the Lagrangian. Setting the derivatives of $L(w, b, \xi, \alpha, \mu)$ with respect to $w$, $b$, and $\xi$ to zero:

$$\nabla_{w} L(w, b, \xi, \alpha, \mu)=w-\sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}=0\\ \nabla_{b} L(w, b, \xi, \alpha, \mu)=-\sum_{i=1}^{N} \alpha_{i} y_{i}=0\\ \nabla_{\xi_{i}} L(w, b, \xi, \alpha, \mu)=C-\alpha_{i}-\mu_{i}=0$$
which gives:

$$w=\sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}, \qquad \sum_{i=1}^{N} \alpha_{i} y_{i}=0, \qquad C-\alpha_{i}-\mu_{i}=0$$
Substituting these back into $L(w, b, \xi, \alpha, \mu)$:

$$\min _{w, b, \xi} L(w, b, \xi, \alpha, \mu)=-\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)+\sum_{i=1}^{N} \alpha_{i}$$
Maximizing this over $\alpha$ gives the dual problem:

$$\max _{\alpha}\ -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)+\sum_{i=1}^{N} \alpha_{i}\\ \begin{array}{ll}{\text { s.t. }} & {\displaystyle\sum_{i=1}^{N} \alpha_{i} y_{i}=0} \\ {} & {C-\alpha_{i}-\mu_{i}=0} \\ {} & {\alpha_{i} \geqslant 0} \\ {} & {\mu_{i} \geqslant 0, \quad i=1,2, \cdots, N}\end{array}$$
Eliminating $\mu_{i}$ (the constraints $C-\alpha_{i}-\mu_{i}=0$, $\alpha_{i} \geqslant 0$, $\mu_{i} \geqslant 0$ collapse to $0 \leqslant \alpha_{i} \leqslant C$) and flipping the sign to minimize:

$$\color{red} \begin{array}{ll}{\min _{\alpha}} & {\frac{1}{2} \displaystyle \sum_{i=1}^{N} \displaystyle \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\displaystyle \sum_{i=1}^{N} \alpha_{i}} \\ {\text { s.t. }} & {\displaystyle \sum_{i=1}^{N} \alpha_{i} y_{i}=0} \\ {} & {0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N}\end{array}$$
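To make the boxed dual concrete, here is a minimal numeric sketch on a made-up two-point problem (all data and values are illustrative). For $x_1=(1,0)$ with $y_1=+1$ and $x_2=(-1,0)$ with $y_2=-1$, the equality constraint $\sum_i \alpha_i y_i = 0$ forces $\alpha_1=\alpha_2=a$, so the dual objective (in its maximization form) collapses to $W(a)=2a-2a^2$, and clipped gradient ascent on $[0, C]$ finds the optimum:

```python
# Minimal sketch of the boxed dual on a made-up 2-point problem.
# With x1 = (1, 0), y1 = +1 and x2 = (-1, 0), y2 = -1, the constraint
# sum_i alpha_i * y_i = 0 forces alpha_1 = alpha_2 = a, and since
# x1.x1 = x2.x2 = 1 and x1.x2 = -1 the objective to maximize
# collapses to W(a) = 2a - 2a^2.
C = 1.0            # penalty parameter
a, lr = 0.0, 0.1   # starting point and step size
for _ in range(200):
    grad = 2.0 - 4.0 * a                 # dW/da
    a = min(max(a + lr * grad, 0.0), C)  # ascent step, clipped to [0, C]

# a converges to the unconstrained maximizer a = 0.5,
# which happens to lie strictly inside [0, C]
```

For realistic problems one would use a proper QP solver (or SMO) instead; this one-dimensional reduction only works because the toy problem is symmetric.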
Solving the dual problem:

Suppose $\alpha^{*}=\left(\alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*}\right)^{\mathrm{T}}$ is a solution of the dual problem. If there exists a component $\alpha_{j}^{*}$ of $\alpha^{*}$ with $0<\alpha_{j}^{*}<C$, then the primal solution $w^{*}, b^{*}$ can be computed as:
$$\color{red} w^{*}=\displaystyle \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i}, \qquad b^{*}=y_{j}-\displaystyle \sum_{i=1}^{N} y_{i} \alpha_{i}^{*}\left(x_{i} \cdot x_{j}\right)$$
This yields the separating hyperplane:

$$\color{red} \sum_{i=1}^{N} \alpha_{i}^{*} y_{i}\left(x \cdot x_{i}\right)+b^{*}=0$$

and the classification decision function:

$$\color{red} f(x)=\operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i}\left(x \cdot x_{i}\right)+b^{*}\right)$$
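A hedged sketch of the recovery formulas above, using an assumed toy dual solution (data, $C$, and $\alpha^*$ are all illustrative): for $x_1=(1,0), y_1=+1$ and $x_2=(-1,0), y_2=-1$ with $C=1$, the dual optimum is $\alpha^*=(0.5, 0.5)$, so both components satisfy $0<\alpha_j^*<C$ and either index can serve as $j$:

```python
import numpy as np

# Assumed toy dual solution (illustrative): for x1 = (1, 0), y1 = +1 and
# x2 = (-1, 0), y2 = -1 with C = 1, the dual optimum is alpha* = (0.5, 0.5).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])

w = (alpha * y) @ X                    # w* = sum_i alpha_i* y_i x_i
j = 0                                  # any index j with 0 < alpha_j* < C
b = y[j] - (alpha * y) @ (X @ X[j])    # b* = y_j - sum_i y_i alpha_i* (x_i . x_j)

def f(x):
    # decision function: sign( sum_i alpha_i* y_i (x . x_i) + b* )
    return np.sign((alpha * y) @ (X @ x) + b)

# here w = (1, 0) and b = 0, so f splits the plane along the x2-axis
```

Note that $f$ only needs inner products with the training points; this is the form that generalizes directly to kernels.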
Support vectors:

In the training data, the instance points $x_{i} \in \mathbf{R}^{n}$ of the sample points $\left(x_{i}, y_{i}\right)$ with $\alpha_{i}^{*}>0$ are called support vectors; the distance from instance point $x_{i}$ to the margin boundary is $\frac{\xi_{i}}{\|w\|}$.

  • If $\alpha_{i}^{*}<C$, then $\xi_{i}=0$ and the support vector $x_{i}$ lies exactly on the margin boundary
  • If $\alpha_{i}^{*}=C$ and $0<\xi_{i}<1$, the point is classified correctly and $x_{i}$ lies between the margin boundary and the separating hyperplane
  • If $\alpha_{i}^{*}=C$ and $\xi_{i}=1$, $x_{i}$ lies on the separating hyperplane
  • If $\alpha_{i}^{*}=C$ and $\xi_{i}>1$, the point is misclassified and $x_{i}$ lies on the wrong side of the separating hyperplane
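The four cases can be checked numerically. The sketch below assumes a hypothetical solution $w=(1,0)$, $b=0$ and computes the slack $\xi_i = \max(0,\ 1 - y_i(w \cdot x_i + b))$ each point would need, for four hand-picked positive-class points, one per case:

```python
import numpy as np

# Hypothetical solution (illustrative): w = (1, 0), b = 0, so the
# separating hyperplane is x[0] = 0 and the +1 margin boundary is x[0] = 1.
w, b = np.array([1.0, 0.0]), 0.0

# One positive-class point per case above.
X = np.array([[1.0, 0.3],    # on the margin boundary           -> xi = 0
              [0.5, 0.0],    # between boundary and hyperplane  -> 0 < xi < 1
              [0.0, 2.0],    # on the separating hyperplane     -> xi = 1
              [-0.5, 0.0]])  # on the misclassified side        -> xi > 1
y = np.array([1.0, 1.0, 1.0, 1.0])

# slack needed by each point: xi_i = max(0, 1 - y_i (w . x_i + b))
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
# slack values come out as 0, 0.5, 1, 1.5 for the four cases
```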

Hinge loss function:

The primal optimization problem of the linear support vector machine

$$\begin{array}{ll}{\min _{w, b, \xi}} & {\frac{1}{2}\|w\|^{2}+C \displaystyle \sum_{i=1}^{N} \xi_{i}} \\ {\text { s.t. }} & {y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1-\xi_{i}, \quad i=1,2, \cdots, N} \\ {} & {\xi_{i} \geqslant 0, \quad i=1,2, \cdots, N}\end{array}$$
is equivalent to the optimization problem:

$$\min _{w, b} \sum_{i=1}^{N}\left[1-y_{i}\left(w \cdot x_{i}+b\right)\right]_{+}+\lambda\|w\|^{2}$$
where $y_{i}\left(w \cdot x_{i}+b\right)$ is the functional margin (a measure of confidence), and the second term is $L_2$ regularization of $w$ with coefficient $\lambda$.

The function $L(y(w \cdot x+b))=[1-y(w \cdot x+b)]_{+}$ is called the hinge loss function, where the subscript "+" denotes the positive-part function:

$$[z]_{+}=\left\{\begin{array}{ll}{z,} & {z>0} \\ {0,} & {z \leqslant 0}\end{array}\right.$$
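Both the positive part and the hinge loss are one-liners; a minimal sketch:

```python
import numpy as np

def pos(z):
    # [z]_+ : z when z > 0, otherwise 0
    return np.maximum(z, 0.0)

def hinge(margin):
    # L = [1 - y(w.x + b)]_+ , written as a function of the margin y(w.x + b)
    return pos(1.0 - margin)

# Loss is 0 once the functional margin reaches 1, and grows linearly below:
# margins 2, 1, 0.5, 0, -1 give losses 0, 0, 0.5, 1, 2
losses = hinge(np.array([2.0, 1.0, 0.5, 0.0, -1.0]))
```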
The graph of the hinge loss is zero for $y(w \cdot x + b) \geqslant 1$ and rises linearly with slope $-1$ to the left, like an open hinge. (Figure omitted.)

The hinge loss places a stronger requirement on $y_{i}\left(w \cdot x_{i}+b\right)$: a point must not only be classified correctly, the loss drops to zero only when the confidence is sufficiently high, i.e., when $y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1$.
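The equivalent hinge-loss form can be minimized directly by subgradient descent; the sketch below does so on a tiny made-up dataset (data, step size, and $\lambda$ are all illustrative choices):

```python
import numpy as np

# Made-up, linearly separable toy data (illustrative only).
X = np.array([[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

lam, lr = 0.01, 0.1          # regularization coefficient and step size
w, b = np.zeros(2), 0.0
for _ in range(200):
    margins = y * (X @ w + b)
    viol = margins < 1.0     # points with nonzero hinge loss
    # subgradient of sum_i [1 - y_i (w.x_i + b)]_+ + lam * ||w||^2
    g_w = 2.0 * lam * w - (y[viol, None] * X[viol]).sum(axis=0)
    g_b = -y[viol].sum()
    w -= lr * g_w
    b -= lr * g_b

# after training, sign(X @ w + b) matches y on all four points
```

With a fixed step size the iterates hover near the optimum rather than converging exactly; a decaying step size (as in Pegasos-style solvers) tightens this, but the fixed-step version is enough to separate this toy set.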
