[Statistical Learning Methods] Chapter 7: Support Vector Machines

The support vector machine (SVM) is a binary classification model. Its basic form is a linear classifier defined on the feature space with the largest margin; margin maximization is what distinguishes it from the perceptron. The SVM also includes the kernel trick, which makes it in effect a nonlinear classifier. The learning strategy of the SVM is margin maximization, which can be formalized as solving a convex quadratic programming problem, and is also equivalent to minimizing a regularized hinge loss function. The learning algorithm of the SVM is an optimization algorithm for solving convex quadratic programs.

1. Linearly Separable SVM and Hard-Margin Maximization

Linearly separable support vector machine

Suppose we are given a training data set in the feature space

$$T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$$

where $x_{i} \in \mathcal{X} = \mathbf{R}^{n}$, $y_{i} \in \mathcal{Y} = \{ +1, -1 \}$, $i = 1, 2, \cdots, N$. Here $x_{i}$ is the $i$-th feature vector (instance) and $y_{i}$ is the class label of $x_{i}$: when $y_{i} = +1$, $x_{i}$ is called a positive example; when $y_{i} = -1$, a negative example. The pair $(x_{i}, y_{i})$ is called a sample point.

A separating hyperplane corresponds to the equation $w \cdot x + b = 0$; it is determined by the normal vector $w$ and the intercept $b$, and can be denoted by $(w, b)$.

Linearly separable SVM (hard-margin SVM): given a linearly separable training data set, the separating hyperplane obtained by margin maximization, or equivalently by solving the corresponding convex quadratic programming problem, is

$$w^{*} \cdot x + b^{*} = 0$$

together with the corresponding classification decision function

$$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*})$$

This model is called the linearly separable support vector machine.

Functional margin and geometric margin

For a given training data set $T$ and hyperplane $(w, b)$, the functional margin of the hyperplane $(w, b)$ with respect to a sample point $(x_{i}, y_{i})$ is defined as

$$\hat{\gamma}_{i} = y_{i} \left( w \cdot x_{i} + b \right)$$

The functional margin of the hyperplane $(w, b)$ with respect to the training set $T$ is

$$\hat{\gamma} = \min_{i = 1, 2, \cdots, N} \hat{\gamma}_{i}$$

i.e., the minimum of the functional margins of the hyperplane $(w, b)$ with respect to all sample points $(x_{i}, y_{i})$ in $T$.

The geometric margin of the hyperplane $(w, b)$ with respect to a sample point $(x_{i}, y_{i})$ is

$$\gamma_{i} = y_{i} \left( \dfrac{w}{\| w \|} \cdot x_{i} + \dfrac{b}{\| w \|} \right)$$

The geometric margin of the hyperplane $(w, b)$ with respect to the training set $T$ is

$$\gamma = \min_{i = 1, 2, \cdots, N} \gamma_{i}$$

i.e., the minimum of the geometric margins of the hyperplane $(w, b)$ with respect to all sample points $(x_{i}, y_{i})$ in $T$.

The functional and geometric margins are related by

$$\gamma_{i} = \dfrac{\hat{\gamma}_{i}}{\| w \|}, \qquad \gamma = \dfrac{\hat{\gamma}}{\| w \|}$$
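As a quick numerical illustration of these two definitions, the sketch below computes both margins with NumPy. The three sample points and the hyperplane $(w, b)$ are assumed toy values chosen for this example, not part of the original text.

```python
import numpy as np

# Toy linearly separable data: two positive points and one negative point.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

# A candidate separating hyperplane (w, b).
w = np.array([0.5, 0.5])
b = -2.0

functional = y * (X @ w + b)                # hat{gamma}_i for each sample
geometric = functional / np.linalg.norm(w)  # gamma_i = hat{gamma}_i / ||w||

print("functional margin of T:", functional.min())  # min over all samples
print("geometric margin of T:", geometric.min())
```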

Margin maximization

Finding the maximum-margin separating hyperplane is equivalent to solving

$$\begin{aligned} \max_{w,b} \quad & \gamma \\ \text{s.t.} \quad & y_{i} \left( \dfrac{w}{\| w \|} \cdot x_{i} + \dfrac{b}{\| w \|} \right) \geq \gamma, \quad i = 1, 2, \cdots, N \end{aligned}$$

or, equivalently,

$$\begin{aligned} \max_{w,b} \quad & \dfrac{\hat{\gamma}}{\| w \|} \\ \text{s.t.} \quad & y_{i} \left( w \cdot x_{i} + b \right) \geq \hat{\gamma}, \quad i = 1, 2, \cdots, N \end{aligned}$$

γ ^ = 1 \hat \gamma = 1 γ^=1,将其入上面的最优化问题,注意到最大化 1 ∥ w ∥ \dfrac{1}{\| w \|} w1和最小化 1 2 ∥ w ∥ 2 \dfrac{1}{2} \| w \|^{2} 21w2是等价的,

the equivalent problem

$$\begin{aligned} \min_{w,b} \quad & \dfrac{1}{2} \| w \|^{2} \\ \text{s.t.} \quad & y_{i} \left( w \cdot x_{i} + b \right) - 1 \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$

Learning algorithm for the linearly separable SVM (maximum-margin method)

  • Input: linearly separable training data set $T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$, where $x_{i} \in \mathcal{X} = \mathbf{R}^{n}$, $y_{i} \in \mathcal{Y} = \{ +1, -1 \}$, $i = 1, 2, \cdots, N$
  • Output: maximum-margin separating hyperplane and classification decision function
  1. Construct and solve the constrained optimization problem
$$\begin{aligned} \min_{w,b} \quad & \dfrac{1}{2} \| w \|^{2} \\ \text{s.t.} \quad & y_{i} \left( w \cdot x_{i} + b \right) - 1 \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
    obtaining the optimal solution $w^{*}, b^{*}$.
  2. The resulting separating hyperplane is
$$w^{*} \cdot x + b^{*} = 0$$

together with the classification decision function

$$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*})$$
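To make the maximum-margin method concrete, here is a minimal sketch that hands the primal problem directly to a general-purpose solver. It assumes `scipy` is available; the three-point data set is the same toy example as above, for which the optimum works out to $w^{*} = (0.5, 0.5)$, $b^{*} = -2$. A dedicated QP solver would be the usual choice for larger problems.

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
n = X.shape[1]

# Decision variables packed as theta = (w_1, ..., w_n, b).
def objective(theta):
    return 0.5 * np.dot(theta[:n], theta[:n])  # (1/2) ||w||^2

# One inequality per sample: y_i (w . x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": lambda theta, i=i: y[i] * (X[i] @ theta[:n] + theta[n]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(n + 1), constraints=constraints)
w_star, b_star = res.x[:n], res.x[n]
print("w* =", w_star, "b* =", b_star)
```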

Support vectors and margin boundaries

(Hard-margin) support vectors: among the sample points of the training data set, the instances closest to the separating hyperplane, i.e., the sample points for which the constraint holds with equality:

$$y_{i} \left( w \cdot x_{i} + b \right) - 1 = 0$$

[Figure: support vectors and the margin boundaries $H_{1}$, $H_{2}$]

For positive examples with $y_{i} = +1$, the support vectors lie on the hyperplane
$$H_{1}: w \cdot x + b = 1$$

For negative examples with $y_{i} = -1$, the support vectors lie on the hyperplane
$$H_{2}: w \cdot x + b = -1$$

$H_{1}$ and $H_{2}$ are called the margin boundaries; the points on $H_{1}$ and $H_{2}$ are exactly the support vectors.

The distance between $H_{1}$ and $H_{2}$ is called the margin, and

$$|H_{1} H_{2}| = \dfrac{1}{\| w \|} + \dfrac{1}{\| w \|} = \dfrac{2}{\| w \|}$$

2. Linear SVM and Soft-Margin Maximization

Linear support vector machine

Linear SVM (soft-margin SVM): given a training data set that is not linearly separable, solve the convex quadratic programming problem

$$\begin{aligned} \min_{w,b,\xi} \quad & \dfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} \\ \text{s.t.} \quad & y_{i} \left( w \cdot x_{i} + b \right) \geq 1 - \xi_{i} \\ & \xi_{i} \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$

The separating hyperplane learned in this way is

$$w^{*} \cdot x + b^{*} = 0$$

together with the corresponding classification decision function

$$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*})$$

This model is called the linear support vector machine.

Solving the optimization problem (dual of the hard-margin problem)

  1. Introduce Lagrange multipliers $\alpha_{i} \geq 0$, $i = 1, 2, \cdots, N$, and construct the Lagrangian
$$\begin{aligned} L \left( w, b, \alpha \right) &= \dfrac{1}{2} \| w \|^{2} + \sum_{i=1}^{N} \alpha_{i} \left[ - y_{i} \left( w \cdot x_{i} + b \right) + 1 \right] \\ &= \dfrac{1}{2} \| w \|^{2} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left( w \cdot x_{i} + b \right) + \sum_{i=1}^{N} \alpha_{i} \end{aligned}$$
    where $\alpha = \left( \alpha_{1}, \alpha_{2}, \cdots, \alpha_{N} \right)^{T}$ is the vector of Lagrange multipliers.

  2. $\min_{w,b} L(w, b, \alpha)$: setting the gradients to zero,
$$\begin{aligned} \nabla_{w} L \left( w, b, \alpha \right) &= w - \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} = 0 \\ \nabla_{b} L \left( w, b, \alpha \right) &= - \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \end{aligned}$$


    which gives
$$w = \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}, \qquad \sum_{i=1}^{N} \alpha_{i} y_{i} = 0$$

    Substituting these back into the Lagrangian gives
$$\begin{aligned} L \left( w, b, \alpha \right) &= \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} \left[ \left( \sum_{j=1}^{N} \alpha_{j} y_{j} x_{j} \right) \cdot x_{i} + b \right] + \sum_{i=1}^{N} \alpha_{i} \\ &= - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} b + \sum_{i=1}^{N} \alpha_{i} \\ &= - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{aligned}$$
    so that
$$\min_{w,b} L \left( w, b, \alpha \right) = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i}$$

  3. $\max_{\alpha} \min_{w,b} L(w, b, \alpha)$:
$$\begin{aligned} \max_{\alpha} \quad & - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
    or, equivalently,
$$\begin{aligned} \min_{\alpha} \quad & \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$

Learning algorithm for the linearly separable SVM (hard-margin SVM, dual form):

  • Input: linearly separable training data set $T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$, where $x_{i} \in \mathcal{X} = \mathbf{R}^{n}$, $y_{i} \in \mathcal{Y} = \{ +1, -1 \}$, $i = 1, 2, \cdots, N$
  • Output: maximum-margin separating hyperplane and classification decision function
  1. Construct and solve the constrained optimization problem
$$\begin{aligned} \min_{\alpha} \quad & \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$

    obtaining the optimal solution $\alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}$.

  2. Compute
$$w^{*} = \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i}$$

    then choose a positive component $\alpha_{j}^{*} > 0$ of $\alpha^{*}$ and compute
$$b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} \left( x_{i} \cdot x_{j} \right)$$

  3. The resulting separating hyperplane is
$$w^{*} \cdot x + b^{*} = 0$$

    together with the classification decision function
$$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*})$$
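A sketch of this dual route under the assumption that the `cvxopt` package is available: its `solvers.qp` routine minimizes $\frac{1}{2} \alpha^{T} P \alpha + q^{T} \alpha$ subject to $G\alpha \leq h$ and $A\alpha = b$, which matches the dual above with $P_{ij} = y_{i} y_{j} (x_{i} \cdot x_{j})$, $q = -\mathbf{1}$, $G = -I$, $h = \mathbf{0}$, $A = y^{T}$.

```python
import numpy as np
from cvxopt import matrix, solvers

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
N = len(y)

P = matrix(np.outer(y, y) * (X @ X.T))  # P_ij = y_i y_j (x_i . x_j)
q = matrix(-np.ones(N))                 # encodes -sum_i alpha_i
G = matrix(-np.eye(N))                  # -alpha_i <= 0, i.e. alpha_i >= 0
h = matrix(np.zeros(N))
A = matrix(y.reshape(1, -1))            # equality: sum_i alpha_i y_i = 0
b = matrix(0.0)

alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])

w_star = (alpha * y) @ X                # w* = sum_i alpha_i* y_i x_i
j = int(np.argmax(alpha))               # any component with alpha_j* > 0
b_star = y[j] - ((alpha * y) * (X @ X[j])).sum()
print("alpha* =", alpha, "w* =", w_star, "b* =", b_star)
```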

Solving the optimization problem (dual of the soft-margin problem)

  1. Introduce Lagrange multipliers $\alpha_{i} \geq 0$, $\mu_{i} \geq 0$, $i = 1, 2, \cdots, N$, and construct the Lagrangian
$$\begin{aligned} L \left( w, b, \xi, \alpha, \mu \right) &= \dfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} + \sum_{i=1}^{N} \alpha_{i} \left[ - y_{i} \left( w \cdot x_{i} + b \right) + 1 - \xi_{i} \right] + \sum_{i=1}^{N} \mu_{i} \left( - \xi_{i} \right) \\ &= \dfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} - \sum_{i=1}^{N} \alpha_{i} \left[ y_{i} \left( w \cdot x_{i} + b \right) - 1 + \xi_{i} \right] - \sum_{i=1}^{N} \mu_{i} \xi_{i} \end{aligned}$$

    where $\alpha = \left( \alpha_{1}, \alpha_{2}, \cdots, \alpha_{N} \right)^{T}$ and $\mu = \left( \mu_{1}, \mu_{2}, \cdots, \mu_{N} \right)^{T}$ are vectors of Lagrange multipliers.

  2. $\min_{w,b,\xi} L(w, b, \xi, \alpha, \mu)$: setting the gradients to zero,
$$\begin{aligned} \nabla_{w} L \left( w, b, \xi, \alpha, \mu \right) &= w - \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} = 0 \\ \nabla_{b} L \left( w, b, \xi, \alpha, \mu \right) &= - \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ \nabla_{\xi_{i}} L \left( w, b, \xi, \alpha, \mu \right) &= C - \alpha_{i} - \mu_{i} = 0 \end{aligned}$$


    which gives
$$w = \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}, \qquad \sum_{i=1}^{N} \alpha_{i} y_{i} = 0, \qquad C - \alpha_{i} - \mu_{i} = 0$$

    Substituting these back into the Lagrangian gives
$$\begin{aligned} L \left( w, b, \xi, \alpha, \mu \right) &= \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + C \sum_{i=1}^{N} \xi_{i} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left[ \left( \sum_{j=1}^{N} \alpha_{j} y_{j} x_{j} \right) \cdot x_{i} + b \right] + \sum_{i=1}^{N} \alpha_{i} - \sum_{i=1}^{N} \alpha_{i} \xi_{i} - \sum_{i=1}^{N} \mu_{i} \xi_{i} \\ &= - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} b + \sum_{i=1}^{N} \alpha_{i} + \sum_{i=1}^{N} \xi_{i} \left( C - \alpha_{i} - \mu_{i} \right) \\ &= - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{aligned}$$

    so that
$$\min_{w,b,\xi} L \left( w, b, \xi, \alpha, \mu \right) = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i}$$

  3. $\max_{\alpha} \min_{w,b,\xi} L(w, b, \xi, \alpha, \mu)$:
$$\begin{aligned} \max_{\alpha} \quad & - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & C - \alpha_{i} - \mu_{i} = 0 \\ & \alpha_{i} \geq 0 \\ & \mu_{i} \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$

Since $C - \alpha_{i} - \mu_{i} = 0$ together with $\alpha_{i} \geq 0$ and $\mu_{i} \geq 0$ is equivalent to $0 \leq \alpha_{i} \leq C$, this can be rewritten as the equivalent problem
$$\begin{aligned} \min_{\alpha} \quad & \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$

Learning algorithm for the linear SVM (soft-margin SVM)

  • Input: training data set $T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$, where $x_{i} \in \mathcal{X} = \mathbf{R}^{n}$, $y_{i} \in \mathcal{Y} = \{ +1, -1 \}$, $i = 1, 2, \cdots, N$
  • Output: separating hyperplane and classification decision function
  1. Choose the penalty parameter $C > 0$, then construct and solve the constrained optimization problem
$$\begin{aligned} \min_{\alpha} \quad & \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$

    obtaining the optimal solution $\alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}$.

  2. Compute
$$w^{*} = \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i}$$

    then choose a component $\alpha_{j}^{*}$ of $\alpha^{*}$ with $0 < \alpha_{j}^{*} < C$ and compute
$$b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} \left( x_{i} \cdot x_{j} \right)$$

  3. The resulting separating hyperplane is
$$w^{*} \cdot x + b^{*} = 0$$

    together with the classification decision function
$$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*})$$
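Compared with the hard-margin dual, only the box constraint changes. Continuing the `cvxopt` sketch above, the inequality blocks become $-\alpha \leq 0$ and $\alpha \leq C\mathbf{1}$ stacked together (the value of `C` here is an assumed example):

```python
C = 10.0  # assumed penalty parameter

# Stack -alpha_i <= 0 and alpha_i <= C into one system G alpha <= h.
G = matrix(np.vstack([-np.eye(N), np.eye(N)]))
h = matrix(np.hstack([np.zeros(N), C * np.ones(N)]))

alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
w_star = (alpha * y) @ X

# For b*, pick a component strictly inside the box: 0 < alpha_j* < C.
eps = 1e-6
j = int(np.where((alpha > eps) & (alpha < C - eps))[0][0])
b_star = y[j] - ((alpha * y) * (X @ X[j])).sum()
```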

Support vectors

(Soft-margin) support vectors: in the not-linearly-separable case, the instances $x_{i}$ of the sample points $(x_{i}, y_{i})$ whose components satisfy $\alpha_{i}^{*} > 0$ in the solution $\alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}$ of the dual problem.

[Figure: soft-margin support vectors relative to the margin boundaries and the separating hyperplane]

The geometric margin of an instance $x_{i}$ is

$$\gamma_{i} = \dfrac{y_{i} \left( w \cdot x_{i} + b \right)}{\| w \|} = \dfrac{| 1 - \xi_{i} |}{\| w \|}$$

and half the margin width is $\dfrac{1}{2} | H_{1} H_{2} | = \dfrac{1}{\| w \|}$, so the distance from the instance $x_{i}$ to the margin boundary is

$$\left| \gamma_{i} - \dfrac{1}{\| w \|} \right| = \left| \dfrac{| 1 - \xi_{i} |}{\| w \|} - \dfrac{1}{\| w \|} \right| = \dfrac{\xi_{i}}{\| w \|}$$

The value of $\xi_{i} \geq 0$ determines where $x_{i}$ sits:

$$\xi_{i} \geq 0 \Leftrightarrow \begin{cases} \xi_{i} = 0, & x_{i} \text{ lies on the margin boundary;} \\ 0 < \xi_{i} < 1, & x_{i} \text{ lies between the margin boundary and the separating hyperplane;} \\ \xi_{i} = 1, & x_{i} \text{ lies on the separating hyperplane;} \\ \xi_{i} > 1, & x_{i} \text{ lies on the misclassified side of the separating hyperplane.} \end{cases}$$
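A small sketch of this case analysis in code: given a learned $(w, b)$, the slack of a point is $\xi_{i} = \max(0,\ 1 - y_{i}(w \cdot x_{i} + b))$, and its magnitude reports where the point sits (the helper name is ours, for illustration):

```python
import numpy as np

def slack_position(x, label, w, b):
    """Return the slack xi of (x, label) and its position relative to the margin."""
    xi = max(0.0, 1.0 - label * (np.dot(w, x) + b))
    if xi == 0.0:
        return xi, "on or beyond the margin boundary (correct side)"
    if xi < 1.0:
        return xi, "between the margin boundary and the separating hyperplane"
    if xi == 1.0:
        return xi, "on the separating hyperplane"
    return xi, "on the misclassified side of the separating hyperplane"
```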

Hinge loss function

The hinge loss of the linear (soft-margin) SVM is

$$L \left( y \left( w \cdot x + b \right) \right) = \left[ 1 - y \left( w \cdot x + b \right) \right]_{+}$$

[Figure: the hinge loss function]

where the subscript "+" denotes the positive-part function

$$\left[ z \right]_{+} = \begin{cases} z, & z > 0 \\ 0, & z \leq 0 \end{cases}$$
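Both the positive-part function and the hinge loss translate directly into NumPy; a minimal sketch:

```python
import numpy as np

def positive_part(z):
    """[z]_+ : z where z > 0, else 0."""
    return np.maximum(z, 0.0)

def hinge_loss(margins):
    """Hinge loss [1 - y(w.x + b)]_+ given functional margins y(w.x + b)."""
    return positive_part(1.0 - margins)

print(hinge_loss(np.array([2.0, 1.0, 0.5, -1.0])))  # -> [0.  0.  0.5 2. ]
```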

3. Nonlinear SVM and Kernel Functions

Kernel function

X \mathcal{X} X是输入空间(欧氏空间 R n R^{n} Rn的子集或离散集合), H \mathcal{H} H是特征空间(希尔伯特空间),如果存在一个从 X \mathcal{X} X H \mathcal{H} H的映射 ϕ ( x ) : X → H \begin{aligned} & \phi \left( x \right) : \mathcal{X} \to \mathcal{H} \end{aligned} ϕ(x):XH

such that for all $x, z \in \mathcal{X}$ the function $K(x, z)$ satisfies

$$K \left( x, z \right) = \phi \left( x \right) \cdot \phi \left( z \right)$$

then $K(x, z)$ is called a kernel function and $\phi(x)$ the mapping function, where $\phi(x) \cdot \phi(z)$ denotes the inner product of $\phi(x)$ and $\phi(z)$.

Commonly used kernel functions

Polynomial kernel:

$$K \left( x, z \right) = \left( x \cdot z + 1 \right)^{p}$$

Gaussian kernel:

$$K \left( x, z \right) = \exp \left( - \dfrac{\| x - z \|^{2}}{2 \sigma^{2}} \right)$$
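Both kernels are one-liners; the sketch below also builds the Gram matrix $K_{ij} = K(x_{i}, x_{j})$ that the dual problem consumes. The parameter values `p=2` and `sigma=1.0` are assumed defaults for illustration.

```python
import numpy as np

def polynomial_kernel(x, z, p=2):
    """K(x, z) = (x . z + 1)^p"""
    return (np.dot(x, z) + 1.0) ** p

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))"""
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def gram_matrix(X, kernel):
    """Gram matrix K_ij = K(x_i, x_j) for a data matrix X of shape (N, n)."""
    N = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
```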

Nonlinear support vector classifier

Nonlinear SVM: from a training set that is not linearly separable, via a kernel function and soft-margin maximization, learn the classification decision function

$$f \left( x \right) = \operatorname{sign} \left( \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left( x, x_{i} \right) + b^{*} \right)$$

This model is called the nonlinear support vector machine, where $K(x, z)$ is a positive definite kernel function.

[Figure: a nonlinear classification example]

Learning algorithm for the nonlinear SVM

  • Input: training data set $T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$, where $x_{i} \in \mathcal{X} = \mathbf{R}^{n}$, $y_{i} \in \mathcal{Y} = \{ +1, -1 \}$, $i = 1, 2, \cdots, N$
  • Output: classification decision function
  1. Choose an appropriate kernel function $K(x, z)$ and penalty parameter $C > 0$, then construct and solve the constrained optimization problem
$$\begin{aligned} \min_{\alpha} \quad & \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K \left( x_{i}, x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$
    obtaining the optimal solution $\alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}$.
  2. Choose a component $\alpha_{j}^{*}$ of $\alpha^{*}$ with $0 < \alpha_{j}^{*} < C$ and compute
$$b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left( x_{i}, x_{j} \right)$$
    (unlike the linear case, $w^{*}$ is not computed explicitly, since it lives in the feature space $\mathcal{H}$).
  3. Construct the classification decision function
$$f \left( x \right) = \operatorname{sign} \left( \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left( x, x_{i} \right) + b^{*} \right)$$
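Note that the learned classifier never needs the map $\phi$ explicitly; a sketch of the decision function, assuming `alpha`, `b_star`, the training data, and a `kernel` in the style of the earlier snippets:

```python
def decision_function(x, X, y, alpha, b_star, kernel):
    """f(x) = sign(sum_i alpha_i* y_i K(x, x_i) + b*)."""
    score = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, y, X)) + b_star
    return 1 if score >= 0 else -1
```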

4. Sequential Minimal Optimization Algorithm

This section discusses the implementation of SVM learning. As we have seen, the SVM learning problem can be formalized as a convex quadratic programming problem. Such a problem has a global optimum, and many optimization algorithms can be applied to it. However, when the training set is very large, these algorithms often become so inefficient as to be unusable.

Solving the two-variable quadratic subproblem

The sequential minimal optimization (SMO) algorithm solves the following dual convex quadratic programming problem:

$$\begin{aligned} \min_{\alpha} \quad & \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K \left( x_{i}, x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$

Choose $\alpha_{1}, \alpha_{2}$ as the two working variables and hold the other variables $\alpha_{i}$ $(i = 3, 4, \cdots, N)$ fixed. The SMO subproblem is

$$\begin{aligned} \min_{\alpha_{1}, \alpha_{2}} \quad & W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i1} + y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i2} \\ \text{s.t.} \quad & \alpha_{1} y_{1} + \alpha_{2} y_{2} = - \sum_{i=3}^{N} \alpha_{i} y_{i} = \varsigma \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2 \end{aligned}$$

where $K_{ij} = K \left( x_{i}, x_{j} \right)$, $i, j = 1, 2, \cdots, N$, $\varsigma$ is a constant, and the constant terms not involving $\alpha_{1}, \alpha_{2}$ have been omitted.

Let the initial feasible solution of this subproblem be $\alpha_{1}^{old}, \alpha_{2}^{old}$, let its optimal solution be $\alpha_{1}^{new}, \alpha_{2}^{new}$, and let $\alpha_{2}^{new,unc}$ denote the optimal $\alpha_{2}$ along the constraint direction before clipping.

Since $\alpha_{2}^{new}$ must satisfy $0 \leq \alpha_{2}^{new} \leq C$, its value is restricted to

$$L \leq \alpha_{2}^{new} \leq H$$

where $L$ and $H$ are the endpoints of the diagonal line segment on which $\alpha_{2}^{new}$ lies.
If $y_{1} \neq y_{2}$, then
$$L = \max \left( 0, \alpha_{2}^{old} - \alpha_{1}^{old} \right), \qquad H = \min \left( C, C + \alpha_{2}^{old} - \alpha_{1}^{old} \right)$$

If $y_{1} = y_{2}$, then
$$L = \max \left( 0, \alpha_{2}^{old} + \alpha_{1}^{old} - C \right), \qquad H = \min \left( C, \alpha_{2}^{old} + \alpha_{1}^{old} \right)$$

Let
$$g \left( x \right) = \sum_{i=1}^{N} \alpha_{i} y_{i} K \left( x_{i}, x \right) + b$$

and define the prediction errors
$$E_{i} = g \left( x_{i} \right) - y_{i} = \left( \sum_{j=1}^{N} \alpha_{j} y_{j} K \left( x_{j}, x_{i} \right) + b \right) - y_{i}, \quad i = 1, 2$$

The solution of the subproblem along the constraint direction, before clipping, is

$$\alpha_{2}^{new,unc} = \alpha_{2}^{old} + \dfrac{y_{2} \left( E_{1} - E_{2} \right)}{\eta}$$

where
$$\eta = K_{11} + K_{22} - 2 K_{12} = \left\| \Phi \left( x_{1} \right) - \Phi \left( x_{2} \right) \right\|^{2},$$

$\Phi(x)$ is the map from the input space to the feature space, and $E_{i}$, $i = 1, 2$, is the difference between the prediction $g(x_{i})$ and the true label $y_{i}$, as defined above.

After clipping,

$$\alpha_{2}^{new} = \begin{cases} H, & \alpha_{2}^{new,unc} > H \\ \alpha_{2}^{new,unc}, & L \leq \alpha_{2}^{new,unc} \leq H \\ L, & \alpha_{2}^{new,unc} < L \end{cases}$$

Since $\varsigma = \alpha_{1}^{old} y_{1} + \alpha_{2}^{old} y_{2} = \alpha_{1}^{new} y_{1} + \alpha_{2}^{new} y_{2}$, we obtain

$$\alpha_{1}^{new} = \alpha_{1}^{old} + y_{1} y_{2} \left( \alpha_{2}^{old} - \alpha_{2}^{new} \right)$$
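Putting the pieces of the two-variable subproblem together, here is a sketch of one analytic SMO step for a chosen pair of indices, assuming a precomputed Gram matrix `K`, the current `alpha` and `b`, and the penalty `C`:

```python
import numpy as np

def smo_pair_update(i, j, K, y, alpha, b, C):
    """One analytic SMO step on (alpha_i, alpha_j); returns the updated pair."""
    g = lambda k: np.sum(alpha * y * K[:, k]) + b   # g(x_k)
    E_i, E_j = g(i) - y[i], g(j) - y[j]             # prediction errors

    # Clipping interval [L, H] from the equality constraint and the box.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[j] + alpha[i] - C), min(C, alpha[j] + alpha[i])

    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]         # ||Phi(x_i) - Phi(x_j)||^2
    if eta <= 0.0 or L == H:
        return alpha[i], alpha[j]                   # skip degenerate pairs

    a_j = np.clip(alpha[j] + y[j] * (E_i - E_j) / eta, L, H)  # clipped alpha_j^new
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)           # from the constraint
    return a_i, a_j
```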

Proof

Introduce

$$v_{i} = \sum_{j=3}^{N} \alpha_{j} y_{j} K \left( x_{i}, x_{j} \right) = g \left( x_{i} \right) - \sum_{j=1}^{2} \alpha_{j} y_{j} K \left( x_{i}, x_{j} \right) - b, \quad i = 1, 2$$

Then the objective function can be written as

$$W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} v_{1} \alpha_{1} + y_{2} v_{2} \alpha_{2}$$

Since $\alpha_{1} y_{1} + \alpha_{2} y_{2} = \varsigma$ and $y_{i}^{2} = 1$, $\alpha_{1}$ can be expressed as
$$\alpha_{1} = \left( \varsigma - y_{2} \alpha_{2} \right) y_{1}$$
Substituting this into $W$ gives
$$\begin{aligned} W \left( \alpha_{2} \right) &= \dfrac{1}{2} K_{11} \left[ \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} \right]^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} \alpha_{2} - \left[ \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} + \alpha_{2} \right] + y_{1} v_{1} \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} + y_{2} v_{2} \alpha_{2} \\ &= \dfrac{1}{2} K_{11} \left( \varsigma - y_{2} \alpha_{2} \right)^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{2} K_{12} \left( \varsigma - y_{2} \alpha_{2} \right) \alpha_{2} - \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} - \alpha_{2} + v_{1} \left( \varsigma - y_{2} \alpha_{2} \right) + y_{2} v_{2} \alpha_{2} \end{aligned}$$

α 2 \alpha_{2} α2求导 ∂ W ∂ α 2 = K 11 α 2 + K 22 α 2 − 2 K 12 α 2 − K 11 ς y 2 + K 12 ς y 2 + y 1 y 2 − 1 − v 1 y 2 + y 2 v 2 \begin{aligned} & \dfrac {\partial W}{\partial \alpha_{2}} = K_{11} \alpha_{2} + K_{22} \alpha_{2} -2 K_{12} \alpha_{2} \\ & \quad\quad\quad - K_{11} \varsigma y_{2} + K_{12} \varsigma y_{2} + y_{1} y_{2} -1 - v_{1} y_{2} + y_{2} v_{2} \end{aligned} α2W=K11α2+K22α22K12α2K11ςy2+K12ςy2+y1y21v1y2+y2v2
令其为0,得 ( K 11 + K 22 − 2 K 12 ) α 2 = y 2 ( y 2 − y 1 + ς K 11 − ς K 12 + v 1 − v 2 ) = y 2 [ y 2 − y 1 + ς K 11 − ς K 12 + ( g ( x 1 ) − ∑ j = 1 2 α j y j K 1 j − b ) − ( g ( x 2 ) − ∑ j = 1 2 α j y j K 2 j − b ) ] \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2} = y_{2} \left( y_{2} - y_{1} + \varsigma K_{11} - \varsigma K_{12} + v_{1} - v_{2} \right) \\ \quad\quad\quad\quad\quad\quad\quad\quad = y_{2} \left[ y_{2} - y_{1} + \varsigma K_{11} - \varsigma K_{12} + \left( g \left( x_{1} \right) - \sum_{j=1}^{2}\alpha_{j} y_{j} K_1j - b \right) \\ - \left( g \left( x_{2} \right) - \sum_{j=1}^{2}\alpha_{j} y_{j} K_2j - b \right) \right] (K11+K222K12)α2=y2(y2y1+ςK11ςK12+v1v2)=y2[y2y1+ςK11ςK12+(g(x1)j=12αjyjK1jb)(g(x2)j=12αjyjK2jb)]

ς = α 1 o l d y 1 + α 2 o l d y 2 \varsigma = \alpha_{1}^{old} y_{1} + \alpha_{2}^{old} y_{2} ς=α1oldy1+α2oldy2代入,得 ( K 11 + K 22 − 2 K 12 ) α 2 n e w , u n c = y 2 ( ( K 11 + K 22 − 2 K 12 ) α 2 o l d y 2 + y 2 − y 1 + g ( x 1 ) − g ( x 2 ) ) = ( K 11 + K 22 − 2 K 12 ) α 2 o l d + y 2 ( E 1 − E 2 ) \begin{aligned} \\ & \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{new,unc} = y_{2} \left( \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{old} y_{2} + y_{2} - y_{1} + g \left( x_{1} \right) - g \left( x_{2} \right) \right) \\ & \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad = \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{old} + y_{2} \left( E_{1} - E_{2} \right) \end{aligned} (K11+K222K12)α2new,unc=y2((K11+K222K12)α2oldy2+y2y1+g(x1)g(x2))=(K11+K222K12)α2old+y2(E1E2)

η = K 11 + K 22 − 2 K 12 \eta = K_{11} + K_{22} - 2 K_{12} η=K11+K222K12代入,得 α 2 n e w , u n c = α 2 o l d + y 2 ( E 1 − E 2 ) η \begin{aligned} \\ & \alpha_{2}^{new,unc} = \alpha_{2}^{old} + \dfrac{y_{2} \left( E_{1} - E_{2} \right)}{\eta}\end{aligned} α2new,unc=α2old+ηy2(E1E2)

Computing the threshold $b$ and the differences $E_i$

If the component $\alpha_{1}^{new}$ satisfies $0 < \alpha_{1}^{new} < C$, then (from the KKT condition $y_{1} g(x_{1}) = 1$)
$$b_{1}^{new} = y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} - \alpha_{1}^{new} y_{1} K_{11} - \alpha_{2}^{new} y_{2} K_{21}$$

Since
$$E_{1} = g \left( x_{1} \right) - y_{1} = \left( \sum_{j=1}^{N} \alpha_{j} y_{j} K_{j1} + b \right) - y_{1} = \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old} - y_{1}$$

y 1 − ∑ i = 3 N α i y i K i 1 = − E 1 + α 1 o l d y 1 K 11 + α 2 o l d y 2 K 21 + b o l d \begin{aligned} & y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} = -E_{1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old} \end{aligned} y1i=3NαiyiKi1=E1+α1oldy1K11+α2oldy2K21+bold

Substituting, we get
$$b_{1}^{new} = - E_{1} - y_{1} K_{11} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{21} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old}$$

Similarly,
$$b_{2}^{new} = - E_{2} - y_{1} K_{12} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{22} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old}$$

If $\alpha_{1}^{new}, \alpha_{2}^{new}$ both satisfy $0 < \alpha_{i}^{new} < C$, $i = 1, 2$, then

$$b^{new} = b_{1}^{new} = b_{2}^{new}$$

otherwise
$$b^{new} = \dfrac{b_{1}^{new} + b_{2}^{new}}{2}$$

Update $E_{i}$:
$$E_{i}^{new} = \sum_{S} y_{j} \alpha_{j} K \left( x_{i}, x_{j} \right) + b^{new} - y_{i}$$

where $S$ is the set of all support vectors $x_{j}$.

SMO algorithm

  • Input: training data set $T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$, where $x_{i} \in \mathcal{X} = \mathbf{R}^{n}$, $y_{i} \in \mathcal{Y} = \{ +1, -1 \}$, $i = 1, 2, \cdots, N$, and precision $\varepsilon$
  • Output: approximate solution $\hat{\alpha}$
  1. Take the initial value $\alpha^{(0)} = 0$ and set $k = 0$;
  2. Select the two working variables $\alpha_{1}^{(k)}, \alpha_{2}^{(k)}$ and solve the two-variable subproblem
$$\begin{aligned} \min_{\alpha_{1}, \alpha_{2}} \quad & W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i1} + y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i2} \\ \text{s.t.} \quad & \alpha_{1} y_{1} + \alpha_{2} y_{2} = - \sum_{i=3}^{N} \alpha_{i} y_{i} = \varsigma \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2 \end{aligned}$$
    obtaining the optimal solution $\alpha_{1}^{(k+1)}, \alpha_{2}^{(k+1)}$; update $\alpha$ to $\alpha^{(k+1)}$;
  3. If the stopping conditions
$$\sum_{i=1}^{N} \alpha_{i} y_{i} = 0, \qquad 0 \leq \alpha_{i} \leq C, \quad i = 1, 2, \cdots, N$$
$$y_{i} \cdot g \left( x_{i} \right) \begin{cases} \geq 1, & \left\{ x_{i} \mid \alpha_{i} = 0 \right\} \\ = 1, & \left\{ x_{i} \mid 0 < \alpha_{i} < C \right\} \\ \leq 1, & \left\{ x_{i} \mid \alpha_{i} = C \right\} \end{cases}$$
    are satisfied within precision $\varepsilon$, go to step 4; otherwise set $k = k + 1$ and go to step 2;
  4. Take $\hat{\alpha} = \alpha^{(k+1)}$ (a code sketch of the whole loop follows below).
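An end-to-end sketch of the algorithm, reusing the `smo_pair_update` helper above. For brevity it sweeps pairs in a naive round-robin order instead of the heuristic working-set selection described in the text, and applies the threshold-update rule from the previous subsection after each successful step.

```python
import numpy as np

def smo_train(X, y, C=1.0, kernel=np.dot, epochs=50, tol=1e-6):
    """Simplified SMO for the soft-margin dual: returns (alpha, b)."""
    N = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    alpha, b = np.zeros(N), 0.0

    for _ in range(epochs):
        for i in range(N):
            j = (i + 1) % N                        # naive working-pair choice
            E_i = np.sum(alpha * y * K[:, i]) + b - y[i]
            E_j = np.sum(alpha * y * K[:, j]) + b - y[j]

            a_i_old, a_j_old = alpha[i], alpha[j]
            alpha[i], alpha[j] = smo_pair_update(i, j, K, y, alpha, b, C)
            if abs(alpha[j] - a_j_old) < tol:
                continue                           # no meaningful progress

            # Threshold update: b1 from sample i, b2 from sample j.
            b1 = (b - E_i - y[i] * K[i, i] * (alpha[i] - a_i_old)
                        - y[j] * K[j, i] * (alpha[j] - a_j_old))
            b2 = (b - E_j - y[i] * K[i, j] * (alpha[i] - a_i_old)
                        - y[j] * K[j, j] * (alpha[j] - a_j_old))
            if 0.0 < alpha[i] < C:
                b = b1
            elif 0.0 < alpha[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2.0
    return alpha, b
```

For example, `alpha, b = smo_train(X, y, C=10.0, kernel=gaussian_kernel)` trains a Gaussian-kernel classifier, which can then be evaluated with the `decision_function` sketch from Section 3.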

5. Summary

1. The simplest case of the support vector machine is the linearly separable SVM, or hard-margin SVM. The condition for constructing it is that the training data be linearly separable. Its learning strategy is the maximum-margin method. It can be expressed as a convex quadratic programming problem, whose primal form is

$$\begin{aligned} \min_{w, b} \quad & \frac{1}{2} \| w \|^{2} \\ \text{s.t.} \quad & y_{i} \left( w \cdot x_{i} + b \right) - 1 \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
Solving this optimization problem gives $w^{*}$ and $b^{*}$, yielding the linearly separable SVM; the separating hyperplane is

$$w^{*} \cdot x + b^{*} = 0$$
and the classification decision function is

$$f(x) = \operatorname{sign} \left( w^{*} \cdot x + b^{*} \right)$$
In the maximum-margin method, the functional margin and the geometric margin are important concepts.

The optimal solution of the linearly separable SVM exists and is unique. The instance points lying on the margin boundaries are the support vectors, and the optimal separating hyperplane is completely determined by the support vectors. The dual of the quadratic programming problem is

$$\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
Usually, the linearly separable SVM is learned by solving the dual problem: first find the dual optimum $\alpha^{*}$, then compute the optimal $w^{*}$ and $b^{*}$, and obtain the separating hyperplane and the classification decision function.

2. In practice, training data are rarely exactly linearly separable; more often they are approximately linearly separable, in which case the linear SVM, or soft-margin SVM, is used. The linear SVM is the most basic form of support vector machine.

For noise or outliers, introducing slack variables $\xi_{i}$ makes the data "separable" and yields the convex quadratic programming problem of linear SVM learning, whose primal form is

$$\begin{aligned} \min_{w, b, \xi} \quad & \frac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} \\ \text{s.t.} \quad & y_{i} \left( w \cdot x_{i} + b \right) \geq 1 - \xi_{i}, \quad i = 1, 2, \cdots, N \\ & \xi_{i} \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
Solving this primal problem for $w^{*}$ and $b^{*}$ gives the linear SVM; its separating hyperplane is

$$w^{*} \cdot x + b^{*} = 0$$
and its classification decision function is

$$f(x) = \operatorname{sign} \left( w^{*} \cdot x + b^{*} \right)$$
In the solution of the linear SVM, $w^{*}$ is unique but $b^{*}$ is not. The dual problem is

$$\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$
In the dual learning algorithm of the linear SVM, one first solves the dual problem for the optimal solution $\alpha^{*}$, then computes the primal optimal solution $w^{*}$, $b^{*}$, and obtains the separating hyperplane and the classification decision function.

The instance points $x_{i}$ whose components satisfy $\alpha_{i}^{*} > 0$ in the dual solution $\alpha^{*}$ are called support vectors. A support vector may lie on the margin boundary, between the margin boundary and the separating hyperplane, or on the misclassified side of the separating hyperplane. The optimal separating hyperplane is completely determined by the support vectors.

Linear SVM learning is also equivalent to minimizing the hinge loss with L2-norm regularization:

$$\sum_{i=1}^{N} \left[ 1 - y_{i} \left( w \cdot x_{i} + b \right) \right]_{+} + \lambda \| w \|^{2}$$

3. Nonlinear SVM

For a nonlinear classification problem in the input space, a nonlinear transformation can convert it into a linear classification problem in some high-dimensional feature space, where a linear SVM is learned. Since, in the dual problem of linear SVM learning, both the objective function and the classification decision function involve only inner products between instances, the nonlinear transformation need not be specified explicitly; instead, a kernel function replaces those inner products. The kernel function represents the inner product between two instances after a nonlinear transformation. Concretely, $K(x, z)$ being a kernel function, or positive definite kernel, means that there exists a map $\phi: \mathcal{X} \rightarrow \mathcal{H}$ from the input space $\mathcal{X}$ to the feature space $\mathcal{H}$ such that for any $x, z \in \mathcal{X}$,

$$K(x, z) = \phi(x) \cdot \phi(z)$$

A necessary and sufficient condition for a symmetric function $K(x, z)$ to be a positive definite kernel is: for any $x_{i} \in \mathcal{X}$, $i = 1, 2, \ldots, m$, and any positive integer $m$, the Gram matrix corresponding to $K(x, z)$ is positive semidefinite.
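For a finite sample this condition can be checked numerically: build the Gram matrix and inspect its eigenvalue spectrum. A sketch, reusing the `gram_matrix` and `gaussian_kernel` helpers from Section 3:

```python
import numpy as np

def gram_is_psd(X, kernel, tol=1e-10):
    """Check that K_ij = K(x_i, x_j) is positive semidefinite on the sample X."""
    K = gram_matrix(X, kernel)
    eigvals = np.linalg.eigvalsh(K)    # K is symmetric, so eigvalsh applies
    return bool(np.all(eigvals >= -tol))

X = np.random.rand(20, 3)
print(gram_is_psd(X, gaussian_kernel))  # Gaussian kernel -> True
```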

Thus, replacing the inner product in the dual problem of linear SVM learning with the kernel function $K(x, z)$, the solution obtained is the nonlinear SVM

$$f(x) = \operatorname{sign} \left( \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left( x, x_{i} \right) + b^{*} \right)$$

4. SMO algorithm

The SMO algorithm is a fast algorithm for SVM learning. Its characteristic feature is to repeatedly decompose the original quadratic programming problem into two-variable quadratic programming subproblems and solve each subproblem analytically, until all variables satisfy the KKT conditions. In this way a heuristic procedure obtains the optimum of the original quadratic program. Because each subproblem has a closed-form solution, every subproblem is solved quickly; even though many subproblems must be solved, the overall procedure remains efficient.
