A support vector machine (SVM) is a binary classification model. Its basic form is a linear classifier defined in feature space with the maximum margin, which distinguishes it from the perceptron; equipped with the kernel trick, the SVM becomes, in effect, a nonlinear classifier. The learning strategy of the SVM is margin maximization, which can be formalized as a convex quadratic programming problem and is also equivalent to minimizing a regularized hinge loss. The learning algorithm of the SVM is an optimization algorithm for solving convex quadratic programs.
1. Linearly Separable SVM and Hard-Margin Maximization
Linearly separable support vector machine
Assume a training data set in feature space
$$T = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$$
where $x_i \in \mathcal{X} = \mathbf{R}^n$, $y_i \in \mathcal{Y} = \{+1, -1\}$, $i = 1, 2, \cdots, N$. Here $x_i$ is the $i$-th feature vector (instance) and $y_i$ is the class label of $x_i$: when $y_i = +1$, $x_i$ is called a positive example; when $y_i = -1$, a negative example. $(x_i, y_i)$ is called a sample point.
The separating hyperplane corresponds to the equation $w \cdot x + b = 0$; it is determined by the normal vector $w$ and the intercept $b$, and can be denoted $(w, b)$.
Linearly separable SVM (hard-margin SVM): given a linearly separable training data set, the separating hyperplane learned by margin maximization, or equivalently by solving the corresponding convex quadratic programming problem,
$$w^* \cdot x + b^* = 0$$
together with the corresponding classification decision function
$$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$
is called the linearly separable support vector machine.
Function margin and geometric margin
For a given training data set $T$ and hyperplane $(w, b)$, the function margin of $(w, b)$ with respect to a sample point $(x_i, y_i)$ is
$$\hat\gamma_i = y_i (w \cdot x_i + b)$$
The function margin of $(w, b)$ with respect to the training set $T$ is
$$\hat\gamma = \min_{i = 1, 2, \cdots, N} \hat\gamma_i$$
i.e., the minimum of the function margins of $(w, b)$ over all sample points $(x_i, y_i)$ in $T$.
The geometric margin of $(w, b)$ with respect to a sample point $(x_i, y_i)$ is
$$\gamma_i = y_i \left( \dfrac{w}{\| w \|} \cdot x_i + \dfrac{b}{\| w \|} \right)$$
The geometric margin of $(w, b)$ with respect to $T$ is
$$\gamma = \min_{i = 1, 2, \cdots, N} \gamma_i$$
i.e., the minimum of the geometric margins of $(w, b)$ over all sample points in $T$.
Relation between the function margin and the geometric margin:
$$\gamma_i = \dfrac{\hat\gamma_i}{\| w \|}, \qquad \gamma = \dfrac{\hat\gamma}{\| w \|}$$
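As a quick numerical check, both margins can be computed directly from their definitions. The toy points and the hyperplane $(w, b)$ below are illustrative assumptions, not values taken from the text:

```python
import math

# Toy linearly separable data: two positive points, one negative point in R^2.
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [+1, +1, -1]

# A candidate separating hyperplane (w, b); these values are illustrative.
w = (0.5, 0.5)
b = -2.0

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

norm_w = math.sqrt(dot(w, w))

# Function margin of each sample: y_i * (w . x_i + b)
hat_gammas = [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]
# Geometric margin: function margin divided by ||w||
gammas = [g / norm_w for g in hat_gammas]

hat_gamma = min(hat_gammas)   # function margin of the training set
gamma = min(gammas)           # geometric margin of the training set
print(hat_gamma, gamma)
```

Note that scaling $(w, b)$ by a constant changes the function margin but leaves the geometric margin unchanged, which is why the optimization below can fix $\hat\gamma = 1$.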
Margin maximization
Finding the maximum-margin separating hyperplane is equivalent to solving
$$\begin{aligned} & \max_{w,b} \quad \gamma \\ & \text{s.t.} \quad y_i \left( \dfrac{w}{\| w \|} \cdot x_i + \dfrac{b}{\| w \|} \right) \geq \gamma, \quad i = 1, 2, \cdots, N \end{aligned}$$
or equivalently
$$\begin{aligned} & \max_{w,b} \quad \dfrac{\hat\gamma}{\| w \|} \\ & \text{s.t.} \quad y_i (w \cdot x_i + b) \geq \hat\gamma, \quad i = 1, 2, \cdots, N \end{aligned}$$
Set $\hat\gamma = 1$ and substitute it into the optimization problem above (rescaling $w$ and $b$ does not change the hyperplane). Noting that maximizing $\dfrac{1}{\| w \|}$ is equivalent to minimizing $\dfrac{1}{2}\| w \|^2$, we obtain the equivalent problem
$$\begin{aligned} & \min_{w,b} \quad \dfrac{1}{2}\| w \|^2 \\ & \text{s.t.} \quad y_i (w \cdot x_i + b) - 1 \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
线性可分支持向量机学习算法(最大间隔法):
- 输入:线性可分训练数据集 T = { ( x 1 , y 1 ) , ( x 2 , y 2 ) , ⋯ , ( x N , y N ) } T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\} T={(x1,y1),(x2,y2),⋯,(xN,yN)},其中 x i ∈ X = R n , y i ∈ Y = { + 1 , − 1 } , i = 1 , 2 , ⋯ , N x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N xi∈X=Rn,yi∈Y={+1,−1},i=1,2,⋯,N
- 输出:最大间隔分离超平面和分类决策函数
- 构建并求解约束最优化问题
min
w
,
b
1
2
∥
w
∥
2
s
.
t
.
y
i
(
w
⋅
x
i
+
b
)
−
1
≥
0
,
i
=
1
,
2
,
⋯
,
N
\begin{aligned} \\ & \min_{w,b} \quad \dfrac{1}{2} \| w \|^{2} \\ & s.t. \quad y_{i} \left( w \cdot x_{i} + b \right) -1 \geq 0, \quad i=1,2, \cdots, N \end{aligned}
w,bmin21∥w∥2s.t.yi(w⋅xi+b)−1≥0,i=1,2,⋯,N
求得最优解 w ∗ , b ∗ w^{*}, b^{*} w∗,b∗。 - 得到分离超平面 w ∗ ⋅ x + b ∗ = 0 \begin{aligned} & w^{*} \cdot x + b^{*} = 0 \end{aligned} w∗⋅x+b∗=0
以及分类决策函数
f
(
x
)
=
s
i
g
n
(
w
∗
⋅
x
+
b
∗
)
\begin{aligned} & f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{aligned}
f(x)=sign(w∗⋅x+b∗)
Support vectors and margin boundaries
(Hard-margin) support vectors: among the sample points of the training data set, the instances closest to the separating hyperplane, i.e., the sample points for which the constraint holds with equality:
$$y_i (w \cdot x_i + b) - 1 = 0$$
For positive examples with $y_i = +1$, the support vectors lie on the hyperplane
$$H_1: w \cdot x + b = 1$$
For negative examples with $y_i = -1$, the support vectors lie on the hyperplane
$$H_2: w \cdot x + b = -1$$
$H_1$ and $H_2$ are called the margin boundaries, and the points on $H_1$ and $H_2$ are exactly the support vectors.
The distance between $H_1$ and $H_2$ is called the margin, and
$$|H_1 H_2| = \dfrac{1}{\| w \|} + \dfrac{1}{\| w \|} = \dfrac{2}{\| w \|}$$
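These relations can be verified numerically. In the sketch below, the hyperplane $(w, b)$ and the toy points are assumptions chosen so the data is separable; the code picks out the points on the margin boundaries and computes the margin $2/\|w\|$:

```python
import math

# Illustrative hard-margin solution on a toy data set (values assumed for
# demonstration): w = (0.5, 0.5), b = -2 separates the points below.
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [+1, +1, -1]
w = (0.5, 0.5)
b = -2.0

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Support vectors are the points with y_i (w . x_i + b) - 1 == 0.
support = [xi for xi, yi in zip(X, y)
           if abs(yi * (dot(w, xi) + b) - 1.0) < 1e-12]
# Margin width between H1 and H2 is 2 / ||w||.
margin = 2.0 / math.sqrt(dot(w, w))
print(support, margin)
```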
2. Linear SVM and Soft-Margin Maximization
Linear support vector machine
Linear SVM (soft-margin SVM): given a training data set that is not linearly separable, solve the convex quadratic programming problem
$$\begin{aligned} & \min_{w,b,\xi} \quad \dfrac{1}{2}\| w \|^2 + C \sum_{i=1}^N \xi_i \\ & \text{s.t.} \quad y_i (w \cdot x_i + b) \geq 1 - \xi_i \\ & \qquad\ \ \xi_i \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
The learned separating hyperplane
$$w^* \cdot x + b^* = 0$$
together with the corresponding classification decision function
$$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$
is called the linear support vector machine.
Solving the (hard-margin) optimization problem via its dual:
- Introduce Lagrange multipliers $\alpha_i \geq 0$, $i = 1, 2, \cdots, N$, and build the Lagrangian
$$\begin{aligned} L(w, b, \alpha) &= \dfrac{1}{2}\| w \|^2 + \sum_{i=1}^N \alpha_i \left[ -y_i (w \cdot x_i + b) + 1 \right] \\ &= \dfrac{1}{2}\| w \|^2 - \sum_{i=1}^N \alpha_i y_i (w \cdot x_i + b) + \sum_{i=1}^N \alpha_i \end{aligned}$$
where $\alpha = (\alpha_1, \alpha_2, \cdots, \alpha_N)^T$ is the vector of Lagrange multipliers.
- Compute $\min_{w,b} L(w, b, \alpha)$:
$$\begin{aligned} & \nabla_w L(w, b, \alpha) = w - \sum_{i=1}^N \alpha_i y_i x_i = 0 \\ & \nabla_b L(w, b, \alpha) = -\sum_{i=1}^N \alpha_i y_i = 0 \end{aligned}$$
which gives
$$w = \sum_{i=1}^N \alpha_i y_i x_i, \qquad \sum_{i=1}^N \alpha_i y_i = 0$$
Substituting into the Lagrangian:
$$\begin{aligned} L(w, b, \alpha) &= \dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i y_i \left[ \left( \sum_{j=1}^N \alpha_j y_j x_j \right) \cdot x_i + b \right] + \sum_{i=1}^N \alpha_i \\ &= -\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i y_i b + \sum_{i=1}^N \alpha_i \\ &= -\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i \end{aligned}$$
so
$$\min_{w,b} L(w, b, \alpha) = -\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i$$
- Compute $\max_\alpha \min_{w,b} L(w, b, \alpha)$:
$$\begin{aligned} & \max_\alpha\ -\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i \\ & \text{s.t.} \quad \sum_{i=1}^N \alpha_i y_i = 0 \\ & \qquad\ \ \alpha_i \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
or equivalently
$$\begin{aligned} & \min_\alpha\ \dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i \\ & \text{s.t.} \quad \sum_{i=1}^N \alpha_i y_i = 0 \\ & \qquad\ \ \alpha_i \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
Learning algorithm for the linearly separable SVM (hard-margin, dual form):
- Input: linearly separable training data set $T = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$, where $x_i \in \mathcal{X} = \mathbf{R}^n$, $y_i \in \mathcal{Y} = \{+1, -1\}$, $i = 1, 2, \cdots, N$
- Output: maximum-margin separating hyperplane and classification decision function
- Construct and solve the constrained optimization problem
$$\begin{aligned} & \min_\alpha\ \dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i \\ & \text{s.t.} \quad \sum_{i=1}^N \alpha_i y_i = 0 \\ & \qquad\ \ \alpha_i \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
obtaining the optimal solution $\alpha^* = (\alpha_1^*, \alpha_2^*, \cdots, \alpha_N^*)^T$
- Compute
$$w^* = \sum_{i=1}^N \alpha_i^* y_i x_i$$
then choose a positive component $\alpha_j^* > 0$ of $\alpha^*$ and compute
$$b^* = y_j - \sum_{i=1}^N \alpha_i^* y_i (x_i \cdot x_j)$$
- The separating hyperplane is
$$w^* \cdot x + b^* = 0$$
and the classification decision function is
$$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$
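The recovery of $(w^*, b^*)$ from a dual solution can be sketched directly. The dual solution $\alpha^*$ below is an assumption (a precomputed solution for this toy set), since solving the QP itself is outside the scope of this snippet:

```python
# Recovering (w*, b*) from a dual solution alpha*. The data and the dual
# solution are an assumed worked example: alpha* = (0.25, 0, 0.25) happens
# to solve the dual for these three points.
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [+1, +1, -1]
alpha = [0.25, 0.0, 0.25]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# w* = sum_i alpha_i* y_i x_i
w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(2)]

# Pick any j with alpha_j* > 0 and compute
# b* = y_j - sum_i alpha_i* y_i (x_i . x_j)
j = next(i for i, a in enumerate(alpha) if a > 0)
b = y[j] - sum(a * yi * dot(xi, X[j]) for a, yi, xi in zip(alpha, y, X))
print(w, b)
```

Only the points with $\alpha_i^* > 0$ (the support vectors) contribute to $w^*$ and $b^*$, which is visible in the sums above.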
Solving the soft-margin optimization problem via its dual:
- Introduce Lagrange multipliers $\alpha_i \geq 0$, $\mu_i \geq 0$, $i = 1, 2, \cdots, N$, and build the Lagrangian
$$\begin{aligned} L(w, b, \xi, \alpha, \mu) &= \dfrac{1}{2}\| w \|^2 + C\sum_{i=1}^N \xi_i + \sum_{i=1}^N \alpha_i \left[ -y_i (w \cdot x_i + b) + 1 - \xi_i \right] + \sum_{i=1}^N \mu_i (-\xi_i) \\ &= \dfrac{1}{2}\| w \|^2 + C\sum_{i=1}^N \xi_i - \sum_{i=1}^N \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^N \mu_i \xi_i \end{aligned}$$
where $\alpha = (\alpha_1, \alpha_2, \cdots, \alpha_N)^T$ and $\mu = (\mu_1, \mu_2, \cdots, \mu_N)^T$ are vectors of Lagrange multipliers.
- Compute $\min_{w,b,\xi} L(w, b, \xi, \alpha, \mu)$:
$$\begin{aligned} & \nabla_w L(w, b, \xi, \alpha, \mu) = w - \sum_{i=1}^N \alpha_i y_i x_i = 0 \\ & \nabla_b L(w, b, \xi, \alpha, \mu) = -\sum_{i=1}^N \alpha_i y_i = 0 \\ & \nabla_{\xi_i} L(w, b, \xi, \alpha, \mu) = C - \alpha_i - \mu_i = 0 \end{aligned}$$
which gives
$$w = \sum_{i=1}^N \alpha_i y_i x_i, \qquad \sum_{i=1}^N \alpha_i y_i = 0, \qquad C - \alpha_i - \mu_i = 0$$
Substituting into the Lagrangian:
$$\begin{aligned} L(w, b, \xi, \alpha, \mu) &= \dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + C\sum_{i=1}^N \xi_i - \sum_{i=1}^N \alpha_i y_i \left[ \left( \sum_{j=1}^N \alpha_j y_j x_j \right) \cdot x_i + b \right] \\ &\qquad + \sum_{i=1}^N \alpha_i - \sum_{i=1}^N \alpha_i \xi_i - \sum_{i=1}^N \mu_i \xi_i \\ &= -\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i y_i b + \sum_{i=1}^N \alpha_i + \sum_{i=1}^N \xi_i (C - \alpha_i - \mu_i) \\ &= -\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i \end{aligned}$$
so
$$\min_{w,b,\xi} L(w, b, \xi, \alpha, \mu) = -\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i$$
- Compute $\max_\alpha \min_{w,b,\xi} L(w, b, \xi, \alpha, \mu)$:
$$\begin{aligned} & \max_\alpha\ -\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i \\ & \text{s.t.} \quad \sum_{i=1}^N \alpha_i y_i = 0 \\ & \qquad\ \ C - \alpha_i - \mu_i = 0 \\ & \qquad\ \ \alpha_i \geq 0,\ \mu_i \geq 0, \quad i = 1, 2, \cdots, N \end{aligned}$$
Eliminating $\mu_i$ via $\mu_i = C - \alpha_i$, this is equivalent to
$$\begin{aligned} & \min_\alpha\ \dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i \\ & \text{s.t.} \quad \sum_{i=1}^N \alpha_i y_i = 0 \\ & \qquad\ \ 0 \leq \alpha_i \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$
Learning algorithm for the linear SVM (soft-margin, dual form):
- Input: training data set $T = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$, where $x_i \in \mathcal{X} = \mathbf{R}^n$, $y_i \in \mathcal{Y} = \{+1, -1\}$, $i = 1, 2, \cdots, N$
- Output: separating hyperplane and classification decision function
- Choose a penalty parameter $C > 0$, then construct and solve the constrained optimization problem
$$\begin{aligned} & \min_\alpha\ \dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i \\ & \text{s.t.} \quad \sum_{i=1}^N \alpha_i y_i = 0 \\ & \qquad\ \ 0 \leq \alpha_i \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$
obtaining the optimal solution $\alpha^* = (\alpha_1^*, \alpha_2^*, \cdots, \alpha_N^*)^T$
- Compute
$$w^* = \sum_{i=1}^N \alpha_i^* y_i x_i$$
then choose a component $\alpha_j^*$ of $\alpha^*$ with $0 < \alpha_j^* < C$ and compute
$$b^* = y_j - \sum_{i=1}^N \alpha_i^* y_i (x_i \cdot x_j)$$
- The separating hyperplane is
$$w^* \cdot x + b^* = 0$$
and the classification decision function is
$$f(x) = \mathrm{sign}(w^* \cdot x + b^*)$$
Support vectors
(Soft-margin) support vectors: in the non-separable case, the instances $x_i$ of the sample points $(x_i, y_i)$ whose components satisfy $\alpha_i^* > 0$ in the solution $\alpha^* = (\alpha_1^*, \alpha_2^*, \cdots, \alpha_N^*)^T$ of the dual problem.
The geometric margin of an instance $x_i$ is
$$\gamma_i = \dfrac{y_i (w \cdot x_i + b)}{\| w \|} = \dfrac{| 1 - \xi_i |}{\| w \|}$$
and since
$$\dfrac{1}{2} | H_1 H_2 | = \dfrac{1}{\| w \|}$$
the distance from $x_i$ to its margin boundary is
$$\left| \gamma_i - \dfrac{1}{\| w \|} \right| = \left| \dfrac{| 1 - \xi_i |}{\| w \|} - \dfrac{1}{\| w \|} \right| = \dfrac{\xi_i}{\| w \|}$$
The slack variable $\xi_i \geq 0$ locates the instance:
$$\xi_i \geq 0 \Leftrightarrow \begin{cases} \xi_i = 0, & x_i \text{ lies on the margin boundary;} \\ 0 < \xi_i < 1, & x_i \text{ lies between the margin boundary and the separating hyperplane;} \\ \xi_i = 1, & x_i \text{ lies on the separating hyperplane;} \\ \xi_i > 1, & x_i \text{ lies on the misclassified side of the separating hyperplane.} \end{cases}$$
Hinge loss function
The hinge loss of the linear (soft-margin) SVM is
$$L(y(w \cdot x + b)) = \left[ 1 - y(w \cdot x + b) \right]_+$$
where the subscript "+" denotes the positive-part function
$$[z]_+ = \begin{cases} z, & z > 0 \\ 0, & z \leq 0 \end{cases}$$
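A minimal sketch of the hinge loss as a Python function; the margin values evaluated below are arbitrary illustrations:

```python
# Hinge loss [1 - y (w . x + b)]_+ as a function of the margin y (w . x + b).
def hinge(margin):
    """Positive part of (1 - margin), i.e. [1 - margin]_+."""
    return max(0.0, 1.0 - margin)

# margin = y * (w . x + b): correctly classified with room (2.0),
# on the margin boundary (1.0), inside the margin (0.5), misclassified (-1.0)
losses = [hinge(m) for m in (2.0, 1.0, 0.5, -1.0)]
print(losses)
```

Note the loss stays zero for any margin of at least 1, so points well outside the margin do not affect the objective.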
3. Nonlinear SVM and Kernel Functions
Kernel functions
Let $\mathcal{X}$ be the input space (a subset of Euclidean space $\mathbf{R}^n$, or a discrete set) and let $\mathcal{H}$ be the feature space (a Hilbert space). If there exists a mapping from $\mathcal{X}$ to $\mathcal{H}$,
$$\phi(x): \mathcal{X} \to \mathcal{H}$$
such that for all $x, z \in \mathcal{X}$ the function $K(x, z)$ satisfies
$$K(x, z) = \phi(x) \cdot \phi(z)$$
then $K(x, z)$ is called a kernel function and $\phi(x)$ the mapping function; here $\phi(x) \cdot \phi(z)$ denotes the inner product of $\phi(x)$ and $\phi(z)$.
Common kernel functions
Polynomial kernel:
$$K(x, z) = (x \cdot z + 1)^p$$
Gaussian kernel:
$$K(x, z) = \exp\left( -\dfrac{\| x - z \|^2}{2\sigma^2} \right)$$
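Both kernels translate directly to code. The defaults `p=2` and `sigma=1.0` below are arbitrary choices for illustration:

```python
import math

# Sketches of the two kernels above, for vectors given as tuples.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(x, z, p=2):
    """Polynomial kernel K(x, z) = (x . z + 1)^p."""
    return (dot(x, z) + 1.0) ** p

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

x, z = (1.0, 2.0), (3.0, 0.0)
print(poly_kernel(x, z), gaussian_kernel(x, z))
```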
Nonlinear support vector classifier
Nonlinear SVM: from a nonlinearly separable training set, by means of a kernel function and soft-margin maximization, learn the classification decision function
$$f(x) = \mathrm{sign}\left( \sum_{i=1}^N \alpha_i^* y_i K(x, x_i) + b^* \right)$$
This is called the nonlinear support vector machine; $K(x, z)$ is a positive definite kernel function.
Learning algorithm for the nonlinear SVM:
- Input: training data set $T = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$, where $x_i \in \mathcal{X} = \mathbf{R}^n$, $y_i \in \mathcal{Y} = \{+1, -1\}$, $i = 1, 2, \cdots, N$
- Output: classification decision function
- Choose an appropriate kernel function $K(x, z)$ and penalty parameter $C > 0$, then construct and solve the constrained optimization problem
$$\begin{aligned} & \min_\alpha\ \dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^N \alpha_i \\ & \text{s.t.} \quad \sum_{i=1}^N \alpha_i y_i = 0 \\ & \qquad\ \ 0 \leq \alpha_i \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$
obtaining the optimal solution $\alpha^* = (\alpha_1^*, \alpha_2^*, \cdots, \alpha_N^*)^T$
- Choose a component $\alpha_j^*$ of $\alpha^*$ with $0 < \alpha_j^* < C$ and compute
$$b^* = y_j - \sum_{i=1}^N \alpha_i^* y_i K(x_i, x_j)$$
(Unlike the linear case, $w^*$ is not computed explicitly: the mapping $\phi$ may be implicit, so the model is kept in kernel form.)
- The classification decision function is
$$f(x) = \mathrm{sign}\left( \sum_{i=1}^N \alpha_i^* y_i K(x_i, x) + b^* \right)$$
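The kernelized decision function can be sketched as follows. Here `alpha`, `b`, and the training points are illustrative assumptions rather than a solved model, and the linear kernel is used only to keep the numbers checkable by hand; any kernel $K(u, v)$ could be substituted:

```python
# Decision function f(x) = sign(sum_i alpha_i* y_i K(x_i, x) + b*).
X = [(3.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
y = [+1, +1, -1]
alpha = [0.25, 0.0, 0.25]   # assumed dual solution for this toy set
b = -2.0                    # assumed threshold

def linear_kernel(u, v):
    # Any kernel K(u, v) could be plugged in here instead.
    return sum(ui * vi for ui, vi in zip(u, v))

def f(x, kernel=linear_kernel):
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s > 0 else -1

print(f((4.0, 4.0)), f((0.0, 0.0)))
```

Prediction touches only the training points with $\alpha_i^* > 0$, so in practice the sum runs over the support vectors alone.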
4. Sequential Minimal Optimization
This section discusses the implementation of SVM learning. As noted above, the SVM learning problem can be formalized as solving a convex quadratic program. Such a problem has a global optimum, and many optimization algorithms can solve it; but when the number of training samples is very large, these algorithms often become so inefficient as to be unusable.
Solving the two-variable quadratic subproblem
The sequential minimal optimization (SMO) algorithm solves the dual of the convex quadratic program:
$$\begin{aligned} \min_\alpha\ & \dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^N \alpha_i \\ \text{s.t.}\ & \sum_{i=1}^N \alpha_i y_i = 0 \\ & 0 \leq \alpha_i \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$
Choose two variables $\alpha_1, \alpha_2$ and hold the remaining variables $\alpha_i\ (i = 3, 4, \cdots, N)$ fixed. The SMO subproblem is
$$\begin{aligned} \min_{\alpha_1, \alpha_2}\ W(\alpha_1, \alpha_2) =\ & \dfrac{1}{2} K_{11} \alpha_1^2 + \dfrac{1}{2} K_{22} \alpha_2^2 + y_1 y_2 K_{12} \alpha_1 \alpha_2 \\ & - (\alpha_1 + \alpha_2) + y_1 \alpha_1 \sum_{i=3}^N y_i \alpha_i K_{i1} + y_2 \alpha_2 \sum_{i=3}^N y_i \alpha_i K_{i2} \\ \text{s.t.}\quad & \alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i=3}^N \alpha_i y_i = \varsigma \\ & 0 \leq \alpha_i \leq C, \quad i = 1, 2 \end{aligned}$$
where $K_{ij} = K(x_i, x_j)$, $i, j = 1, 2, \cdots, N$, $\varsigma$ is a constant, and constant terms not involving $\alpha_1, \alpha_2$ have been omitted.
Let the initial feasible solution of this subproblem be $\alpha_1^{old}, \alpha_2^{old}$, let the optimal solution be $\alpha_1^{new}, \alpha_2^{new}$, and let $\alpha_2^{new,unc}$ denote the optimal $\alpha_2$ along the constraint direction before clipping.
Since $\alpha_2^{new}$ must satisfy $0 \leq \alpha_i \leq C$, its value must lie in the range
$$L \leq \alpha_2^{new} \leq H$$
where $L$ and $H$ are the bounds determined by the endpoints of the diagonal line segment on which $\alpha_2^{new}$ lies.
If $y_1 \neq y_2$:
$$L = \max(0,\ \alpha_2^{old} - \alpha_1^{old}), \qquad H = \min(C,\ C + \alpha_2^{old} - \alpha_1^{old})$$
If $y_1 = y_2$:
$$L = \max(0,\ \alpha_2^{old} + \alpha_1^{old} - C), \qquad H = \min(C,\ \alpha_2^{old} + \alpha_1^{old})$$
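The two cases map directly to a small helper; the numeric values tried below are illustrative:

```python
# Box bounds [L, H] for alpha2 in the SMO subproblem.
def smo_bounds(alpha1, alpha2, y1, y2, C):
    if y1 != y2:
        # alpha2 - alpha1 is constant along the constraint line
        L = max(0.0, alpha2 - alpha1)
        H = min(C, C + alpha2 - alpha1)
    else:
        # alpha2 + alpha1 is constant along the constraint line
        L = max(0.0, alpha2 + alpha1 - C)
        H = min(C, alpha2 + alpha1)
    return L, H

print(smo_bounds(0.25, 0.5, +1, -1, C=1.0))
print(smo_bounds(0.25, 0.5, +1, +1, C=1.0))
```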
Write
$$g(x) = \sum_{i=1}^N \alpha_i y_i K(x_i, x) + b$$
and let
$$E_i = g(x_i) - y_i = \left( \sum_{j=1}^N \alpha_j y_j K(x_j, x_i) + b \right) - y_i, \quad i = 1, 2$$
be the prediction error on $x_i$. The solution of the subproblem along the constraint direction, before clipping, is
$$\alpha_2^{new,unc} = \alpha_2^{old} + \dfrac{y_2 (E_1 - E_2)}{\eta}$$
where
$$\eta = K_{11} + K_{22} - 2K_{12} = \left\| \Phi(x_1) - \Phi(x_2) \right\|^2$$
and $\Phi(x)$ is the mapping from input space to feature space.
After clipping:
$$\alpha_2^{new} = \begin{cases} H, & \alpha_2^{new,unc} > H \\ \alpha_2^{new,unc}, & L \leq \alpha_2^{new,unc} \leq H \\ L, & \alpha_2^{new,unc} < L \end{cases}$$
Since $\varsigma = \alpha_1^{old} y_1 + \alpha_2^{old} y_2$ and $\varsigma = \alpha_1^{new} y_1 + \alpha_2^{new} y_2$, we have
$$\alpha_1^{old} y_1 + \alpha_2^{old} y_2 = \alpha_1^{new} y_1 + \alpha_2^{new} y_2$$
and hence
$$\alpha_1^{new} = \alpha_1^{old} + y_1 y_2 \left( \alpha_2^{old} - \alpha_2^{new} \right)$$
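One complete two-variable SMO update combines the preceding formulas: $\eta$, the unclipped $\alpha_2$, clipping to $[L, H]$, and finally the $\alpha_1$ update. All numeric inputs below are illustrative assumptions:

```python
# One SMO update of (alpha1, alpha2).
def smo_step(alpha1, alpha2, y1, y2, E1, E2, K11, K22, K12, L, H):
    eta = K11 + K22 - 2.0 * K12              # eta = K11 + K22 - 2 K12
    a2_unc = alpha2 + y2 * (E1 - E2) / eta   # unclipped solution
    a2 = min(H, max(L, a2_unc))              # clip to [L, H]
    a1 = alpha1 + y1 * y2 * (alpha2 - a2)    # keeps alpha1 y1 + alpha2 y2 fixed
    return a1, a2

a1, a2 = smo_step(alpha1=0.25, alpha2=0.5, y1=1, y2=-1,
                  E1=0.5, E2=-0.5, K11=2.0, K22=2.0, K12=1.0,
                  L=0.25, H=1.0)
print(a1, a2)
```

One can check on the output that $\alpha_1 y_1 + \alpha_2 y_2$ is the same before and after the step, as the equality constraint requires.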
Proof:
Introduce
$$v_i = \sum_{j=3}^N \alpha_j y_j K(x_i, x_j) = g(x_i) - \sum_{j=1}^2 \alpha_j y_j K(x_i, x_j) - b, \quad i = 1, 2$$
Then the objective function becomes
$$W(\alpha_1, \alpha_2) = \dfrac{1}{2} K_{11} \alpha_1^2 + \dfrac{1}{2} K_{22} \alpha_2^2 + y_1 y_2 K_{12} \alpha_1 \alpha_2 - (\alpha_1 + \alpha_2) + y_1 v_1 \alpha_1 + y_2 v_2 \alpha_2$$
Since $\alpha_1 y_1 + \alpha_2 y_2 = \varsigma$ and $y_i^2 = 1$, $\alpha_1$ can be expressed as
$$\alpha_1 = (\varsigma - y_2 \alpha_2) y_1$$
Substituting, we get
$$\begin{aligned} W(\alpha_2) =\ & \dfrac{1}{2} K_{11} \left[ (\varsigma - y_2 \alpha_2) y_1 \right]^2 + \dfrac{1}{2} K_{22} \alpha_2^2 + y_1 y_2 K_{12} (\varsigma - y_2 \alpha_2) y_1 \alpha_2 \\ & - \left[ (\varsigma - y_2 \alpha_2) y_1 + \alpha_2 \right] + y_1 v_1 (\varsigma - y_2 \alpha_2) y_1 + y_2 v_2 \alpha_2 \\ =\ & \dfrac{1}{2} K_{11} (\varsigma - y_2 \alpha_2)^2 + \dfrac{1}{2} K_{22} \alpha_2^2 + y_2 K_{12} (\varsigma - y_2 \alpha_2) \alpha_2 \\ & - (\varsigma - y_2 \alpha_2) y_1 - \alpha_2 + v_1 (\varsigma - y_2 \alpha_2) + y_2 v_2 \alpha_2 \end{aligned}$$
Differentiating with respect to $\alpha_2$:
$$\dfrac{\partial W}{\partial \alpha_2} = K_{11} \alpha_2 + K_{22} \alpha_2 - 2 K_{12} \alpha_2 - K_{11} \varsigma y_2 + K_{12} \varsigma y_2 + y_1 y_2 - 1 - v_1 y_2 + y_2 v_2$$
Setting the derivative to zero gives
$$\begin{aligned} \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2} &= y_{2} \left( y_{2} - y_{1} + \varsigma K_{11} - \varsigma K_{12} + v_{1} - v_{2} \right) \\ &= y_{2} \left[ y_{2} - y_{1} + \varsigma K_{11} - \varsigma K_{12} + \left( g \left( x_{1} \right) - \sum_{j=1}^{2} \alpha_{j} y_{j} K_{1j} - b \right) - \left( g \left( x_{2} \right) - \sum_{j=1}^{2} \alpha_{j} y_{j} K_{2j} - b \right) \right] \end{aligned}$$
Substituting $\varsigma = \alpha_{1}^{old} y_{1} + \alpha_{2}^{old} y_{2}$ gives
$$\begin{aligned} \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{new,unc} &= y_{2} \left( \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{old} y_{2} + y_{2} - y_{1} + g \left( x_{1} \right) - g \left( x_{2} \right) \right) \\ &= \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{old} + y_{2} \left( E_{1} - E_{2} \right) \end{aligned}$$
where $E_{i} = g \left( x_{i} \right) - y_{i}$.
Letting $\eta = K_{11} + K_{22} - 2 K_{12}$, we obtain
$$\alpha_{2}^{new,unc} = \alpha_{2}^{old} + \dfrac{y_{2} \left( E_{1} - E_{2} \right)}{\eta}$$
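The unclipped update above is a one-liner in code. A minimal sketch (the function name and the scalar inputs are illustrative, not from a particular library):

```python
def alpha2_unclipped(alpha2_old, y2, E1, E2, K11, K22, K12):
    """Unclipped SMO update for the second variable:
    alpha2_new_unc = alpha2_old + y2 * (E1 - E2) / eta,
    where eta = K11 + K22 - 2*K12."""
    eta = K11 + K22 - 2.0 * K12
    return alpha2_old + y2 * (E1 - E2) / eta
```

In a full SMO step this value is then clipped to the feasible interval determined by the box constraint $0 \leq \alpha_i \leq C$.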
Computing the threshold $b$ and the errors $E_i$
If $0 \lt \alpha_{1}^{new} \lt C$, the KKT conditions give
$$b_1^{new} = y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} - \alpha_{1}^{new} y_{1} K_{11} - \alpha_{2}^{new} y_{2} K_{21}$$
Since
$$\begin{aligned} E_{1} &= g \left( x_{1} \right) - y_{1} = \left( \sum_{j=1}^{N} \alpha_{j} y_{j} K_{j1} + b \right) - y_{1} \\ &= \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old} - y_{1} \end{aligned}$$
we have
$$y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} = -E_{1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old}$$
Substituting gives
$$b_1^{new} = -E_{1} - y_{1} K_{11} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{21} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old}$$
Similarly,
$$b_2^{new} = -E_{2} - y_{1} K_{12} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{22} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old}$$
If $\alpha_{1}^{new}$ and $\alpha_{2}^{new}$ both satisfy $0 \lt \alpha_{i}^{new} \lt C,\ i = 1, 2$, then
$$b^{new} = b_{1}^{new} = b_{2}^{new}$$
Otherwise,
$$b^{new} = \dfrac{b_{1}^{new} + b_{2}^{new}}{2}$$
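The threshold rule above can be sketched directly (a minimal sketch; the function name is illustrative):

```python
def b_new(b1, b2, alpha1_new, alpha2_new, C):
    """Threshold update: if both new multipliers are strictly inside
    (0, C), b1 == b2 and either can be used; otherwise take the midpoint."""
    if 0 < alpha1_new < C and 0 < alpha2_new < C:
        return b1  # b1 equals b2 in this case
    return (b1 + b2) / 2.0
```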
Update $E_{i}$:
$$E_{i}^{new} = \sum_{j \in S} y_{j} \alpha_{j} K \left( x_{i}, x_{j} \right) + b^{new} - y_{i}$$
where $S$ is the set of all support vectors $x_{j}$.
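The error update can be sketched as follows; summing only over multipliers with $\alpha_j > 0$ matches restricting the sum to the support vector set $S$ (names and the kernel argument are illustrative):

```python
def update_E(i, X, y, alpha, b, kernel):
    """Recompute E_i = g(x_i) - y_i, where
    g(x_i) = sum over support vectors of alpha_j y_j K(x_i, x_j) + b."""
    g = b
    for j in range(len(X)):
        if alpha[j] > 0:  # only support vectors contribute
            g += alpha[j] * y[j] * kernel(X[i], X[j])
    return g - y[i]
```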
The SMO algorithm
The SMO algorithm:
- Input: a training data set $T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}$, where $x_{i} \in \mathcal{X} = R^{n}$, $y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}$, $i = 1, 2, \cdots, N$, and a precision $\varepsilon$;
- Output: an approximate solution $\hat \alpha$.
1. Take the initial value $\alpha^{(0)} = 0$ and set $k = 0$;
2. Select the optimization variables $\alpha_{1}^{\left( k \right)}, \alpha_{2}^{\left( k \right)}$ and solve the two-variable subproblem
$$\begin{aligned} \min_{\alpha_{1}, \alpha_{2}}\ & W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} - \left( \alpha_{1} + \alpha_{2} \right) \\ & \quad + y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i1} + y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i2} \\ \text{s.t.}\ & \alpha_{1} y_{1} + \alpha_{2} y_{2} = -\sum_{i=3}^{N} \alpha_{i} y_{i} = \varsigma \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2 \end{aligned}$$
Obtain the optimal solutions $\alpha_{1}^{\left( k+1 \right)}, \alpha_{2}^{\left( k+1 \right)}$ and update $\alpha$ to $\alpha^{\left( k+1 \right)}$;
3. If the stopping conditions
$$\begin{aligned} & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2, \cdots, N \end{aligned}$$
$$y_{i} \cdot g\left(x_{i}\right)=\begin{cases} \geqslant 1, & \left\{x_{i} \mid \alpha_{i}=0\right\} \\ =1, & \left\{x_{i} \mid 0<\alpha_{i}<C\right\} \\ \leqslant 1, & \left\{x_{i} \mid \alpha_{i}=C\right\} \end{cases}$$
where $g\left(x_{i}\right) = \sum_{j=1}^{N} \alpha_{j} y_{j} K\left(x_{j}, x_{i}\right) + b$, are satisfied within precision $\varepsilon$, go to step 4; otherwise set $k = k + 1$ and go to step 2;
4. Take $\hat \alpha = \alpha^{\left( k + 1 \right)}$.
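The loop above can be sketched as a *simplified* SMO, in the spirit of Platt's simplified variant: the second variable is chosen at random rather than by a heuristic, and the bounds `L`, `H` clip the unclipped update to the box constraint. This is a pedagogical sketch under those assumptions, not a production implementation; all names are illustrative.

```python
import random

def linear_kernel(a, b):
    return sum(x * z for x, z in zip(a, b))

def smo_simplified(X, y, C=1.0, tol=1e-3, max_passes=20, kernel=linear_kernel, seed=0):
    """Simplified SMO: repeatedly pick a pair (i, j), solve the two-variable
    subproblem analytically, clip to the box [0, C], and update b."""
    rng = random.Random(seed)
    N = len(X)
    alpha, b = [0.0] * N, 0.0

    def g(i):  # g(x_i) = sum_j alpha_j y_j K(x_j, x_i) + b
        return sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(N)) + b

    passes, iters = 0, 0
    while passes < max_passes and iters < 1000:
        iters += 1
        changed = 0
        for i in range(N):
            E_i = g(i) - y[i]
            # KKT violation check within tolerance tol.
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = rng.choice([k for k in range(N) if k != i])
                E_j = g(j) - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Bounds keeping the pair on the constraint line inside [0, C]^2.
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = kernel(X[i], X[i]) + kernel(X[j], X[j]) - 2.0 * kernel(X[i], X[j])
                if L == H or eta <= 0:
                    continue
                # Unclipped update alpha_j + y_j (E_i - E_j) / eta, then clip.
                aj = min(H, max(L, aj_old + y[j] * (E_i - E_j) / eta))
                if abs(aj - aj_old) < 1e-5:
                    continue
                ai = ai_old + y[i] * y[j] * (aj_old - aj)  # keep alpha_i y_i + alpha_j y_j fixed
                alpha[i], alpha[j] = ai, aj
                # Threshold update: b1/b2 as in the derivation above.
                b1 = b - E_i - y[i] * (ai - ai_old) * kernel(X[i], X[i]) \
                     - y[j] * (aj - aj_old) * kernel(X[i], X[j])
                b2 = b - E_j - y[i] * (ai - ai_old) * kernel(X[i], X[j]) \
                     - y[j] * (aj - aj_old) * kernel(X[j], X[j])
                if 0 < ai < C:
                    b = b1
                elif 0 < aj < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```

On a tiny linearly separable set, a few passes suffice for the resulting $g(x)$ to classify all training points correctly.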
5. Summary
1. The simplest case of the support vector machine is the linearly separable SVM, or hard-margin SVM. It can be constructed when the training data are linearly separable. Its learning strategy is the maximum-margin method, which can be expressed as a convex quadratic programming problem whose primal form is
$$\begin{aligned} & \min _{w, b} \frac{1}{2}\|w\|^{2} \\ & \text{s.t.} \quad y_{i}\left(w \cdot x_{i}+b\right)-1 \geqslant 0, \quad i=1,2, \cdots, N \end{aligned}$$
Solving this optimization problem yields $w^*$ and $b^*$, giving the linearly separable SVM. The separating hyperplane is
$$w^{*} \cdot x+b^{*}=0$$
and the classification decision function is
$$f(x)=\operatorname{sign}\left(w^{*} \cdot x+b^{*}\right)$$
In the maximum-margin method, the functional margin and the geometric margin are important concepts.
The optimal solution of the linearly separable SVM exists and is unique. The instances lying on the margin boundaries are the support vectors, and the optimal separating hyperplane is completely determined by them. The dual of the quadratic programming problem is
$$\begin{aligned} & \min_{\alpha}\ \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{N} \alpha_{i} \\ & \text{s.t.} \quad \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\ & \quad\quad\ \alpha_{i} \geqslant 0, \quad i=1,2, \cdots, N \end{aligned}$$
In practice, the linearly separable SVM is usually learned through the dual problem: first solve the dual for its optimal solution $\alpha^*$, then recover $w^*$ and $b^*$, which give the separating hyperplane and the classification decision function.
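Recovering the primal solution from the dual one can be sketched as follows, using the standard relations $w^* = \sum_i \alpha_i^* y_i x_i$ and $b^* = y_j - \sum_i \alpha_i^* y_i (x_i \cdot x_j)$ for any support vector $x_j$ (the function name is illustrative):

```python
def primal_from_dual(alpha, X, y):
    """Recover w* = sum_i alpha_i y_i x_i, and
    b* = y_j - sum_i alpha_i y_i (x_i . x_j) for some j with alpha_j > 0."""
    n = len(X[0])
    w = [sum(alpha[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(n)]
    j = next(i for i in range(len(X)) if alpha[i] > 0)  # pick a support vector
    b = y[j] - sum(alpha[i] * y[i] *
                   sum(X[i][d] * X[j][d] for d in range(n)) for i in range(len(X)))
    return w, b
```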
2. In practice, training data are rarely strictly linearly separable; more often they are approximately linearly separable, in which case the linear SVM, or soft-margin SVM, is used. The linear SVM is the most basic support vector machine.
For noise points and outliers, slack variables $\xi_{i}$ are introduced to make the data "separable", yielding the convex quadratic programming problem of linear SVM learning, whose primal form is
$$\begin{aligned} & \min _{w, b, \xi} \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{N} \xi_{i} \\ & \text{s.t.} \quad y_{i}\left(w \cdot x_{i}+b\right) \geqslant 1-\xi_{i}, \quad i=1,2, \cdots, N \\ & \quad\quad\ \xi_{i} \geqslant 0, \quad i=1,2, \cdots, N \end{aligned}$$
Solving this primal problem yields $w^*$ and $b^*$, giving the linear SVM with separating hyperplane
$$w^{*} \cdot x+b^{*}=0$$
and classification decision function
$$f(x)=\operatorname{sign}\left(w^{*} \cdot x+b^{*}\right)$$
For the linear SVM, the solution $w^*$ is unique, but $b^*$ need not be. The dual problem is
$$\begin{aligned} & \min _{\alpha} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right)-\sum_{i=1}^{N} \alpha_{i} \\ & \text{s.t.} \quad \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\ & \quad\quad\ 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N \end{aligned}$$
The dual learning algorithm for the linear SVM first solves the dual problem for its optimal solution $\alpha^*$, then recovers the primal optimal solution $w^*$ and $b^*$, which give the separating hyperplane and the classification decision function.
The instances $x_i$ with $\alpha_{i}^{*} \gt 0$ in the dual solution $\alpha^*$ are called support vectors. A support vector can lie on the margin boundary, between the margin boundary and the separating hyperplane, or on the misclassified side of the separating hyperplane. The optimal separating hyperplane is completely determined by the support vectors.
Linear SVM learning is equivalent to minimizing the L2-regularized hinge loss
$$\sum_{i=1}^{N}\left[1-y_{i}\left(w \cdot x_{i}+b\right)\right]_{+}+\lambda\|w\|^{2}$$
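The equivalence can be made concrete by writing the objective out; a minimal sketch (the function name is illustrative):

```python
def hinge_objective(w, b, X, y, lam):
    """Regularized hinge loss:
    sum_i max(0, 1 - y_i (w . x_i + b)) + lam * ||w||^2."""
    hinge = sum(max(0.0, 1.0 - y[i] * (sum(wd * xd for wd, xd in zip(w, X[i])) + b))
                for i in range(len(X)))
    return hinge + lam * sum(wd * wd for wd in w)
```

Only points with functional margin below 1 (inside the margin or misclassified) contribute to the first term, which is exactly the role played by the slack variables $\xi_i$ in the primal problem.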
3. Nonlinear support vector machines
For nonlinear classification problems in the input space, a nonlinear transformation can convert the problem into a linear classification problem in some high-dimensional feature space, where a linear SVM is learned. Because both the objective function and the decision function in the dual of linear SVM learning involve only inner products between instances, the nonlinear transformation need not be specified explicitly; instead, a kernel function replaces the inner products. A kernel function represents the inner product of two instances after a nonlinear transformation. Concretely, $K(x,z)$ being a kernel function, or positive definite kernel, means there exists a mapping $\phi: \mathcal{X} \rightarrow \mathcal{H}$ from the input space to a feature space such that for any $x, z \in \mathcal{X}$,
$$K(x, z)=\phi(x) \cdot \phi(z)$$
A symmetric function $K(x,z)$ is a positive definite kernel if and only if, for any $x_{i} \in \mathcal{X}, i=1,2, \ldots, m$ and any positive integer $m$, the Gram matrix corresponding to $K(x,z)$ is positive semi-definite.
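The Gram-matrix condition can be probed numerically on a finite sample of points: build $K_{ij} = K(x_i, x_j)$ and test positive semi-definiteness. Passing such a check on sample points is necessary but not a proof over all of $\mathcal{X}$; the sketch below uses a Cholesky-style test and illustrative names:

```python
import math

def gram_is_psd(kernel, xs, tol=1e-10):
    """Build the Gram matrix K[i][j] = kernel(xs[i], xs[j]) and test
    positive semi-definiteness via a Cholesky-style factorization that
    tolerates (near-)zero pivots."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s < -tol:       # negative pivot: not PSD
                    return False
                L[i][j] = math.sqrt(max(s, 0.0))
            elif L[j][j] > tol:
                L[i][j] = s / L[j][j]
            elif abs(s) > tol:     # zero pivot with nonzero column: not PSD
                return False
    return True

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel, a standard positive definite kernel."""
    return math.exp(-gamma * sum((x - z) ** 2 for x, z in zip(a, b)))
```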
Therefore, replacing the inner products in the dual of linear SVM learning with a kernel function $K(x,z)$ and solving yields the nonlinear support vector machine
$$f(x)=\operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K\left(x, x_{i}\right)+b^{*}\right)$$
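This decision function is straightforward to evaluate given a trained model; a minimal sketch (the parameter values passed in would come from training and are purely illustrative here):

```python
def svm_decision(x, sv_x, sv_y, sv_alpha, b, kernel):
    """Nonlinear SVM decision function:
    f(x) = sign(sum_i alpha_i* y_i K(x, x_i) + b*),
    summed over the support vectors only."""
    s = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(sv_alpha, sv_y, sv_x)) + b
    return 1 if s >= 0 else -1
```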
4. The SMO algorithm
SMO is a fast algorithm for SVM learning. Its idea is to repeatedly decompose the original quadratic programming problem into two-variable quadratic programming subproblems and solve each subproblem analytically, until every variable satisfies the KKT conditions; this heuristic procedure yields the optimal solution of the original problem. Because each subproblem has a closed-form solution, every subproblem is solved quickly, and although many subproblems must be solved, the method is efficient overall.