Hard-Margin Support Vector Machine (Part 2)

1. Support Vector Machine

1.1 The Support Vector Machine Model

Given a linearly separable dataset $\mathcal{D}=\{(x_i,y_i) \mid x_i \in \mathcal{R}^n,\ y_i \in \{-1,1\}\}$, a normal vector $\vec{w}$, and a bias $b$, let $M$ denote the minimum geometric margin over the examples:
$$M=\min_{i=1,\dots,m}\gamma_i = \min_{i=1,\dots,m}y_i\left(\frac{\vec{w}^T}{||\vec{w}||}\vec{x_i} + \frac{b}{||\vec{w}||}\right) \tag{1-5}$$
From Equation 1-2, the SVM optimization problem can be written in the following form:
$$\max_{\vec{w},b} M \\ s.t.\quad \gamma_i \geq M,\ i=1, \cdots, m \tag{1-6}$$
This constraint guarantees that the geometric margin of every sample point is at least $M$, i.e., that the dataset $\mathcal{D}$ is linearly separable. Specifically:

  1. Positive samples ($y_i=1$): the condition becomes $\vec{w}^T\vec{x_i} + b \geq M||\vec{w}||$. This ensures that positive samples lie on the positive side of the hyperplane, at a distance of at least $M$ from it.
  2. Negative samples ($y_i=-1$): the condition becomes $\vec{w}^T\vec{x_i} + b \leq -M||\vec{w}||$. This ensures that negative samples lie on the negative side of the hyperplane, at a distance of at least $M$ from it.

Because the hyperplane is invariant to rescaling of $(\vec{w}, b)$, we may set $M||\vec{w}|| = 1$ without changing the feasible set of the original problem (the data remain linearly separable). The simplified problem is:
$$\max_{\vec{w},b}\frac{1}{||\vec{w}||} \\ s.t.\quad y_i(\vec{w}^T\vec{x_i} + b) \geq 1,\ i=1, \cdots, m$$
As is customary in operations research, we convert the maximization into an equivalent minimization:
$$\min_{\vec{w},b}\frac{1}{2}||\vec{w}||^2 \\ s.t.\quad 1 - y_i(\vec{w}^T\vec{x_i} + b) \leq 0,\ i=1, \cdots, m \tag{1-7}$$
The factor $\frac{1}{2}$ in the objective merely simplifies the later derivations. This is clearly a convex optimization problem; more specifically, it is a quadratic program: the objective is quadratic and the constraints are linear. It can therefore be solved with an off-the-shelf Quadratic Programming (QP) solver.
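As a concrete illustration, here is a minimal sketch of feeding (1-7) to the `cvxopt` QP solver. The stacked variable $z = [\vec{w}; b]$, the helper name, and the toy data are this sketch's own assumptions, not part of the derivation above:

```python
# Hard-margin SVM primal (1-7) as a QP over z = [w; b]:
#   minimize (1/2) z^T P z   s.t.  G z <= h,
# where P = diag(1,...,1,0) penalizes only w, and each row of G
# encodes -y_i (w^T x_i + b) <= -1.
import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm_primal(X, y):
    """X: (m, n) sample matrix; y: (m,) labels in {-1, +1}."""
    m, n = X.shape
    y = np.asarray(y, dtype=float)
    P = np.zeros((n + 1, n + 1))
    P[:n, :n] = np.eye(n)                # (1/2)||w||^2; b is unpenalized
    q = np.zeros(n + 1)
    G = -y[:, None] * np.hstack([X, np.ones((m, 1))])
    h = -np.ones(m)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:n], z[n]                   # w, b

# Toy usage on a linearly separable set.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = hard_margin_svm_primal(X, y)
print("w =", w, "b =", b)
```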

1.2 Solving the Model

1.2.1 Converting the Primal Problem to the Lagrange Dual Problem

Step 1: Fix $\lambda$ and minimize $\mathscr{L}$ with respect to $\vec{w}$ and $b$.
Construct the Lagrangian:
$$\mathscr{L}=\frac{1}{2}||\vec{w}||^2 + \sum_{i=1}^{m}\lambda_i\{1 - y_i(\vec{w}^T\vec{x_i} + b)\} \tag{1-8}$$
Solution: expanding the Lagrangian,
$$\begin{align*} \mathscr{L}&=\frac{1}{2}\vec{w}^T\vec{w} + \sum_{i=1}^{m}\lambda_i\{1 - y_i(\vec{w}^T\vec{x_i} + b)\} \\ &= \frac{1}{2}\vec{w}^T\vec{w} + \sum_{i=1}^{m}\lambda_i - \sum_{i=1}^{m}\lambda_iy_i\vec{w}^T\vec{x_i} - \sum_{i=1}^{m}\lambda_iy_ib \end{align*}$$
w ⃗ , b \vec{w},b w b求偏导,并令其等于0,可得
$$\frac{\partial{\mathscr{L}}}{\partial{\vec{w}}} = \vec{w} - \sum_{i=1}^{m}\lambda_iy_i\vec{x_i} = 0 \\ \frac{\partial{\mathscr{L}}}{\partial{b}} = \sum_{i=1}^{m}\lambda_iy_i = 0$$

so that
$$\vec{w} = \sum_{i=1}^{m}\lambda_iy_i\vec{x_i}, \qquad \sum_{i=1}^{m}\lambda_iy_i = 0$$
Substituting these results back into Equation (1-8), we get
$$\begin{align*} \mathscr{L}&=\frac{1}{2}\vec{w}^T\vec{w} + \sum_{i=1}^{m}\lambda_i - \sum_{i=1}^{m}\lambda_iy_i\vec{w}^T\vec{x_i} - \sum_{i=1}^{m}\lambda_iy_ib \\ &=\frac{1}{2}\vec{w}^T\sum_{i=1}^{m}\lambda_iy_i\vec{x_i} - \vec{w}^T \sum_{i=1}^{m}\lambda_iy_i\vec{x_i} + \sum_{i=1}^{m}\lambda_i \\ &=\sum_{i=1}^{m}\lambda_i - \frac{1}{2}\vec{w}^T\sum_{i=1}^{m}\lambda_iy_i\vec{x_i} \\ &=\sum_{i=1}^{m}\lambda_i - \frac{1}{2}\Big(\sum_{i=1}^{m}\lambda_iy_i\vec{x_i}\Big)^T\sum_{j=1}^{m}\lambda_jy_j\vec{x_j} \\ &=\sum_{i=1}^{m}\lambda_i - \frac{1}{2}\sum_{i=1,j=1}^{m}\lambda_i\lambda_jy_iy_j\vec{x_i}^T\vec{x_j} \end{align*} \tag{1-9}$$
As this shows, Lagrangian duality lets us express the convex problem entirely in terms of the dual variables; the resulting dual problem is usually easier to handle than the primal.

1.2.2 Solving the Lagrange Dual Problem

Step 2: Show that the maximum exists, then maximize over $\lambda$, i.e.,
$$\max_{\lambda}{\sum_{i=1}^{m}\lambda_i - \frac{1}{2}\sum_{i=1, j=1}^{m}\lambda_i\lambda_jy_iy_j\vec{x_i}^T\vec{x_j}} \\ s.t. \begin{cases} \lambda_i \geq 0,\quad i=1, \cdots, m \\ \sum_{i=1}^{m}\lambda_iy_i = 0 \end{cases}$$
In matrix form:
$$\begin{align*} W(\lambda) &= \sum_{i=1}^{m}\lambda_i - \frac{1}{2}\sum_{i=1, j=1}^{m}\lambda_i\lambda_jy_iy_j\vec{x_i}^T\vec{x_j} \\ &= \begin{bmatrix} 1,1,\cdots,1 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_m \end{bmatrix} - \frac{1}{2} \begin{bmatrix} \lambda_1,\lambda_2,\cdots,\lambda_m \end{bmatrix} \begin{bmatrix} y_1y_1\vec{x_1}^T\vec{x_1} & y_1y_2\vec{x_1}^T\vec{x_2} & \cdots & y_1y_m\vec{x_1}^T\vec{x_m} \\ y_2y_1\vec{x_2}^T\vec{x_1} & y_2y_2\vec{x_2}^T\vec{x_2} & \cdots & y_2y_m\vec{x_2}^T\vec{x_m} \\ \vdots & \vdots & \ddots & \vdots \\ y_my_1\vec{x_m}^T\vec{x_1} & y_my_2\vec{x_m}^T\vec{x_2} & \cdots & y_my_m\vec{x_m}^T\vec{x_m} \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_m \end{bmatrix} \end{align*}$$
λ ⃗ = [ λ 1 λ 2 ⋮ λ m ] \vec{\lambda}= \begin{bmatrix}\lambda_1 \\ \lambda_2 \\ \vdots \\\lambda_m\end{bmatrix} λ = λ1λ2λm M = [ y 1 y 1 x 1 ⃗ T x 1 ⃗ y 1 y 2 x 1 ⃗ T x 2 ⃗ ⋯ y 1 y m x 1 ⃗ T x m ⃗ y 2 y 1 x 2 ⃗ T x 1 ⃗ y 2 y 2 x 2 ⃗ T x 2 ⃗ ⋯ y 2 y m x 2 ⃗ T x m ⃗ ⋮ ⋮ ⋱ ⋮ y m y 1 x m ⃗ T x m ⃗ y m y 2 x m ⃗ T x 2 ⃗ ⋯ y m y m x m ⃗ T x m ⃗ ] M= \begin{bmatrix} y_1y_1\vec{x_1}^T\vec{x_1} & y_1y_2\vec{x_1}^T\vec{x_2} & \cdots & y_1y_m\vec{x_1}^T\vec{x_m} \\ y_2y_1\vec{x_2}^T\vec{x_1} & y_2y_2\vec{x_2}^T\vec{x_2} & \cdots & y_2y_m\vec{x_2}^T\vec{x_m} \\ {\vdots}&{\vdots}&{\ddots}&{\vdots}\\ y_my_1\vec{x_m}^T\vec{x_m} & y_my_2\vec{x_m}^T\vec{x_2} & \cdots & y_my_m\vec{x_m}^T\vec{x_m} \end{bmatrix} M= y1y1x1 Tx1 y2y1x2 Tx1 ymy1xm Txm y1y2x1 Tx2 y2y2x2 Tx2 ymy2xm Tx2 y1ymx1 Txm y2ymx2 Txm ymymxm Txm
Then
$$W(\lambda) = \vec{1}^T\vec{\lambda} - \frac{1}{2}\vec{\lambda}^TM\vec{\lambda}$$
where $\vec{1}$ is the all-ones vector.
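In code, $M$ is just the element-wise product of the label outer product and the Gram matrix, so the dual objective takes only a couple of lines; the sketch below (the function name is our own) assumes `numpy`:

```python
import numpy as np

def dual_objective(lam, X, y):
    """W(lambda) = 1^T lam - (1/2) lam^T M lam, with M = (y y^T) * (X X^T)."""
    M = np.outer(y, y) * (X @ X.T)   # element-wise product of label and Gram matrices
    return lam.sum() - 0.5 * lam @ M @ lam
```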
For the maximum of $W(\lambda)$ to exist, it suffices to show that $M$ is positive semidefinite, so that $W(\lambda)$ is concave.
(1) Proof:
Let
$$H = \begin{bmatrix} \vec{x_1}^T\vec{x_1} & \vec{x_1}^T\vec{x_2} & \cdots & \vec{x_1}^T\vec{x_m} \\ \vec{x_2}^T\vec{x_1} & \vec{x_2}^T\vec{x_2} & \cdots & \vec{x_2}^T\vec{x_m} \\ \vdots & \vdots & \ddots & \vdots \\ \vec{x_m}^T\vec{x_1} & \vec{x_m}^T\vec{x_2} & \cdots & \vec{x_m}^T\vec{x_m} \end{bmatrix}$$


With $D_y = \mathrm{diag}(y_1,\dots,y_m)$ we can write $M = D_yHD_y$, so for any vector $\vec{z}$,
$$\vec{z}^TM\vec{z} = (D_y\vec{z})^TH(D_y\vec{z}) \geq 0$$
Since $H$ is the matrix of pairwise inner products of the vectors $\vec{x_i}$, i.e., a Gram matrix, it is positive semidefinite; hence $M$ is positive semidefinite as well, $W(\lambda)$ is concave, and the maximum exists.
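As a quick numerical sanity check of this argument (not a proof), one can draw random data and confirm that $M$ has no negative eigenvalues; the data below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                  # 20 random samples in R^5
y = rng.choice([-1.0, 1.0], size=20)
H = X @ X.T                                   # Gram matrix, PSD by construction
M = np.outer(y, y) * H                        # M = D_y H D_y
print(np.linalg.eigvalsh(M).min() >= -1e-10)  # True: no (numerically) negative eigenvalue
```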
(2) Solving the maximization with SMO
The previous section derived the SVM dual problem; we now solve it with the Sequential Minimal Optimization (SMO) algorithm. The core idea is to pick two components of the optimization variable at a time, hold all the other components fixed, and optimize over just those two while keeping the KKT conditions in view. Each such step moves the parameters closer to the optimum; the procedure repeats until all variables satisfy the KKT conditions, at which point the optimal solution has been found.
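The stopping rule "all variables satisfy the KKT conditions" can be made concrete with a vectorized check like the sketch below; the function name and tolerance are illustrative assumptions, and the check is stated for the hard-margin dual, where the multipliers are only bounded below:

```python
import numpy as np

def kkt_violations(lam, b, X, y, tol=1e-3):
    """Indices i violating the hard-margin KKT conditions:
       lam_i = 0 requires y_i g(x_i) >= 1; lam_i > 0 requires y_i g(x_i) = 1."""
    g = (lam * y) @ (X @ X.T) + b          # g(x_i) for every sample at once
    margins = y * g
    free = lam > tol                       # multipliers treated as strictly positive
    bad_bound = ~free & (margins < 1 - tol)
    bad_free = free & (np.abs(margins - 1) > tol)
    return np.where(bad_bound | bad_free)[0]
```

SMO keeps selecting pairs that appear in this index set and re-optimizing them until the set is empty.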
Suppose that in some iteration the variables chosen for optimization are $\lambda_1,\lambda_2$, while the remaining variables $\lambda_3,\lambda_4,\cdots,\lambda_m$ are held constant. The optimization problem then becomes:
$$\min_{\lambda}W(\lambda) = \frac{1}{2}\sum_{i=1, j=1}^{m}\lambda_i\lambda_jy_iy_j\vec{x_i}^T\vec{x_j} - \sum_{i=1}^{m}\lambda_i \\ s.t. \begin{cases} \lambda_i \geq 0,\quad i=1, 2, \cdots, m \\ \sum_{i=1}^{m}\lambda_iy_i = 0 \end{cases} \tag{1-10}$$
For convenience, we split the terms involving $\lambda_1,\lambda_2$ out of $\sum_{i=1, j=1}^{m}\lambda_i\lambda_jy_iy_j\vec{x_i}^T\vec{x_j}$:
$$\begin{align*} \sum_{i=1, j=1}^{m}\lambda_i\lambda_jy_iy_j\vec{x_i}^T\vec{x_j} &= \sum_{i=1}^{m}\sum_{j=1}^{m}\lambda_i\lambda_jy_iy_j\vec{x_i}^T\vec{x_j} \\ &=\lambda_1y_1\sum_{j=1}^{m}\lambda_jy_j\vec{x_1}^T\vec{x_j} + \lambda_2y_2\sum_{j=1}^{m}\lambda_jy_j\vec{x_2}^T\vec{x_j} + \sum_{i=3}^{m}\sum_{j=1}^{m}\lambda_i\lambda_jy_iy_j\vec{x_i}^T\vec{x_j} \\ &=\Big(\lambda_1^2y_1^2\vec{x_1}^T\vec{x_1} + \lambda_1\lambda_2y_1y_2\vec{x_1}^T\vec{x_2}+\lambda_1y_1\sum_{j=3}^{m}\lambda_jy_j\vec{x_1}^T\vec{x_j}\Big) \\ &\quad +\Big(\lambda_1\lambda_2y_1y_2\vec{x_1}^T\vec{x_2} + \lambda_2^2y_2^2\vec{x_2}^T\vec{x_2}+\lambda_2y_2\sum_{j=3}^{m}\lambda_jy_j\vec{x_2}^T\vec{x_j}\Big) \\ &\quad + \Big(\lambda_1y_1\sum_{i=3}^{m}\lambda_iy_i\vec{x_i}^T\vec{x_1} + \lambda_2y_2\sum_{i=3}^{m}\lambda_iy_i\vec{x_i}^T\vec{x_2} + \sum_{i=3}^{m}\sum_{j=3}^{m}\lambda_i\lambda_jy_iy_j\vec{x_i}^T\vec{x_j}\Big) \end{align*}$$
Dropping the constant terms that do not involve $\lambda_1,\lambda_2$, the expression simplifies to:
$$\lambda_1^2y_1^2\vec{x_1}^T\vec{x_1} + \lambda_2^2y_2^2\vec{x_2}^T\vec{x_2}+ 2\lambda_1\lambda_2y_1y_2\vec{x_1}^T\vec{x_2}+2\lambda_1y_1\sum_{j=3}^{m}\lambda_jy_j\vec{x_1}^T\vec{x_j} + 2\lambda_2y_2\sum_{j=3}^{m}\lambda_jy_j\vec{x_2}^T\vec{x_j} \tag{1-11}$$
The dual optimization problem then becomes (using $y_i^2 = 1$):
$$\min_{\lambda_1,\lambda_2}W(\lambda_1,\lambda_2)=\frac{1}{2}\lambda_1^2\vec{x_1}^T\vec{x_1} + \frac{1}{2}\lambda_2^2\vec{x_2}^T\vec{x_2}+ \lambda_1\lambda_2y_1y_2\vec{x_1}^T\vec{x_2}+\lambda_1y_1\sum_{j=3}^{m}\lambda_jy_j\vec{x_1}^T\vec{x_j} + \lambda_2y_2\sum_{j=3}^{m}\lambda_jy_j\vec{x_2}^T\vec{x_j} - (\lambda_1 + \lambda_2) \\ s.t. \begin{cases} \lambda_i \geq 0,\quad i=1, 2, \cdots, m \\ \sum_{i=1}^{m}\lambda_iy_i = 0 \end{cases}$$
From the constraint $\sum_{i=1}^{m}\lambda_iy_i = 0$ we can derive:
$$\lambda_1y_1 + \lambda_2y_2 = -\sum_{i=3}^{m}\lambda_iy_i$$
Since the other optimization variables are treated as constants, this can be written as:
$$\lambda_1y_1 + \lambda_2y_2 = \zeta \tag{1-12}$$
Equation (1-12) shows that $(\lambda_1,\lambda_2)$ lies on a straight line; since $y_i \in \{1,-1\}$, the slope of that line can only be $\pm1$. Combined with the box constraint $0 \leq \lambda_i \leq C$, the pair $(\lambda_1,\lambda_2)$ can only take values on a line segment inside the square $[0,C]^2$. (Strictly speaking, the hard-margin dual only requires $\lambda_i \geq 0$; the upper bound $C$ comes from the soft-margin formulation, and the clipping below is stated in that boxed form.)
For this optimization problem, we first ignore the box constraint $0 \leq \lambda_i \leq C$ and apply the method of Lagrange multipliers, constructing the Lagrangian:
$$\mathscr{L}(\lambda_1,\lambda_2,\alpha) = W(\lambda_1,\lambda_2) + \alpha(\lambda_1y_1 + \lambda_2y_2 - \zeta)$$
Taking the partial derivatives and setting them to zero gives:
$$\frac{\partial{\mathscr{L}}}{\partial{\lambda_1}} = \lambda_1\vec{x_1}^T\vec{x_1}+y_1y_2\lambda_2\vec{x_1}^T\vec{x_2}+y_1\sum_{j=3}^{m}\lambda_jy_j\vec{x_1}^T\vec{x_j} - 1 + \alpha y_1=0 \\ \frac{\partial{\mathscr{L}}}{\partial{\lambda_2}} = \lambda_2\vec{x_2}^T\vec{x_2}+y_1y_2\lambda_1\vec{x_1}^T\vec{x_2}+y_2\sum_{j=3}^{m}\lambda_jy_j\vec{x_2}^T\vec{x_j} - 1 + \alpha y_2=0$$
To simplify the notation, let
$$v_1 = \sum_{j=3}^{m}\lambda_jy_j\vec{x_1}^T\vec{x_j}, \qquad v_2 = \sum_{j=3}^{m}\lambda_jy_j\vec{x_2}^T\vec{x_j}$$
Multiplying the first equation by $y_1$ and the second by $y_2$ (recall $y_i^2 = 1$), we get
$$y_1\lambda_1\vec{x_1}^T\vec{x_1}+y_2\lambda_2\vec{x_1}^T\vec{x_2}+v_1 - y_1 + \alpha=0 \\ y_2\lambda_2\vec{x_2}^T\vec{x_2}+y_1\lambda_1\vec{x_1}^T\vec{x_2}+v_2 - y_2 + \alpha=0$$
Subtracting the second equation from the first and substituting $y_1\lambda_1 = \zeta - y_2\lambda_2$ from Equation (1-12), we obtain
$$(v_1-y_1)-(v_2-y_2) + \zeta\vec{x_1}^T\vec{x_1} - \zeta\vec{x_1}^T\vec{x_2} = y_2\lambda_2(\vec{x_1}^T\vec{x_1}+\vec{x_2}^T\vec{x_2}-2\vec{x_1}^T\vec{x_2}) \tag{1-13}$$
In this equation everything except $\lambda_2$ is a constant, so $\lambda_2$ can in principle be computed from it. However, the expression is unwieldy, so we rearrange it further to put the solution in a cleaner form.
During the optimization, the decision function determined by the current $\lambda$ is:
$$g(\vec{x}) = \sum_{i=1}^{m}\lambda_iy_i\vec{x_i}^T\vec{x} + b$$
E ( x i ⃗ ) = g ( x i ⃗ ) − y i , i = 1 , 2 E(\vec{x_i}) = g(\vec{x_i}) - y_i,i=1,2 E(xi )=g(xi )yii=1,2 λ 1 o l d , λ 2 o l d \lambda_1^{old},\lambda_2^{old} λ1oldλ2old为优化前得值,根据约束条件可得
$$y_1\lambda_1^{old}+y_2\lambda_2^{old}=\zeta \tag{1-14}$$
Inspecting the form of $v_1, v_2$, we find
$$v_1 =g(\vec{x_1}) - b - y_1\lambda_1^{old}\vec{x_1}^T\vec{x_1} - y_2\lambda_2^{old}\vec{x_1}^T\vec{x_2} \tag{1-15}$$
$$v_2 =g(\vec{x_2}) - b - y_1\lambda_1^{old}\vec{x_1}^T\vec{x_2} - y_2\lambda_2^{old}\vec{x_2}^T\vec{x_2} \tag{1-16}$$

Substituting Equations (1-14) through (1-16) back into (1-13) and simplifying, we obtain the new, unclipped value $\lambda_2^{new,unc}$:
$$\lambda_2^{new,unc}=\lambda_2^{old}+\frac{y_2(E(\vec{x_1})-E(\vec{x_2}))}{\vec{x_1}^T\vec{x_1}+\vec{x_2}^T\vec{x_2}-2\vec{x_1}^T\vec{x_2}}$$
The box constraint then restricts the new value to an interval:
$$L \leq \lambda_2^{new}\leq H$$
where $L, H$ are the $\lambda_2$ values at the intersections of the line (1-12) with the boundary of the square; they can be computed from $\lambda_1^{old}, \lambda_2^{old}$.
When $y_1 \neq y_2$:
$$L = \max(0,\lambda_2^{old}-\lambda_1^{old}) \\ H = \min(C,\lambda_2^{old}-\lambda_1^{old}+C)$$
When $y_1 = y_2$:
$$L = \max(0,\lambda_2^{old}+\lambda_1^{old}-C) \\ H = \min(C,\lambda_2^{old}+\lambda_1^{old})$$
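The two cases translate directly into code; a small helper (hypothetical name, assuming the box bound $C$) might look like:

```python
def clip_bounds(lam1_old, lam2_old, y1, y2, C):
    """Endpoints [L, H] of the feasible segment for lambda_2 (Eq. 1-12 plus the box)."""
    if y1 != y2:                                   # line lambda_2 - lambda_1 = const
        L = max(0.0, lam2_old - lam1_old)
        H = min(C, C + lam2_old - lam1_old)
    else:                                          # line lambda_2 + lambda_1 = const
        L = max(0.0, lam2_old + lam1_old - C)
        H = min(C, lam2_old + lam1_old)
    return L, H
```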
Clipping $\lambda_2^{new,unc}$ to this interval gives:
$$\lambda_2^{new}= \begin{cases} L, & \lambda_2^{new,unc} < L\\ \lambda_2^{new,unc}, & L\leq \lambda_2^{new,unc} \leq H\\ H, & \lambda_2^{new,unc} > H \end{cases}$$
Similarly, $\lambda_1^{new}$ is obtained from Equation (1-12): $\lambda_1^{new} = \lambda_1^{old} + y_1y_2(\lambda_2^{old} - \lambda_2^{new})$.
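Putting Equations (1-12) through (1-16) and the clipping rule together, a single SMO pair update can be sketched as follows, with generic indices $(i, j)$ standing in for the $(\lambda_1, \lambda_2)$ of the derivation. This reuses `clip_bounds` from the previous sketch; the function name and the skip tolerance are our own, and the bias $b$ is taken as given, since its update rule belongs to the full SMO algorithm and is not derived in this section:

```python
import numpy as np

def smo_pair_step(i, j, lam, b, X, y, C, eps=1e-12):
    """One SMO update of the pair (lam_i, lam_j); all other multipliers stay fixed."""
    K = X @ X.T
    g = (lam * y) @ K + b                        # g(x_k) for all samples
    E = g - y                                    # E(x_k) = g(x_k) - y_k
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]      # x_i.x_i + x_j.x_j - 2 x_i.x_j
    if eta < eps:                                # degenerate pair: skip it
        return lam, False
    L, H = clip_bounds(lam[i], lam[j], y[i], y[j], C)
    if L >= H:                                   # feasible segment is empty/a point
        return lam, False
    lam_j_unc = lam[j] + y[j] * (E[i] - E[j]) / eta   # unclipped update
    lam_j_new = float(np.clip(lam_j_unc, L, H))       # clip to [L, H]
    lam = lam.copy()
    lam[i] += y[i] * y[j] * (lam[j] - lam_j_new)      # keep Eq. (1-12) satisfied
    lam[j] = lam_j_new
    return lam, True
```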
