《统计学习方法》 (Statistical Learning Methods), Chapter 7: Support Vector Machines

Linearly Separable SVM and Hard-Margin Maximization

Linearly Separable Support Vector Machine

  • Definition: given a linearly separable training data set, the separating hyperplane obtained by margin maximization, or equivalently by solving the corresponding convex quadratic programming problem,
    $w^* \cdot x + b^* = 0$
    together with the classification decision function
    $f(x) = \operatorname{sign}(w^* \cdot x + b^*)$
    is called the linearly separable support vector machine.

Functional Margin and Geometric Margin

  • Functional margin: for a given training data set $T$ and hyperplane $(w,b)$, the functional margin of $(w,b)$ with respect to a sample point $(x_i,y_i)$ is defined as
    $\hat{\gamma}_i = y_i(w \cdot x_i + b)$
    The functional margin of $(w,b)$ with respect to $T$ is the minimum of the functional margins over all sample points $(x_i,y_i)$ in $T$:
    $\hat{\gamma} = \min\limits_{i=1,\dots,N}\hat{\gamma}_i$
  • Geometric margin: for a given training data set $T$ and hyperplane $(w,b)$, the geometric margin of $(w,b)$ with respect to a sample point $(x_i,y_i)$ is defined as
    $\gamma_i = y_i\left(\frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|}\right)$
    The geometric margin of $(w,b)$ with respect to $T$ is the minimum of the geometric margins over all sample points $(x_i,y_i)$ in $T$:
    $\gamma = \min\limits_{i=1,\dots,N}\gamma_i$
    Hence
    $\gamma_i = \frac{\hat{\gamma}_i}{\|w\|}, \qquad \gamma = \frac{\hat{\gamma}}{\|w\|}$
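A minimal numpy sketch of these two definitions (the data and the hyperplane below are made up purely for illustration): scaling $(w,b)$ changes the functional margin but leaves the geometric margin, and hence the hyperplane, unchanged.

```python
import numpy as np

def margins(w, b, X, y):
    """Functional and geometric margins of the hyperplane (w, b) over a data set."""
    functional = y * (X @ w + b)                 # gamma_hat_i = y_i (w . x_i + b)
    geometric = functional / np.linalg.norm(w)   # gamma_i = gamma_hat_i / ||w||
    return functional.min(), geometric.min()

X = np.array([[3.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])
w, b = np.array([0.5, 0.5]), -2.0
print(margins(w, b, X, y))            # (1.0, ~1.414)
print(margins(2 * w, 2 * b, X, y))    # functional margin doubles, geometric unchanged
```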

Margin Maximization

The maximum-margin separating hyperplane is the solution of
    $\max\limits_{w,b} \ \gamma$
    $s.t. \ \ y_i\left(\frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|}\right) \ge \gamma, \quad i=1,2,\dots,N$
which is equivalent to
    $\max\limits_{w,b} \ \frac{\hat{\gamma}}{\|w\|}$
    $s.t. \ \ y_i(w \cdot x_i + b) \ge \hat{\gamma}, \quad i=1,2,\dots,N$
Scaling $(w,b)$ scales $\hat{\gamma}$ by the same factor without changing the hyperplane, so the particular value of $\hat{\gamma}$ does not matter and we may take $\hat{\gamma} = 1$.
The problem is then equivalent to
    $\min\limits_{w,b} \ \frac{1}{2}\|w\|^2$
    $s.t. \ \ y_i(w \cdot x_i + b) - 1 \ge 0, \quad i=1,2,\dots,N$
This is a convex quadratic programming problem.

The Resulting Algorithm

Input: linearly separable training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$, $i=1,2,\dots,N$
Output: maximum-margin separating hyperplane and classification decision function

(1) Construct and solve the optimization problem
    $\min\limits_{w,b} \ \frac{1}{2}\|w\|^2$
    $s.t. \ \ y_i(w \cdot x_i + b) - 1 \ge 0, \quad i=1,2,\dots,N$
obtaining the optimal solution $(w^*, b^*)$.
(2) The separating hyperplane is
$w^* \cdot x + b^* = 0$
and the classification decision function is
$f(x) = \operatorname{sign}(w^* \cdot x + b^*)$
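Since this is a small convex QP, it can be handed directly to a general-purpose solver. Below is a minimal sketch using cvxpy (a tooling choice of this note, not something the book prescribes); the three toy points are made up for illustration.

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (illustrative only).
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

w = cp.Variable(X.shape[1])
b = cp.Variable()
# min (1/2)||w||^2   s.t.   y_i (w . x_i + b) >= 1
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                     [cp.multiply(y, X @ w + b) >= 1])
problem.solve()
print(w.value, b.value)   # roughly w* = (0.5, 0.5), b* = -2 for this data
```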

  • Proof that the maximum-margin separating hyperplane exists and is unique:
    (1) Existence
    Since the data are linearly separable, a feasible solution exists; and since the objective function is bounded below, an optimal solution $(w^*,b^*)$ must exist. Moreover, because the data contain both positive and negative samples, $w^* \ne 0$. This proves existence.
    (2) Uniqueness
    First we show that $w^*$ is unique. Suppose there are two optimal solutions $(w_1^*,b_1^*)$ and $(w_2^*,b_2^*)$; clearly $\|w_1^*\| = \|w_2^*\| = c$.
    Let $w = \frac{w_1^*+w_2^*}{2}$ and $b = \frac{b_1^*+b_2^*}{2}$. Then $(w,b)$ is feasible, so $c \le \|w\| \le \frac{1}{2}\|w_1^*\| + \frac{1}{2}\|w_2^*\| = c$, and therefore $\|w\| = \frac{1}{2}\|w_1^*\| + \frac{1}{2}\|w_2^*\|$.
    Equality in the triangle inequality forces $w_1^* = \lambda w_2^*$ with $|\lambda| = 1$. If $\lambda = -1$, then $\|w\| = 0$, contradicting the feasibility of $(w,b)$; hence $\lambda = 1$, i.e. $w_1^* = w_2^*$, so $w^*$ is unique.
    Next we show that $b^*$ is unique.
    Let $x_1'$ and $x_2'$ be points in the set $\{x_i \mid y_i = +1\}$ at which the constraints of $(w^*,b_1^*)$ and $(w^*,b_2^*)$, respectively, hold with equality,
    and let $x_1''$ and $x_2''$ be the corresponding points in the set $\{x_i \mid y_i = -1\}$.
    Then $b_1^* = -\frac{1}{2}(w^* \cdot x_1' + w^* \cdot x_1'')$ and $b_2^* = -\frac{1}{2}(w^* \cdot x_2' + w^* \cdot x_2'')$, so
    $b_1^* - b_2^* = -\frac{1}{2}\left[w^* \cdot (x_1' - x_2') + w^* \cdot (x_1'' - x_2'')\right]$

    From the constraints,
    $w^* \cdot x_2' + b_1^* \ge 1 = w^* \cdot x_1' + b_1^*$
    $w^* \cdot x_1' + b_2^* \ge 1 = w^* \cdot x_2' + b_2^*$
    Together these give $w^* \cdot (x_1' - x_2') = 0$; similarly $w^* \cdot (x_1'' - x_2'') = 0$.
    Hence $b_1^* = b_2^*$, completing the proof.
  • Support vectors and margin boundaries
    The points satisfying $y_i(w \cdot x_i + b) = 1$ are called support vectors. They lie on the two hyperplanes
    $H_1: w \cdot x + b = +1$
    $H_2: w \cdot x + b = -1$
    The distance between $H_1$ and $H_2$ is $\frac{2}{\|w\|}$, called the margin; $H_1$ and $H_2$ are the margin boundaries.

The Dual Algorithm

To solve the optimization problem, first define the Lagrangian
$L(w,b,a) = \frac{1}{2}\|w\|^2 - \sum\limits_{i=1}^N a_i y_i (w \cdot x_i + b) + \sum\limits_{i=1}^N a_i$, where $a_i \ge 0$, $i=1,2,\dots,N$,
and write $a = (a_1, a_2, \dots, a_N)^T$.
The primal problem is then equivalent to
$\max\limits_a \min\limits_{w,b} L(w,b,a)$
(1) For $\min\limits_{w,b} L(w,b,a)$, set the partial derivatives with respect to $w$ and $b$ to zero:
$\nabla_w L(w,b,a) = w - \sum\limits_{i=1}^N a_i y_i x_i = 0$
$\nabla_b L(w,b,a) = -\sum\limits_{i=1}^N a_i y_i = 0$
which give
$w = \sum\limits_{i=1}^N a_i y_i x_i$
$\sum\limits_{i=1}^N a_i y_i = 0$
Substituting back,
$L(w,b,a) = \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum\limits_{i=1}^N a_i y_i \left(\left(\sum\limits_{j=1}^N a_j y_j x_j\right) \cdot x_i + b\right) + \sum\limits_{i=1}^N a_i$

$= -\frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) + \sum\limits_{i=1}^N a_i$

(2) Maximizing this over $a$, with the sign flipped so that it becomes a minimization, yields the dual problem
    $\min\limits_a \ \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum\limits_{i=1}^N a_i$
    $s.t. \ \ \sum\limits_{i=1}^N a_i y_i = 0$
    $a_i \ge 0, \quad i=1,2,\dots,N$

From the dual solution $a^*$,
$w^* = \sum\limits_{i=1}^N a_i^* y_i x_i$
$b^* = y_j - \sum\limits_{i=1}^N a_i^* y_i (x_i \cdot x_j)$ for any index $j$ with $a_j^* > 0$
$f(x) = \operatorname{sign}\left(\sum\limits_{i=1}^N a_i^* y_i (x \cdot x_i) + b^*\right)$
Algorithm
Input: linearly separable training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$, $i=1,2,\dots,N$
Output: separating hyperplane and classification decision function
(1) Construct and solve the optimization problem
    $\min\limits_a \ \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum\limits_{i=1}^N a_i$
    $s.t. \ \ \sum\limits_{i=1}^N a_i y_i = 0$
    $a_i \ge 0, \quad i=1,2,\dots,N$
obtaining the solution $a^*$.
(2) Compute
$w^* = \sum\limits_{i=1}^N a_i^* y_i x_i$
and, choosing an index $j$ with $a_j^* > 0$,
$b^* = y_j - \sum\limits_{i=1}^N a_i^* y_i (x_i \cdot x_j)$

(3) Obtain the separating hyperplane $w^* \cdot x + b^* = 0$ and the classification decision function
$f(x) = \operatorname{sign}\left(\sum\limits_{i=1}^N a_i^* y_i (x \cdot x_i) + b^*\right)$
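As with the primal, the dual is a small QP over $a$. A hedged cvxpy sketch follows (cvxpy and the toy points are this note's assumptions, and a tiny ridge is added only to keep the numerical PSD check happy); it recovers $w^*$ and $b^*$ from $a^*$ exactly as in steps (2) and (3).

```python
import cvxpy as cp
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
N = len(y)

# Q_ij = y_i y_j (x_i . x_j); the ridge makes Q numerically PSD for quad_form.
Q = np.outer(y, y) * (X @ X.T) + 1e-8 * np.eye(N)
a = cp.Variable(N)
problem = cp.Problem(cp.Minimize(0.5 * cp.quad_form(a, Q) - cp.sum(a)),
                     [a >= 0, y @ a == 0])
problem.solve()

a_star = a.value
w_star = (a_star * y) @ X                    # w* = sum_i a_i* y_i x_i
j = int(np.argmax(a_star))                   # any j with a_j* > 0
b_star = y[j] - (a_star * y) @ (X @ X[j])    # b* = y_j - sum_i a_i* y_i (x_i . x_j)
print(a_star, w_star, b_star)
```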

Linear SVM and Soft-Margin Maximization

Linear Support Vector Machine

  • Definition: given a training data set that is not linearly separable, the separating hyperplane obtained by solving the convex quadratic programming problem of soft-margin maximization,
    $w^* \cdot x + b^* = 0$
    together with the classification decision function
    $f(x) = \operatorname{sign}(w^* \cdot x + b^*)$
    is called the linear support vector machine.

    Introduce slack variables $\xi_i$ and relax the constraints to
    $y_i(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$
    with objective function
    $\frac{1}{2}\|w\|^2 + C\sum\limits_{i=1}^N \xi_i, \quad C > 0$
    The problem is then
    $\min\limits_{w,b,\xi} \ \ \frac{1}{2}\|w\|^2 + C\sum\limits_{i=1}^N \xi_i$
    $s.t. \ \ y_i(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i=1,2,\dots,N$

The Dual Algorithm

By Lagrangian duality,
$L(w,b,\xi,a,\mu) = \frac{1}{2}\|w\|^2 + C\sum\limits_{i=1}^N \xi_i - \sum\limits_{i=1}^N a_i\left(y_i(w \cdot x_i + b) - 1 + \xi_i\right) - \sum\limits_{i=1}^N \mu_i \xi_i$, where $a_i \ge 0$, $\mu_i \ge 0$.
Setting the gradients to zero:
$\nabla_w L(w,b,\xi,a,\mu) = w - \sum\limits_{i=1}^N a_i y_i x_i = 0$
$\nabla_b L(w,b,\xi,a,\mu) = -\sum\limits_{i=1}^N a_i y_i = 0$
$\nabla_{\xi_i} L(w,b,\xi,a,\mu) = C - a_i - \mu_i = 0$
Substituting back gives the dual problem
    $\min\limits_a \ \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum\limits_{i=1}^N a_i$
    $s.t. \ \ \sum\limits_{i=1}^N a_i y_i = 0$
    $0 \le a_i \le C, \quad i=1,2,\dots,N$
From the solution,
$w^* = \sum\limits_{i=1}^N a_i^* y_i x_i$
$b^* = y_j - \sum\limits_{i=1}^N y_i a_i^* (x_i \cdot x_j)$ for any index $j$ with $0 < a_j^* < C$
Algorithm:
Input: training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$, $i=1,2,\dots,N$
Output: separating hyperplane and classification decision function
(1) Construct and solve the optimization problem
    $\min\limits_a \ \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N a_i a_j y_i y_j (x_i \cdot x_j) - \sum\limits_{i=1}^N a_i$
    $s.t. \ \ \sum\limits_{i=1}^N a_i y_i = 0$
    $0 \le a_i \le C, \quad i=1,2,\dots,N$
obtaining the solution $a^*$.
(2) Compute
$w^* = \sum\limits_{i=1}^N a_i^* y_i x_i$
and, choosing an index $j$ with $0 < a_j^* < C$,
$b^* = y_j - \sum\limits_{i=1}^N a_i^* y_i (x_i \cdot x_j)$

(3) Obtain the separating hyperplane $w^* \cdot x + b^* = 0$ and the classification decision function
$f(x) = \operatorname{sign}\left(\sum\limits_{i=1}^N a_i^* y_i (x \cdot x_i) + b^*\right)$
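In practice this soft-margin problem is rarely solved by hand; a library call suffices. A brief sketch with scikit-learn's SVC (a tooling choice of this note, not the book's; the noisy data are synthetic):

```python
import numpy as np
from sklearn.svm import SVC

# Two noisy, overlapping clusters: not perfectly separable (synthetic data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.5, size=(50, 2)),
               rng.normal(-2.0, 1.5, size=(50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

# C trades margin width against the total slack C * sum(xi_i).
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.coef_, clf.intercept_)   # w*, b*
print(clf.support_)                # indices of the support vectors (a_i* > 0)
```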

Support Vectors

  • If $0 < a_i^* < C$, then $\xi_i = 0$ and $x_i$ lies exactly on a margin boundary.
  • If $a_i^* = C$ and $0 < \xi_i < 1$, then $x_i$ is classified correctly and lies between the margin boundary and the separating hyperplane.
  • If $a_i^* = C$ and $\xi_i = 1$, then $x_i$ lies on the separating hyperplane.
  • If $a_i^* = C$ and $\xi_i > 1$, then $x_i$ lies on the misclassified side of the separating hyperplane.

Hinge Loss Function

The problem can be rewritten with the objective function
$\sum\limits_{i=1}^N [1 - y_i(w \cdot x_i + b)]_+ + \lambda\|w\|^2$
which is equivalent to the linear support vector machine. Indeed, setting
$\xi_i = [1 - y_i(w \cdot x_i + b)]_+$
the problem becomes

$\min\limits_{w,b} \sum\limits_{i=1}^N \xi_i + \lambda\|w\|^2$
Taking $\lambda = \frac{1}{2C}$, this equals

$\min\limits_{w,b} \frac{1}{C}\left(C\sum\limits_{i=1}^N \xi_i + \frac{1}{2}\|w\|^2\right)$
which is the soft-margin problem up to the constant factor $\frac{1}{C}$.
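Because the hinge form is an unconstrained objective, it can be minimized directly by subgradient descent. A minimal numpy sketch (the helper name hinge_sgd, step size, and epoch count are all illustrative choices):

```python
import numpy as np

def hinge_sgd(X, y, lam=0.1, lr=0.01, epochs=200):
    """Minimize sum_i [1 - y_i (w . x_i + b)]_+ + lam * ||w||^2 by subgradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margin = y * (X @ w + b)
        active = margin < 1                 # points with a nonzero hinge term
        # subgradient of the hinge sum plus gradient of the ridge term
        grad_w = -(y[active][:, None] * X[active]).sum(axis=0) + 2 * lam * w
        grad_b = -y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```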

Nonlinear SVM and Kernel Functions

The Kernel Trick

For problems that are not linearly separable in the input space, we apply the kernel trick.
Let $\phi(x)$ be a mapping from the input space to a feature space, and define the kernel function
$K(x,z) = \phi(x) \cdot \phi(z)$
In the dual problem, every inner product $x_i \cdot x_j$ is replaced by $K(x_i,x_j)$, so the map $\phi$ never has to be computed explicitly.

Positive Definite Kernels

$K(x,z)$ is a positive definite kernel if and only if, for any finite set of points $x_1,\dots,x_m$, its Gram matrix
$K = [K(x_i,x_j)]_{m \times m}$
is positive semidefinite.

Common Kernel Functions

  • Polynomial kernel:
    $K(x,z) = (x \cdot z + 1)^p$
  • Gaussian kernel:
    $K(x,z) = \exp\left(-\frac{\|x-z\|^2}{2\sigma^2}\right)$
  • String kernel:
    $K_n(s,t) = \sum\limits_{u \in \Sigma^n} [\phi_n(s)]_u [\phi_n(t)]_u = \sum\limits_{u \in \Sigma^n} \sum\limits_{(i,j):\, s(i)=t(j)=u} \lambda^{l(i)+l(j)}$
    where $0 < \lambda \le 1$, the outer sum runs over all strings $u$ of length $n$ over the alphabet $\Sigma$, the inner sum runs over all index sequences $i$ and $j$ that pick $u$ out of $s$ and $t$ as a subsequence, and $l(i)$ is the span of the index sequence $i$:
    $l(i) = i_{|u|} - i_1 + 1, \quad 1 \le i_1 < i_2 < \dots < i_{|u|} \le |s|$
    A code sketch of the first two kernels follows this list.
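A short numpy sketch of the polynomial and Gaussian kernels, with a numerical check that the Gram matrix is positive semidefinite (the random test points are arbitrary):

```python
import numpy as np

def polynomial_kernel(X, Z, p=2):
    """K(x, z) = (x . z + 1)^p, evaluated for all pairs of rows of X and Z."""
    return (X @ Z.T + 1.0) ** p

def gaussian_kernel(X, Z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2)) for all pairs of rows."""
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# The Gram matrix of a positive definite kernel is positive semidefinite:
X = np.random.default_rng(0).normal(size=(5, 3))
G = gaussian_kernel(X, X)
print(np.linalg.eigvalsh(G).min() >= -1e-10)   # True, up to round-off
```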

Nonlinear Support Vector Machine

  • Definition: from a training set that is not linearly separable, the classification decision function learned via a kernel function and soft-margin maximization (a convex quadratic programming problem),
    $f(x) = \operatorname{sign}\left(\sum\limits_{i=1}^N a_i^* y_i K(x,x_i) + b^*\right)$
    is called the nonlinear support vector machine, where $K(x,z)$ is a positive definite kernel function.
    Algorithm:
    Input: training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$, $i=1,2,\dots,N$
    Output: classification decision function
    (1) Construct and solve the optimization problem
    $\min\limits_a \ \frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N a_i a_j y_i y_j K(x_i,x_j) - \sum\limits_{i=1}^N a_i$
    $s.t. \ \ \sum\limits_{i=1}^N a_i y_i = 0$
    $0 \le a_i \le C, \quad i=1,2,\dots,N$
    obtaining the solution $a^*$.
    (2) Choose an index $j$ with $0 < a_j^* < C$ and compute
    $b^* = y_j - \sum\limits_{i=1}^N a_i^* y_i K(x_i,x_j)$
    (here $w^*$ lives in the feature space and need not be formed explicitly, since the decision function is expressed through the kernel)

(3) The classification decision function is
$f(x) = \operatorname{sign}\left(\sum\limits_{i=1}^N a_i^* y_i K(x_i,x) + b^*\right)$
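A brief sketch of the nonlinear case with scikit-learn's SVC and the Gaussian (RBF) kernel; the concentric-rings data are synthetic, and SVC's gamma corresponds to $\frac{1}{2\sigma^2}$ in the notation above:

```python
import numpy as np
from sklearn.svm import SVC

# Two concentric rings: linearly inseparable in the input space (synthetic).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
r = np.hstack([np.full(100, 1.0), np.full(100, 3.0)]) + rng.normal(0.0, 0.1, 200)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.hstack([np.ones(100), -np.ones(100)])

clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)
print(clf.score(X, y))   # close to 1.0 on this toy set
```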

Sequential Minimal Optimization (SMO)

At each step, choose two variables that violate the KKT conditions and optimize the objective over them analytically while keeping all other variables fixed; repeat until the stopping criterion is met or all variables satisfy the KKT conditions. Once every variable satisfies the KKT conditions, the current point is the optimal solution.

Solving the Two-Variable Quadratic Subproblem

Suppose $a_1$ and $a_2$ are selected. The subproblem is
$\min\limits_{a_1,a_2} \ \ W(a_1,a_2) = \frac{1}{2}K_{11}a_1^2 + \frac{1}{2}K_{22}a_2^2 + y_1y_2K_{12}a_1a_2 - (a_1+a_2) + y_1a_1\sum\limits_{i=3}^N y_ia_iK_{i1} + y_2a_2\sum\limits_{i=3}^N y_ia_iK_{i2}$
$s.t. \ \ a_1y_1 + a_2y_2 = -\sum\limits_{i=3}^N y_ia_i = \varsigma$
$0 \le a_i \le C, \quad i=1,2$
where $K_{ij} = K(x_i,x_j)$ and $\varsigma$ is a constant.
The updated value must satisfy
$L \le a_2^{new} \le H$
where:

  • If $y_1 \ne y_2$: $L = \max(0,\, a_2^{old} - a_1^{old})$, $H = \min(C,\, C + a_2^{old} - a_1^{old})$

  • If $y_1 = y_2$: $L = \max(0,\, a_2^{old} + a_1^{old} - C)$, $H = \min(C,\, a_2^{old} + a_1^{old})$
    Let $a_2^{new,unc}$ denote the unclipped solution, i.e. the minimizer obtained when the box constraint is ignored. Define
    $g(x) = \sum\limits_{i=1}^N a_iy_iK(x_i,x) + b$
    $E_i = g(x_i) - y_i = \left(\sum\limits_{j=1}^N a_jy_jK(x_j,x_i) + b\right) - y_i, \quad i=1,2$

    Then
    $a_2^{new,unc} = a_2^{old} + \frac{y_2(E_1 - E_2)}{\eta}$
    where
    $\eta = K_{11} + K_{22} - 2K_{12}$
    Clipping to $[L, H]$ gives
    $a_2^{new} = \begin{cases} H & a_2^{new,unc} > H \\ a_2^{new,unc} & L \le a_2^{new,unc} \le H \\ L & a_2^{new,unc} < L \end{cases}$

    and finally
    $a_1^{new} = a_1^{old} + y_1y_2(a_2^{old} - a_2^{new})$

  • Derivation of the update formula:
    Let
    $v_i = \sum\limits_{j=3}^N a_jy_jK(x_i,x_j) = g(x_i) - \sum\limits_{j=1}^2 a_jy_jK(x_i,x_j) - b$
    Then the subproblem objective becomes
    $W(a_1,a_2) = \frac{1}{2}K_{11}a_1^2 + \frac{1}{2}K_{22}a_2^2 + y_1y_2K_{12}a_1a_2 - (a_1+a_2) + y_1v_1a_1 + y_2v_2a_2$

    From the equality constraint, $a_1 = (\varsigma - y_2a_2)y_1$. Substituting gives
    $W(a_2) = \frac{1}{2}K_{11}(\varsigma - a_2y_2)^2 + \frac{1}{2}K_{22}a_2^2 + y_2K_{12}(\varsigma - a_2y_2)a_2 - (\varsigma - a_2y_2)y_1 - a_2 + v_1(\varsigma - a_2y_2) + y_2v_2a_2$
    Setting the derivative to zero:
    $\frac{\partial W}{\partial a_2} = K_{11}a_2 + K_{22}a_2 - 2K_{12}a_2 - K_{11}\varsigma y_2 + K_{12}\varsigma y_2 + y_1y_2 - 1 - v_1y_2 + y_2v_2 = 0$
    With
    $\eta = K_{11} + K_{22} - 2K_{12}$
    and substituting $\varsigma = a_1^{old}y_1 + a_2^{old}y_2$ together with the expressions of $v_1, v_2$ in terms of $E_1, E_2$, this solves to

    $a_2^{new,unc} = a_2^{old} + \frac{y_2(E_1 - E_2)}{\eta}$
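The two-variable update just derived fits in a few lines. A hedged numpy sketch (the function name smo_step and its interface are made up for illustration):

```python
import numpy as np

def smo_step(i, j, a, y, K, b, C):
    """One analytic SMO update of the pair (a_i, a_j); returns the new pair or None."""
    g = K @ (a * y) + b                       # g(x_k) for every training point
    E = g - y                                 # E_k = g(x_k) - y_k
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]   # eta = K11 + K22 - 2 K12
    if eta <= 0:
        return None                           # degenerate pair, skip it
    if y[i] != y[j]:
        L, H = max(0.0, a[j] - a[i]), min(C, C + a[j] - a[i])
    else:
        L, H = max(0.0, a[j] + a[i] - C), min(C, a[j] + a[i])
    a_j_new = np.clip(a[j] + y[j] * (E[i] - E[j]) / eta, L, H)
    a_i_new = a[i] + y[i] * y[j] * (a[j] - a_j_new)
    return a_i_new, a_j_new
```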

Variable Selection

  1. Choosing the first variable
    The KKT conditions are
    $a_i = 0 \iff y_ig(x_i) \ge 1$
    $0 < a_i < C \iff y_ig(x_i) = 1$
    $a_i = C \iff y_ig(x_i) \le 1$
    First scan the points with $0 < a_i < C$ for violations of the second condition; if they all satisfy it, scan the whole training set for other violations.
  2. Choosing the second variable
    Given the first variable, choose $a_2$ so that its update changes it as fast as possible, i.e. so that $|E_1 - E_2|$ is as large as possible:
  • If $E_1 > 0$, choose the smallest $E_2$.
  • If $E_1 < 0$, choose the largest $E_2$.
    Prefer points on the margin boundary; if none of them gives sufficient progress, scan the whole training set; if that also fails, discard the current $a_1$ and choose a new first variable.
  3. Computing $b$ and $E_i$
    By the KKT conditions, if $0 < a_1^{new} < C$ then
    $\sum\limits_{i=1}^N a_iy_iK_{i1} + b = y_1$

    hence
    $b_1^{new} = y_1 - \sum\limits_{i=3}^N a_iy_iK_{i1} - a_1^{new}y_1K_{11} - a_2^{new}y_2K_{21}$

    Since
    $E_1 = \sum\limits_{i=3}^N a_iy_iK_{i1} + a_1^{old}y_1K_{11} + a_2^{old}y_2K_{21} + b^{old} - y_1$
    combining the two expressions gives
    $b_1^{new} = -E_1 - y_1K_{11}(a_1^{new} - a_1^{old}) - y_2K_{21}(a_2^{new} - a_2^{old}) + b^{old}$
    Similarly, if $0 < a_2^{new} < C$,
    $b_2^{new} = -E_2 - y_1K_{12}(a_1^{new} - a_1^{old}) - y_2K_{22}(a_2^{new} - a_2^{old}) + b^{old}$
    If both $a_1^{new}$ and $a_2^{new}$ lie strictly between $0$ and $C$, then $b_1^{new} = b_2^{new}$.
    If both of them are $0$ or $C$, take $b^{new} = \frac{b_1^{new} + b_2^{new}}{2}$.
    Finally, update
    $E_i^{new} = \sum\limits_{j \in S} y_ja_jK(x_i,x_j) + b^{new} - y_i$
    where $S$ is the set of support vectors.

The SMO Algorithm

Input: training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, $x_i \in \mathbf{R}^n$, $y_i \in \{-1,+1\}$; tolerance $\epsilon$
Output: approximate solution $\hat{a}$
(1) Initialize $a^{(0)} = 0$, $k = 0$.
(2) Select the two variables $a_1^{(k)}, a_2^{(k)}$ as described above, solve the two-variable subproblem analytically, and obtain the updated values $a_1^{(k+1)}, a_2^{(k+1)}$.
(3) Stop if the following conditions hold within tolerance $\epsilon$:
$\sum\limits_{i=1}^N a_iy_i = 0, \quad 0 \le a_i \le C, \ i=1,2,\dots,N$
$y_i \cdot g(x_i) \begin{cases} \ge 1 & \{x_i \mid a_i = 0\} \\ = 1 & \{x_i \mid 0 < a_i < C\} \\ \le 1 & \{x_i \mid a_i = C\} \end{cases}$
where
$g(x_i) = \sum\limits_{j=1}^N a_jy_jK(x_j,x_i) + b$
Otherwise set $k = k + 1$ and return to (2).
(4) Output $\hat{a} = a^{(k+1)}$.
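Putting the pieces together, here is a compact, hedged sketch of the full loop in the style of the "simplified SMO" often used for teaching: the first variable is any KKT violator, the second is drawn at random instead of by the max-$|E_1 - E_2|$ heuristic above, and $b$ is updated exactly as in the variable-selection section. All names and defaults are illustrative, not the book's prescription.

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10):
    """Simplified SMO for the soft-margin dual with a linear kernel (illustrative)."""
    N = len(y)
    K = X @ X.T                                 # swap in any kernel Gram matrix here
    a, b = np.zeros(N), 0.0
    rng = np.random.default_rng(0)
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(N):
            E_i = (a * y) @ K[:, i] + b - y[i]
            # take i as the first variable only if it violates KKT beyond tol
            if (y[i] * E_i < -tol and a[i] < C) or (y[i] * E_i > tol and a[i] > 0):
                j = int(rng.integers(N - 1)); j += (j >= i)   # random j != i
                E_j = (a * y) @ K[:, j] + b - y[j]
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if eta <= 0:
                    continue
                if y[i] != y[j]:
                    L, H = max(0.0, a[j] - a[i]), min(C, C + a[j] - a[i])
                else:
                    L, H = max(0.0, a[j] + a[i] - C), min(C, a[j] + a[i])
                if L == H:
                    continue
                a_i_old, a_j_old = a[i], a[j]
                a[j] = np.clip(a[j] + y[j] * (E_i - E_j) / eta, L, H)
                if abs(a[j] - a_j_old) < 1e-7:
                    continue
                a[i] += y[i] * y[j] * (a_j_old - a[j])
                # b update, following b1_new / b2_new from the previous section
                b1 = b - E_i - y[i]*K[i, i]*(a[i]-a_i_old) - y[j]*K[i, j]*(a[j]-a_j_old)
                b2 = b - E_j - y[i]*K[i, j]*(a[i]-a_i_old) - y[j]*K[j, j]*(a[j]-a_j_old)
                if 0 < a[i] < C:
                    b = b1
                elif 0 < a[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return a, b
```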
