The Nature of Statistical Learning Theory, Notes 5: Methods of Pattern Recognition, Part 1 (5.1-5.5)

5 Methods of Pattern Recognition

5.1 Why Can Learning Machines Generalize?

Suppose we follow the ERM principle and design a very complex learning machine (one with a large VC dimension) for a given number of training samples. The empirical risk on the training samples can then be made very small, but the confidence interval grows large; this phenomenon is called overfitting (over-adaptation). We therefore want to trade the two terms off against each other, which leads to two approaches:

  1. Keep the confidence interval fixed (by choosing an appropriately constructed machine) and minimize the empirical risk; neural networks implement this idea.
  2. Keep the empirical risk fixed (e.g., equal to zero in the fully separable case) and minimize the confidence interval; support vector machines implement this idea.

5.2 Sigmoid Approximation of Indicator Functions

Consider the set of indicator functions

$$f(x, \omega) = \operatorname{sgn}\{\omega \cdot x + b\}, \quad \omega \in \mathbb{R}^n,\ b \in \mathbb{R}$$

When the training data cannot be separated without error by any $\omega \in \mathbb{R}^n$, the best we can hope for is the classification with the fewest errors. Finding it, however, is NP-complete, and gradient-based algorithms cannot be used to find a local minimum (the derivative of an indicator function is either zero or does not exist), so it was proposed to approximate the indicator function by a differentiable (sigmoid) function.

A smooth monotonic function $S$ is called a sigmoid function if it satisfies $S(-\infty) = -1,\ S(+\infty) = 1$.
A typical example is

$$S(u) = \tanh(u) = \dfrac{e^u - e^{-u}}{e^u + e^{-u}}$$

For simplicity, ignore the constant bias and set

$$f(x, \omega) = S(\omega \cdot x), \quad \omega \in \mathbb{R}^n$$

$$R_{emp}(\omega) = \dfrac{1}{l}\sum\limits_{i=1}^l (y_i - S(\omega \cdot x_i))^2$$
This yields the following gradient descent procedure ($n$ is the iteration number):

$$\operatorname{grad}_\omega R_{emp}(\omega) = -\dfrac{2}{l}\sum\limits_{j=1}^l (y_j - S(\omega \cdot x_j))\, S'(\omega \cdot x_j)\, x_j^T$$

$$\omega_{new} = \omega_{old} - \gamma(n)\, \operatorname{grad}_\omega R_{emp}(\omega_{old})$$

A sufficient condition for gradient descent to converge to a local minimum is that the gradient is bounded and the coefficients satisfy

$$\sum\limits_{n=1}^\infty \gamma(n) = \infty, \quad \sum\limits_{n=1}^\infty \gamma^2(n) < \infty$$

# In practice, however, the learning rate $\gamma$ is held constant and the iteration is stopped after finitely many steps (this can perhaps be read as: the values of $\gamma$ in the imagined infinite continuation after the finite stopping point would still satisfy the condition above).

# Also, $\dfrac{\partial \omega^T x}{\partial \omega} = Ix = x$ rather than $x^T$, so the $x_j^T$ above is questionable. (Even intuitively, $\omega$ and $x$ have the same shape, and we want the gradient of the empirical risk to have the same shape as $\omega$, so it should be $x$ rather than $x^T$.)
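
To make the update concrete, here is a minimal NumPy sketch of this gradient descent; the toy data and the schedule $\gamma(n) = 1/n$ are illustrative assumptions, not from the text. Note that the computed gradient comes out with the shape of $\omega$, consistent with the remark above.

```python
import numpy as np

def S(u):
    return np.tanh(u)

def S_prime(u):
    return 1.0 - np.tanh(u) ** 2

def grad_emp_risk(w, X, y):
    # grad_w R_emp = -(2/l) * sum_j (y_j - S(w . x_j)) S'(w . x_j) x_j
    u = X @ w
    return -(2.0 / len(y)) * ((y - S(u)) * S_prime(u)) @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])       # a linearly separable toy problem
w = np.zeros(2)
for n in range(1, 1001):
    gamma = 1.0 / n                         # sum gamma = inf, sum gamma^2 < inf
    w = w - gamma * grad_emp_risk(w, X, y)
print(w, np.mean(np.sign(X @ w) == y))      # weights and training accuracy
```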

5.3 Neural Networks

5.3.1 The Back-Propagation Method

Suppose a neural network has $m+1$ layers in total; the last layer is a single-output perceptron, and the first $m$ layers satisfy

$$x_i(k) = S(w(k)x_i(k-1)), \quad k = 1, 2, \ldots, m$$

$$u_i(k) = w(k)x_i(k-1) = [u_i^1(k), \ldots, u_i^{n_k}(k)]^T$$

$$S(u_i(k)) = [S(u_i^1(k)), \ldots, S(u_i^{n_k}(k))]^T$$

where $x_i(k)$ is the layer-$k$ vector of the $i$-th sample and $w(k)$ is the weight matrix connecting layer $k-1$ to layer $k$. The goal is to minimize the empirical functional

$$I(w(1), \ldots, w(m)) = \dfrac{1}{l}\sum\limits_{i=1}^l (y_i - x_i(m))^2$$

We view this as a convex optimization problem with equality constraints and solve it with the method of Lagrange multipliers:

$$L(w, x, b) = \dfrac{1}{l}\sum\limits_{i=1}^l (y_i - x_i(m))^2 - \sum\limits_{i=1}^l\sum\limits_{k=1}^m b_i(k) \cdot [x_i(k) - S(w(k)x_i(k-1))]$$

# Note that $x_i(m)$ is a scalar, while $x_i(k)$ for $k \neq m$ are vectors. For matrix-derivative rules see Matrix Analysis and Applications (Xian-Da Zhang), Chapter 5. Also, several of the sub-conditions derived below differ from the book's results and remain open to question.

The first sub-condition (forward dynamics):

$$\dfrac{\partial L}{\partial b_i(k)} = 0 \ \to\ x_i(k) = S(w(k)x_i(k-1)), \quad i = 1, \ldots, l,\ k = 1, \ldots, m$$

The second sub-condition (backward dynamics):

$$\dfrac{\partial L}{\partial x_i(m)} = 0 \ \to\ b_i(m) = -\dfrac{2}{l}(y_i - x_i(m)), \quad i = 1, \ldots, l$$

$$\dfrac{\partial L}{\partial x_i(k)} = 0,\ k \neq m \ \to\ 0 = \dfrac{\partial\big({-b_i(k)} \cdot x_i(k) + b_i(k+1) \cdot S(w(k+1)x_i(k))\big)}{\partial x_i(k)} = -b_i(k) + \dfrac{\partial S(w(k+1)x_i(k))}{\partial x_i(k)}\, b_i(k+1)$$

$$\to\ b_i(k) = \dfrac{\partial S(w(k+1)x_i(k))}{\partial x_i(k)}\, b_i(k+1)$$

The third sub-condition (weight update):
At an extremum $\dfrac{\partial L}{\partial w(k)} = 0$; away from an extremum, update

$$w(k) \leftarrow w(k) - \gamma(n)\dfrac{\partial L}{\partial w(k)}, \quad \dfrac{\partial L}{\partial w(k)} = \sum\limits_{i=1}^l b_i(k)\dfrac{\partial S(w(k)x_i(k-1))}{\partial w(k)}$$
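
The following is a minimal NumPy sketch of these three sub-conditions for a tiny $m = 2$ network (one tanh hidden layer and a tanh output with squared loss); the architecture, toy data, and fixed learning rate are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                     # x_i(0), one row per sample
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))       # toy targets in {-1, +1}
W1 = rng.normal(scale=0.5, size=(4, 3))           # w(1)
W2 = rng.normal(scale=0.5, size=(1, 4))           # w(2)
gamma, l = 0.5, len(y)

for epoch in range(2000):
    # forward dynamics: x_i(k) = S(w(k) x_i(k-1))
    a1 = np.tanh(X @ W1.T)                        # x_i(1), shape (l, 4)
    out = np.tanh(a1 @ W2.T)[:, 0]                # x_i(2), shape (l,)
    # backward dynamics: b_i(m) = -(2/l)(y_i - x_i(m)), chained through S'
    delta2 = -(2.0 / l) * (y - out) * (1 - out ** 2)      # at the output layer
    delta1 = (delta2[:, None] @ W2) * (1 - a1 ** 2)       # at the hidden layer
    # weight update: w(k) <- w(k) - gamma * dL/dw(k)
    W2 -= gamma * (delta2 @ a1)[None, :]
    W1 -= gamma * delta1.T @ X

print("training accuracy:", np.mean(np.sign(out) == y))
```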

5.3.2 The Back-Propagation Algorithm
5.3.3 Neural Networks for Regression Estimation

Simply replace the sigmoid in the last layer with a linear function, as in the sketch below.
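
In terms of the back-propagation sketch of 5.3.1 above, only the output layer changes (same illustrative assumptions):

```python
# Regression variant of the Section 5.3.1 sketch: linear output layer.
out = (a1 @ W2.T)[:, 0]                  # identity instead of tanh
delta2 = -(2.0 / l) * (y - out)          # no (1 - out**2) factor in the chain
```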

5.3.4 Remarks on the Back-Propagation Method

5.4 The Optimal Separating Hyperplane

5.4.1 The Optimal Hyperplane

Suppose the training data (a set of vectors)

$$(x_1, y_1), \ldots, (x_l, y_l), \quad x \in \mathbb{R}^n,\ y \in \{-1, +1\}$$

can be separated by a hyperplane $w \cdot x - b = 0$. If the separation is error-free and the distance between the hyperplane and the closest vector to it is maximal over all separating hyperplanes, the set of vectors is said to be separated by the optimal (maximal-margin) hyperplane.
Correct classification by the hyperplane means

$$(w \cdot x_i - b) \begin{cases} \ge 1 & \text{if } y_i = 1 \\ \le -1 & \text{if } y_i = -1 \end{cases}$$

or, equivalently, $y_i[w \cdot x_i - b] \ge 1,\ i = 1, \ldots, l$.
Besides classifying correctly, the optimal hyperplane must also satisfy the maximal-distance condition, i.e. minimize $\Phi(w) = \|w\|^2$.
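
The equivalence between maximizing the margin and minimizing $\|w\|^2$ deserves one explicit step. For a vector $x_i$ achieving equality in the constraint, its distance to the hyperplane is

$$\dfrac{|w \cdot x_i - b|}{\|w\|} = \dfrac{1}{\|w\|}$$

so the margin between the two classes is $2/\|w\|$, and maximizing it is the same as minimizing $\Phi(w) = \|w\|^2$.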

5.4.2 The $\Delta$-Margin Separating Hyperplane

A hyperplane $w^* \cdot x - b = 0$, $\|w^*\| = 1$, classifies a vector $x$ as

$$y = \begin{cases} 1 & \text{if } w^* \cdot x - b \ge \Delta \\ -1 & \text{if } w^* \cdot x - b \le -\Delta \end{cases}$$

Such a hyperplane is called a $\Delta$-margin separating hyperplane. Clearly the optimal hyperplane is a $\Delta$-margin separating hyperplane with $\Delta = 1/\|w\|$.
Theorem 5.1
Let the vectors $x \in X$ belong to a ball of radius $R$. Then the set of $\Delta$-margin separating hyperplanes has VC dimension $h$ bounded by

$$h \le \min\left\{\left[\dfrac{R^2}{\Delta^2}\right], n\right\} + 1$$
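
For example, with $R = 1$ and $\Delta = 0.1$ in $n = 1000$ dimensions, $h \le \min\{100, 1000\} + 1 = 101$: a large margin keeps the capacity small even when the dimension is huge, which is exactly the second approach of Section 5.1.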

5.5 Constructing the Optimal Hyperplane

That is, minimize the functional $\Phi(w) = \dfrac{1}{2}w \cdot w$ under the constraints $y_i[w \cdot x_i - b] \ge 1,\ i = 1, \ldots, l$.
Apply the Lagrange multiplier method for inequality constraints:

$$L(w, b, \alpha) = \dfrac{1}{2}w \cdot w + \sum\limits_{i=1}^l \alpha_i (1 - y_i[w \cdot x_i - b]), \quad \alpha_i \ge 0$$

The goal is $\max\limits_{\alpha}\min\limits_{w,b} L$. Setting the gradient of the Lagrangian to zero gives

$$\dfrac{\partial L}{\partial w} = 0 \to w_0 = \sum\limits_{i=1}^l \alpha_i^0 y_i x_i$$

$$\dfrac{\partial L}{\partial b} = 0 \to \sum\limits_{i=1}^l \alpha_i^0 y_i = 0$$
Substituting back into the Lagrangian yields the dual problem: maximize the functional

$$W(\alpha) = \sum\limits_{i=1}^l \alpha_i - \dfrac{1}{2}\sum\limits_{i,j=1}^l \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$

subject to

$$\alpha_i \ge 0,\ i = 1, \ldots, l, \quad \sum\limits_{i=1}^l \alpha_i y_i = 0$$
Let $\alpha_0 = (\alpha_1^0, \ldots, \alpha_l^0)$ be the solution of this problem. The decision rule of the optimal hyperplane is then

$$f(x) = \operatorname{sgn}\Big\{\sum\limits_{\alpha_i^0 \neq 0} \alpha_i^0 y_i (x_i \cdot x) - b_0\Big\}$$

The solution must also satisfy the Kuhn-Tucker conditions (necessary and sufficient at the extremum):

$$\alpha_i^0 (1 - y_i[w_0 \cdot x_i - b_0]) = 0, \quad i = 1, \ldots, l$$
From the Kuhn-Tucker conditions,

$$b_0 = w_0 \cdot x_i^{sv} - y_i = \dfrac{1}{2}\left[w_0 \cdot x_{i, y_i=1}^{sv} + w_0 \cdot x_{j, y_j=-1}^{sv}\right]$$

where $x_i^{sv}$ denotes a support vector, i.e. a vector with $\alpha_i^0 \neq 0$, and $x_{i, y_i=1}^{sv}$ is any support vector with $y_i = 1$.
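
Here is a minimal sketch of solving this dual numerically with a general-purpose solver (scipy's SLSQP); the toy data are illustrative assumptions, and averaging $b_0$ over all support vectors is a common numerically robust variant of the formula above. A dedicated QP solver would be used in practice.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])        # separable toy classes
K = (y[:, None] * X) @ (y[:, None] * X).T         # K_ij = y_i y_j (x_i . x_j)

def neg_W(a):                                     # maximize W <=> minimize -W
    return -(a.sum() - 0.5 * a @ K @ a)

res = minimize(neg_W, np.zeros(len(y)),
               bounds=[(0, None)] * len(y),       # alpha_i >= 0
               constraints={"type": "eq", "fun": lambda a: a @ y})
a = res.x
sv = a > 1e-6                                     # support vectors: alpha_i != 0
w0 = (a[sv] * y[sv]) @ X[sv]                      # w_0 = sum alpha_i y_i x_i
b0 = np.mean(X[sv] @ w0 - y[sv])                  # b_0 = w_0 . x_sv - y (averaged)
print("support vectors:", sv.sum(),
      "accuracy:", np.mean(np.sign(X @ w0 - b0) == y))
```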

Generalization to the non-separable case

To construct an optimal hyperplane when the data are not linearly separable, introduce variables $\xi_i$ measuring the classification error and a constant parameter $\sigma > 0$, and minimize the functional (i.e., the total error) $F_\sigma(\xi) = \sum\limits_{i=1}^l \xi_i^\sigma$ under the constraints "classify correctly up to the slack" and "keep the margin at least $\Delta$", i.e.

$$\xi_i \ge 0, \quad y_i(w \cdot x_i - b) \ge 1 - \xi_i, \quad w \cdot w \le \Delta^{-2}$$

This is a good example of SRM: the above procedure defines a structure (by Theorem 5.1 the VC dimensions of its elements are increasing, so it is indeed a structure)

$$S_n = \{w \cdot x - b : w \cdot w \le c_n = \Delta_n^{-2}\}, \quad \Delta_n \le \Delta_{n-1}$$

For computational convenience, take $\sigma = 1$.

Constructing the $\Delta$-margin hyperplane
The Lagrangian is

$$L(w, b, \alpha, \beta, \lambda) = \sum\limits_{i=1}^l \xi_i^\sigma + \sum\limits_{i=1}^l \alpha_i (1 - \xi_i - y_i(w \cdot x_i - b)) - \sum\limits_{i=1}^l \beta_i \xi_i + \dfrac{\lambda}{2}(w \cdot w - \Delta^{-2})$$

$$\sigma = 1, \quad \alpha_i \ge 0, \quad \beta_i \ge 0, \quad \lambda \ge 0$$

The goal is $\max\limits_{\alpha,\beta,\lambda}\min\limits_{w,b,\xi} L$. Setting the gradient of the Lagrangian to zero gives

$$\dfrac{\partial L}{\partial w} = 0 \to w = \dfrac{1}{\lambda}\sum\limits_{i=1}^l \alpha_i y_i x_i$$

$$\dfrac{\partial L}{\partial b} = 0 \to \sum\limits_{i=1}^l \alpha_i y_i = 0$$

$$\dfrac{\partial L}{\partial \xi_i} = 0 \to \alpha_i + \beta_i = 1$$
Substituting back into the Lagrangian yields the dual problem: maximize the functional

$$W(\alpha, \lambda) = \sum\limits_{i=1}^l \alpha_i - \dfrac{1}{2\lambda}\sum\limits_{i,j=1}^l \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \dfrac{\lambda}{2\Delta^2}$$

subject to

$$\sum\limits_{i=1}^l \alpha_i y_i = 0, \quad \lambda \ge 0, \quad 0 \le \alpha_i \le 1, \quad \beta_i = 1 - \alpha_i$$

and the Kuhn-Tucker conditions

$$\alpha_i (1 - \xi_i - y_i(w \cdot x_i - b)) = 0, \quad \beta_i \xi_i = 0, \quad \dfrac{\lambda}{2}(w \cdot w - \Delta^{-2}) = 0$$

Constructing the soft-margin hyperplane (the generalized optimal hyperplane)
To simplify the problem, adopt the soft-margin idea: instead of imposing a hard constraint on the margin (such as $w \cdot w \le c_n$), fix a constant $C > 0$ and minimize the functional

$$\Phi(w, \xi) = \dfrac{1}{2}w \cdot w + C\sum\limits_{i=1}^l \xi_i$$

Clearly, when $C = \lambda^0$ this problem is fully equivalent to the previous one. The constraints are

$$\xi_i \ge 0, \quad y_i(w \cdot x_i - b) \ge 1 - \xi_i$$
The Lagrangian is

$$L(w, b, \alpha, \beta) = \dfrac{1}{2}w \cdot w + C\sum\limits_{i=1}^l \xi_i + \sum\limits_{i=1}^l \alpha_i (1 - \xi_i - y_i(w \cdot x_i - b)) - \sum\limits_{i=1}^l \beta_i \xi_i, \quad \alpha_i \ge 0,\ \beta_i \ge 0$$

The goal is $\max\limits_{\alpha,\beta}\min\limits_{w,b,\xi} L$. Setting the gradient of the Lagrangian to zero gives

$$\dfrac{\partial L}{\partial w} = 0 \to w = \sum\limits_{i=1}^l \alpha_i y_i x_i$$

$$\dfrac{\partial L}{\partial b} = 0 \to \sum\limits_{i=1}^l \alpha_i y_i = 0$$

$$\dfrac{\partial L}{\partial \xi_i} = 0 \to \alpha_i + \beta_i = C$$
Substituting back into the Lagrangian yields the dual problem: maximize the functional

$$W(\alpha) = \sum\limits_{i=1}^l \alpha_i - \dfrac{1}{2}\sum\limits_{i,j=1}^l \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$

subject to

$$\sum\limits_{i=1}^l \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad \beta_i = C - \alpha_i$$

and the Kuhn-Tucker conditions

$$\alpha_i (1 - \xi_i - y_i(w \cdot x_i - b)) = 0, \quad \beta_i \xi_i = 0$$
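
A minimal sketch of this soft-margin dual follows, again with scipy's SLSQP; relative to the hard-margin sketch above only the box constraint $0 \le \alpha_i \le C$ changes. $C = 1.0$ and the overlapping toy data are illustrative assumptions. The bias $b$ is taken from margin support vectors: when $0 < \alpha_i < C$ we have $\beta_i \neq 0$ and hence $\xi_i = 0$ by the Kuhn-Tucker conditions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1, 1, (30, 2)), rng.normal(1, 1, (30, 2))])
y = np.hstack([-np.ones(30), np.ones(30)])         # overlapping toy classes
K = (y[:, None] * X) @ (y[:, None] * X).T          # K_ij = y_i y_j (x_i . x_j)
C = 1.0

res = minimize(lambda a: -(a.sum() - 0.5 * a @ K @ a),
               np.zeros(len(y)),
               bounds=[(0, C)] * len(y),            # 0 <= alpha_i <= C
               constraints={"type": "eq", "fun": lambda a: a @ y})
a = res.x
sv = a > 1e-6                                      # support vectors
w = (a[sv] * y[sv]) @ X[sv]                        # w = sum alpha_i y_i x_i
margin_sv = sv & (a < C - 1e-6)                    # 0 < alpha_i < C => xi_i = 0
b = np.mean(X[margin_sv] @ w - y[margin_sv])
print("accuracy:", np.mean(np.sign(X @ w - b) == y))
```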
