Matrix Partial-Derivative Basics for the Lagrangian in Support Vector Machines

Preface

Equation (6.8) on page 123 of the Watermelon Book (Zhou Zhihua's Machine Learning) reads:
$$L(w,b,\alpha)=\frac{1}{2}\|w\|^{2}+\sum_{i=1}^{m}\alpha_i\bigl(1-y_i(w^{T}x_i+b)\bigr) \tag{6.8}$$
where $\alpha=(\alpha_1,\alpha_2,\dots,\alpha_m)$. Setting $\dfrac{\partial L}{\partial w}=0$ and $\dfrac{\partial L}{\partial b}=0$ yields
$$w=\sum_{i=1}^{m}\alpha_iy_ix_i \tag{6.9}$$
$$0=\sum_{i=1}^{m}\alpha_iy_i \tag{6.10}$$
The derivation requires vector norms from linear algebra and matrix-differentiation rules from matrix theory, so we first review both topics and then work through the derivation for $L(w,b,\alpha)$.

I. Vector Norms

The norm (length) of a vector $w=(w_1,w_2,\dots,w_n)\in\mathbb{R}^n$ is defined as
$$\|w\|=\sqrt{w_1^2+w_2^2+\dots+w_n^2}$$
Since $w$ is a column vector, it follows that
$$\|w\|=\sqrt{w_1^2+w_2^2+\dots+w_n^2}=\sqrt{w^Tw}$$
and
$$\|w\|^2=w_1^2+w_2^2+\dots+w_n^2=w^Tw$$
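As a quick sanity check, the identity $\|w\|^2=w^Tw$ can be verified numerically. Below is a minimal sketch using NumPy; the example vector is an arbitrary assumption chosen only for illustration:

```python
import numpy as np

# Arbitrary example vector (illustrative only, not from the text).
w = np.array([3.0, -1.0, 2.0])

# Norm from the component definition: sqrt(w_1^2 + ... + w_n^2).
norm_components = np.sqrt(np.sum(w ** 2))

# Norm from the inner product: sqrt(w^T w).
norm_inner = np.sqrt(w @ w)

assert np.isclose(norm_components, norm_inner)
assert np.isclose(norm_components ** 2, w @ w)  # ||w||^2 = w^T w
print(norm_components)  # ~3.7417
```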

II. Matrix Differentiation

Example:
Consider the bivariate function
$$f(x_1,x_2)=3x_1+2x_2$$
Taking the partial derivative with respect to each variable gives
$$\frac{\partial f}{\partial x_1}=3,\qquad \frac{\partial f}{\partial x_2}=2$$
Collecting the variables into a vector, i.e. the vector argument
$$x=[x_1,x_2]^T,$$
the derivative of $f$ with respect to the vector argument $x$ is
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix} 3 \\ 2 \end{bmatrix}$$
This is the result of differentiating with respect to the vector $x$.
The bivariate function $f(x_1,x_2)=3x_1+2x_2$ can also be written as
$$f(x)=A^Tx$$
where $A=[3,2]^T$ and $x=[x_1,x_2]^T$, so
$$\frac{\partial f(x)}{\partial x}=\frac{\partial (A^Tx)}{\partial x}=A=\begin{bmatrix} 3 \\ 2 \end{bmatrix}$$
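This example is easy to check numerically with central finite differences. The following is a sketch assuming NumPy; the helper `numerical_gradient` and the test point are illustrative assumptions, not from the book:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

A = np.array([3.0, 2.0])
f = lambda x: A @ x                # f(x) = A^T x = 3*x_1 + 2*x_2

x0 = np.array([1.0, -4.0])         # any point works; the gradient is constant
print(numerical_gradient(f, x0))   # ~[3. 2.], matching A
```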

1. The Gradient-Vector Form of Vector Differentiation

In general, let $f(x)$ be a scalar function of the vector argument
$$x=[x_1,x_2,\dots,x_n]^T.$$
Then
$$\frac{\partial f(x)}{\partial x}=\left[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\dots,\frac{\partial f}{\partial x_n}\right]^T$$
This is also called the gradient-vector form of vector differentiation:
$$\nabla_x f(x)=\frac{\partial f(x)}{\partial x}=\left[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\dots,\frac{\partial f}{\partial x_n}\right]^T$$
Matrix differentiation works analogously to this vector case.

2. Conclusion 1

$$\frac{\partial (x^TA)}{\partial x}=\frac{\partial (A^Tx)}{\partial x}=A$$
Note that $x^TA$ and $A^Tx$ are the same scalar. Proof: let $A=[a_1,a_2,\dots,a_n]^T$, where $a_1,a_2,\dots,a_n$ are constants. Then
$$\begin{aligned} \frac{\partial (x^TA)}{\partial x}&=\frac{\partial (A^Tx)}{\partial x}\\ &=\frac{\partial (a_1x_1+a_2x_2+\dots+a_nx_n)}{\partial x}\\ &=\begin{bmatrix} \dfrac{\partial (a_1x_1+a_2x_2+\dots+a_nx_n)}{\partial x_1}\\ \dfrac{\partial (a_1x_1+a_2x_2+\dots+a_nx_n)}{\partial x_2}\\ \vdots\\ \dfrac{\partial (a_1x_1+a_2x_2+\dots+a_nx_n)}{\partial x_n} \end{bmatrix}\\ &=[a_1,a_2,\dots,a_n]^T\\ &=A \end{aligned}$$
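Conclusion 1 can be spot-checked the same way; the sketch below (assuming NumPy, with a random constant vector and a random test point, both arbitrary) compares the finite-difference gradient of $x^TA$ against $A$:

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, arbitrary choice
n = 5                             # dimension, arbitrary choice
A = rng.normal(size=n)            # constant coefficient vector
x0 = rng.normal(size=n)           # random test point

f = lambda x: x @ A               # the scalar x^T A

# Central-difference approximation of the gradient at x0.
eps = 1e-6
grad = np.array([
    (f(x0 + eps * np.eye(n)[i]) - f(x0 - eps * np.eye(n)[i])) / (2 * eps)
    for i in range(n)
])
print(np.allclose(grad, A, atol=1e-6))  # True: d(x^T A)/dx = A
```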

3. Conclusion 2

$$\frac{\partial (x^Tx)}{\partial x}=2x$$
Proof:
$$\begin{aligned} \frac{\partial (x^Tx)}{\partial x}&=\begin{bmatrix} \dfrac{\partial (x_1^2+x_2^2+\dots+x_n^2)}{\partial x_1}\\ \dfrac{\partial (x_1^2+x_2^2+\dots+x_n^2)}{\partial x_2}\\ \vdots\\ \dfrac{\partial (x_1^2+x_2^2+\dots+x_n^2)}{\partial x_n} \end{bmatrix}\\ &=[2x_1,2x_2,\dots,2x_n]^T\\ &=2x \end{aligned}$$
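An analogous numerical check for Conclusion 2, again a sketch with an arbitrary random test point:

```python
import numpy as np

rng = np.random.default_rng(1)    # fixed seed, arbitrary choice
n = 5
x0 = rng.normal(size=n)           # random test point

f = lambda x: x @ x               # the scalar x^T x

# Central-difference approximation of the gradient at x0.
eps = 1e-6
grad = np.array([
    (f(x0 + eps * np.eye(n)[i]) - f(x0 - eps * np.eye(n)[i])) / (2 * eps)
    for i in range(n)
])
print(np.allclose(grad, 2 * x0, atol=1e-6))  # True: d(x^T x)/dx = 2x
```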

III. The Result of $\dfrac{\partial \|w\|^2}{\partial w}$

From Sections I and II,
$$\frac{\partial \|w\|^2}{\partial w}=\frac{\partial (w^Tw)}{\partial w}=2w$$

IV. Computing the Results for Equation (6.8)

Setting $\dfrac{\partial L}{\partial w}=0$: expanding the derivative term by term,
$$\begin{aligned} \frac{\partial L}{\partial w}&=\frac{\partial}{\partial w}\left(\frac{1}{2}\|w\|^{2}+\sum_{i=1}^{m}\alpha_i\bigl(1-y_i(w^{T}x_i+b)\bigr)\right)\\ &=\frac{\partial\,\frac{1}{2}\|w\|^{2}}{\partial w}+\frac{\partial\left(\sum_{i=1}^{m}\alpha_i\right)}{\partial w}-\frac{\partial\left(\sum_{i=1}^{m}\alpha_iy_iw^{T}x_i\right)}{\partial w}-\frac{\partial\left(\sum_{i=1}^{m}\alpha_iy_ib\right)}{\partial w} \end{aligned}$$
Since $\alpha_i$, $y_i$, and $b$ do not depend on $w$,
$$\frac{\partial\left(\sum_{i=1}^{m}\alpha_i\right)}{\partial w}=0,\qquad \frac{\partial\left(\sum_{i=1}^{m}\alpha_iy_ib\right)}{\partial w}=0$$
and from Section III,
$$\frac{1}{2}\frac{\partial \|w\|^{2}}{\partial w}=w.$$
Therefore, applying Conclusion 1 to the remaining term (each $\alpha_iy_ix_i$ is constant in $w$),
$$\begin{aligned} \frac{\partial L}{\partial w}&=w-\frac{\partial\left(\sum_{i=1}^{m}\alpha_iy_iw^{T}x_i\right)}{\partial w}\\ &=w-\sum_{i=1}^{m}\alpha_iy_ix_i\\ &=0 \end{aligned}$$
which gives
$$w=\sum_{i=1}^{m}\alpha_iy_ix_i \tag{6.9}$$
Now set $\dfrac{\partial L}{\partial b}=0$. Since $\alpha_i$, $y_i$, $x_i$, and $w$ do not depend on $b$,
$$\frac{\partial L}{\partial b}=-\sum_{i=1}^{m}\alpha_iy_i=0,$$
and therefore
$$\sum_{i=1}^{m}\alpha_iy_i=0 \tag{6.10}$$
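Putting it all together, the sketch below builds the Lagrangian (6.8) on random placeholder data and checks the analytic gradients $\partial L/\partial w = w - \sum_i \alpha_i y_i x_i$ and $\partial L/\partial b = -\sum_i \alpha_i y_i$ against finite differences; all values are arbitrary assumptions, not a trained SVM:

```python
import numpy as np

rng = np.random.default_rng(42)       # all data below are random placeholders
m, n = 8, 3                           # m samples, n features (arbitrary sizes)
X = rng.normal(size=(m, n))           # row i is x_i
y = rng.choice([-1.0, 1.0], size=m)   # labels y_i
alpha = rng.uniform(0.0, 1.0, size=m) # multipliers alpha_i
w = rng.normal(size=n)
b = 0.7

def L(w, b):
    """Lagrangian (6.8): 0.5*||w||^2 + sum_i alpha_i*(1 - y_i*(w^T x_i + b))."""
    margins = 1.0 - y * (X @ w + b)
    return 0.5 * (w @ w) + alpha @ margins

# Analytic gradients from the derivation above.
grad_w = w - (alpha * y) @ X          # w - sum_i alpha_i y_i x_i
grad_b = -(alpha @ y)                 # -sum_i alpha_i y_i

# Central-difference checks.
eps = 1e-6
num_w = np.array([
    (L(w + eps * np.eye(n)[i], b) - L(w - eps * np.eye(n)[i], b)) / (2 * eps)
    for i in range(n)
])
num_b = (L(w, b + eps) - L(w, b - eps)) / (2 * eps)

print(np.allclose(grad_w, num_w, atol=1e-6))  # True
print(np.isclose(grad_b, num_b, atol=1e-6))   # True
```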

Summary

Support vector machines call on a lot of mathematics; these notes are a record for my own reference.
This article draws on the Bilibili video "机器学习中的矩阵求导方法" (Matrix Differentiation Methods in Machine Learning): https://www.bilibili.com/medialist/detail/ml1590616425?type=1&spm_id_from=333.999.0.0
