Basic Mathematics for AI: A Quick Start to Matrix Differentiation

0. Reference links

Matrix differentiation

1. Scalar functions and vector functions

$$
\begin{cases}
y_1=w_1x_{11}+w_2x_{12}+\cdots+w_nx_{1n} \\
\vdots \\
y_m=w_1x_{m1}+w_2x_{m2}+\cdots+w_nx_{mn}
\end{cases}
$$

1.1 Scalar functions:

1. A function whose output is a scalar is a scalar function:
Input and output are both scalars:
$f(x)=x^2,\quad \mathbb{R}_x \rightarrow \mathbb{R}_{x^2}$
Input is not a scalar, output is a scalar:
$f(x)=x_1^2+x_2^2,\quad \mathbb{R}^2_{(x_1,x_2)^T} \rightarrow \mathbb{R}_{x_1^2+x_2^2}$

2. A function whose output is a vector is a vector function:
Input is a scalar, output is a vector:
$f(x)=\left[ \begin{matrix}f_1(x)=x \\ f_2(x)=x^2\end{matrix} \right],\quad \mathbb{R}_x \rightarrow \mathbb{R}^2_{(x,x^2)^T}$
Input is a scalar, output is a matrix:
$f(x)=\left[ \begin{matrix}f_{11}(x)=x & f_{12}(x)=x^2\\f_{21}(x)=x^3 & f_{22}(x)=x^4\end{matrix} \right],\quad \mathbb{R}_x \rightarrow \mathbb{R}^{2\times 2}$

Input is a vector, output is a matrix:
$f(x)=\left[ \begin{matrix}f_{11}(x)=x_1+x_2 & f_{12}(x)=x_1^2+x_2^2\\f_{21}(x)=x_1^3+x_2^3 & f_{22}(x)=x_1^4+x_2^4\end{matrix} \right],\quad \mathbb{R}^2_{(x_1,x_2)^T} \rightarrow \mathbb{R}^{2\times 2}$

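1.2 Summary

$$
\begin{aligned}
x &\rightarrow \text{scalar, vector, or matrix}\\
f(x) &\rightarrow \text{scalar, vector, or matrix}
\end{aligned}
$$
Restricting attention to scalars and vectors, $\frac{df(x)}{dx}$ has four possible cases.
The essence of matrix differentiation: $\frac{dA}{dB}$ means differentiating every element of matrix $A$ with respect to every element of matrix $B$.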

1.3 Counting the elements of the derivative

The subscript on each derivative below gives the number of partial derivatives it contains.
$A_{1\times 1},\ B_{1\times 1} \rightarrow \left(\frac{dA}{dB}\right)_{1\times 1}$
$A_{1\times p},\ B_{1\times n} \rightarrow \left(\frac{dA}{dB}\right)_{p\times n}$
$A_{q\times p},\ B_{m\times n} \rightarrow \left(\frac{dA}{dB}\right)_{p\times q\times m\times n}$

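2. Differentiation method (YX stretching) (key point)

$$
\begin{cases}
\text{1. Scalars stay unchanged; vectors are stretched.} \\
\text{2. The front term is stretched horizontally; the back term is stretched vertically.}
\end{cases}
$$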
Three examples follow.
Example 1: $\frac{df(x)}{dx}$, where $f(x)=f(x_1,\cdots,x_n)$ is a scalar function and $x=[x_1,x_2,\cdots,x_n]^T$ is a vector.
$$
\frac{df(x)}{dx}=\left[ \begin{matrix}\frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\\frac{\partial f(x)}{\partial x_n}\end{matrix} \right]
$$
In the result, $f(x)$ is a scalar and stays unchanged, while $x$ is a vector and is stretched vertically. In effect, the partial derivatives of the multivariate function are collected into one column vector.
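Example 1 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original article: the helper `numerical_gradient` and the test function `f` are hypothetical names chosen for illustration, and the partial derivatives are approximated by central differences rather than computed symbolically.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate df/dx for a scalar-valued f using central differences.

    The returned list of length n stands in for the n x 1 column vector.
    """
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example: f(x1, x2) = x1^2 + 3*x2
    return x[0] ** 2 + 3 * x[1]


grad = numerical_gradient(f, [2.0, 5.0])
print(grad)  # close to the analytic gradient [2*x1, 3] = [4, 3]
```

The scalar output contributes no extra dimension, so the result has exactly one entry per component of $x$.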

Example 2: $\frac{df(x)}{dx}$, where $f(x)=\left[ \begin{matrix}f_1(x)\\ f_2(x)\\\vdots\\f_n(x)\end{matrix} \right]$ is a vector function and $x$ is a scalar.
$$
\frac{df(x)}{dx}=\left[ \begin{matrix}\frac{\partial f_1(x)}{\partial x} &\frac{\partial f_2(x)}{\partial x}& \cdots&\frac{\partial f_n(x)}{\partial x}\end{matrix} \right]
$$
This matches the rule that the front term is stretched horizontally.

Example 3: $\frac{df(x)}{dx}$, where $f(x)=\left[ \begin{matrix}f_1(x)\\ f_2(x)\\\vdots\\f_n(x)\end{matrix} \right]$ is a vector function and $x=[x_1,x_2,\cdots,x_n]^T$ is a vector.
$$
\frac{df(x)}{dx}
=\left[ \begin{matrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n} \end{matrix} \right]
$$
Here $f(x)$ is a vector while each of $x_1,\cdots,x_n$ is a scalar, so every row is stretched horizontally in turn:
$$
\frac{df(x)}{dx}
=\left[ \begin{matrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n} \end{matrix} \right]
=\left[ \begin{matrix}
\frac{\partial f_1(x)}{\partial x_1} & \frac{\partial f_2(x)}{\partial x_1} & \cdots & \frac{\partial f_n(x)}{\partial x_1}\\
\frac{\partial f_1(x)}{\partial x_2} & \frac{\partial f_2(x)}{\partial x_2} & \cdots & \frac{\partial f_n(x)}{\partial x_2}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f_1(x)}{\partial x_n} & \frac{\partial f_2(x)}{\partial x_n} & \cdots & \frac{\partial f_n(x)}{\partial x_n}
\end{matrix} \right]
$$
This matches stretching vertically first (over $x$) and then horizontally (over $f$).
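The index convention of Example 3 (rows follow $x$, columns follow $f$) is easy to mix up, so a small numerical sketch may help. The code below is an illustration only: `jacobian_denominator_layout` and the sample function `f` are hypothetical names, and the derivatives are central-difference approximations.

```python
def jacobian_denominator_layout(f, x, h=1e-6):
    """Return J with J[i][j] approximating d f_j / d x_i.

    Row i corresponds to x_i (vertical stretch over x);
    column j corresponds to f_j (horizontal stretch over f).
    """
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        rows.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    return rows


def f(x):
    # Hypothetical example: f1 = x1*x2, f2 = x1 + x2^2
    return [x[0] * x[1], x[0] + x[1] ** 2]


J = jacobian_denominator_layout(f, [2.0, 3.0])
# Analytically at (2, 3):
#   row 1 = [df1/dx1, df2/dx1] = [x2, 1]    = [3, 1]
#   row 2 = [df1/dx2, df2/dx2] = [x1, 2*x2] = [2, 6]
for row in J:
    print(row)
```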

3. Deriving common matrix derivative formulas

Example 1: $f(x)=A^TX$, with $A=\left[\begin{matrix}a_1\\a_2\\\vdots\\a_n\end{matrix} \right]_{n\times 1}$ and $X=\left[\begin{matrix}x_1\\x_2\\\vdots\\x_n\end{matrix} \right]_{n\times 1}$. Find $\frac{df(x)}{dX}$.
Solution: $f(x)$ is a scalar function and $X$ is a vector, with $f(x)=A^TX=\sum_{i=1}^n a_ix_i$.
$$
\frac{df(x)}{dX}=\left[ \begin{matrix}\frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\\frac{\partial f(x)}{\partial x_n}\end{matrix} \right]=\left[ \begin{matrix}a_1 \\ a_2\\ \vdots \\ a_n\end{matrix} \right]=A
$$
Note that $f(x)=A^TX=X^TA$, because the transpose of a scalar is the scalar itself. Therefore $\frac{d(A^TX)}{dX}=\frac{d(X^TA)}{dX}=A$.
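As a quick sanity check of $\frac{d(A^TX)}{dX}=A$, the sketch below compares a finite-difference gradient with $A$ itself. It is an illustration under assumed values: the vectors `a` and `x` and the helper names are made up for this example.

```python
def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the column vector df/dx."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


a = [1.0, -2.0, 0.5]   # hypothetical A (n = 3)
x = [0.3, 0.7, -1.2]   # hypothetical evaluation point

grad = numerical_gradient(lambda v: dot(a, v), x)
print(grad)  # each entry should be close to the matching entry of a
```

Because $f$ is linear in $X$, the gradient does not depend on the evaluation point; changing `x` should leave the result unchanged up to rounding.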

Example 2: $f(x)=X^TAX$, with $X=\left[\begin{matrix}x_1\\x_2\\\vdots\\x_n\end{matrix} \right]_{n\times 1}$ and $A=\left[\begin{matrix}a_{11}& a_{12}&\cdots&a_{1n}\\a_{21}& a_{22}&\cdots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{n1}& a_{n2}&\cdots&a_{nn}\end{matrix} \right]_{n\times n}$. Find $\frac{df(x)}{dX}$.
Solution: $f(x)=X^TAX$ is a scalar function.
$$
f(x)=
\left[\begin{matrix} x_1&x_2&\cdots&x_n \end{matrix} \right]
\cdot
\left[\begin{matrix}
a_{11}& a_{12}&\cdots&a_{1n}\\
a_{21}& a_{22}&\cdots&a_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
a_{n1}& a_{n2}&\cdots&a_{nn}
\end{matrix} \right]
\cdot
\left[\begin{matrix} x_1\\ x_2\\ \vdots\\ x_n \end{matrix} \right]
$$

$$
f(x)=\sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j
$$

Differentiating with respect to $x_k$ keeps the terms with $i=k$ and the terms with $j=k$, so $\frac{\partial f(x)}{\partial x_k}=\sum_{j=1}^n a_{kj}x_j+\sum_{i=1}^n a_{ik}x_i$. Hence:

$$
\begin{aligned}
\frac{df(x)}{dX}
&=\left[ \begin{matrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n} \end{matrix} \right]\\
&=\left[ \begin{matrix}
\sum_{j=1}^n a_{1j}x_j+\sum_{i=1}^n a_{i1}x_i \\
\sum_{j=1}^n a_{2j}x_j+\sum_{i=1}^n a_{i2}x_i\\
\vdots \\
\sum_{j=1}^n a_{nj}x_j+\sum_{i=1}^n a_{in}x_i
\end{matrix} \right]\\
&=\left[ \begin{matrix}
\sum_{j=1}^n a_{1j}x_j\\
\sum_{j=1}^n a_{2j}x_j\\
\vdots \\
\sum_{j=1}^n a_{nj}x_j
\end{matrix} \right]
+
\left[ \begin{matrix}
\sum_{i=1}^n a_{i1}x_i \\
\sum_{i=1}^n a_{i2}x_i\\
\vdots \\
\sum_{i=1}^n a_{in}x_i
\end{matrix} \right]\\
&=\left[\begin{matrix}
a_{11}& a_{12}&\cdots&a_{1n}\\
a_{21}& a_{22}&\cdots&a_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
a_{n1}& a_{n2}&\cdots&a_{nn}
\end{matrix} \right]
\cdot
\left[\begin{matrix} x_1\\ x_2\\ \vdots\\ x_n \end{matrix} \right]\\
&\quad+
\left[\begin{matrix}
a_{11}& a_{12}&\cdots&a_{1n}\\
a_{21}& a_{22}&\cdots&a_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
a_{n1}& a_{n2}&\cdots&a_{nn}
\end{matrix} \right]^T
\cdot
\left[\begin{matrix} x_1\\ x_2\\ \vdots\\ x_n \end{matrix} \right]\\
&=AX+A^TX=(A+A^T)X
\end{aligned}
$$
Therefore $\frac{d(X^TAX)}{dX}=(A+A^T)X$. In particular, since $(X^TAX)^T=X^TA^TX$, we also have $\frac{d(X^TA^TX)}{dX}=(A+A^T)X$.
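The result $(A+A^T)X$ can be verified numerically with a deliberately non-symmetric $A$, so that $A+A^T$ differs from $2A$. This is a sketch for illustration: the matrix `A`, the point `x`, and all helper names are assumptions, not taken from the original.

```python
def quadratic_form(A, x):
    """Compute x^T A x as the double sum of a_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the column vector df/dx."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


A = [[1.0, 2.0],
     [3.0, 4.0]]       # hypothetical non-symmetric matrix
x = [1.0, -1.0]        # hypothetical evaluation point
n = len(x)

numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)

# Closed form: entry k of (A + A^T) x is sum_j (a_kj + a_jk) * x_j
closed_form = [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]

print(numeric)      # close to [-3, -3]
print(closed_form)  # [-3.0, -3.0]
```

When $A$ is symmetric the formula reduces to $2AX$, which is the form most often seen in least-squares derivations.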

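4. Matrix derivative layouts

$$
\begin{cases}
\text{Denominator layout} \rightarrow \text{YX stretching}\\
\text{Numerator layout} \rightarrow \text{XY stretching}
\end{cases}
$$
Difference: the general rule stays the same (the front term is stretched horizontally, the back term vertically); only which term counts as the front changes.
YX stretching (denominator layout): Y, that is $f(x)$, is stretched horizontally and X is stretched vertically.
XY stretching (numerator layout): the reverse, X is stretched horizontally and Y vertically.
In general, $(\text{denominator layout})^T=\text{numerator layout}$.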
Example: $f(x)=X^TX$, with $X=[x_1,x_2,\cdots,x_n]^T$.
Denominator layout:
$$
\frac{df(x)}{dX}
=\left[ \begin{matrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n} \end{matrix} \right]
=\left[ \begin{matrix} 2 x_1 \\ 2 x_2 \\ \vdots\\ 2 x_n \end{matrix} \right]
=2X
$$
Numerator layout:
$$
\frac{df(x)}{dX}
=\left[ \begin{matrix} \frac{\partial f(x)}{\partial x_1} & \frac{\partial f(x)}{\partial x_2}& \cdots & \frac{\partial f(x)}{\partial x_n} \end{matrix} \right]
=\left[ \begin{matrix} 2x_1 & 2x_2 & \cdots & 2x_n \end{matrix} \right]
=2X^T
$$
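The transpose relation between the two layouts also shows up for vector-valued functions. As a worked illustration (the function $Y=AX$ with a $2\times 2$ matrix $A$ is an assumed example, not taken from the original), let $y_1=a_{11}x_1+a_{12}x_2$ and $y_2=a_{21}x_1+a_{22}x_2$.

Denominator layout (rows follow $X$, columns follow $Y$):
$$
\frac{dY}{dX}
=\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1}\\
\frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2}
\end{matrix} \right]
=\left[ \begin{matrix}
a_{11} & a_{21}\\
a_{12} & a_{22}
\end{matrix} \right]
=A^T
$$

Numerator layout (rows follow $Y$, columns follow $X$):
$$
\frac{dY}{dX}
=\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2}\\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2}
\end{matrix} \right]
=\left[ \begin{matrix}
a_{11} & a_{12}\\
a_{21} & a_{22}
\end{matrix} \right]
=A
$$

The two results are transposes of each other, so when comparing formulas from different sources it helps to check first which layout each one uses.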

5. Product and sum rules for matrix derivatives

$$
U=\left[ \begin{matrix} u_1(x)\\ u_2(x) \\ \vdots\\ u_n(x) \end{matrix} \right]_{n\times 1},\quad
V=\left[ \begin{matrix} v_1(x)\\ v_2(x) \\ \vdots\\ v_n(x) \end{matrix} \right]_{n\times 1},\quad
X=\left[ \begin{matrix} x_1\\ x_2 \\ \vdots\\ x_n \end{matrix} \right]_{n\times 1}
$$
Note that $U^TV$ is a scalar.
$$
\frac{d(U^TV)}{dX}
=\frac{\partial U}{\partial X}V
+\frac{\partial V}{\partial X}U
$$

$$
\frac{d(U+V)}{dX}=\frac{dU}{dX}+\frac{dV}{dX}
$$
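The product rule can be checked numerically, reading $\frac{\partial U}{\partial X}$ and $\frac{\partial V}{\partial X}$ in denominator layout (entry $(i,j)$ is $\frac{\partial u_j}{\partial x_i}$). The sketch below is illustrative only: the functions `U` and `V`, the evaluation point, and the helper names are all assumptions made for this example.

```python
def jacobian_denominator_layout(f, x, h=1e-6):
    """Return J with J[i][j] approximating d f_j / d x_i (central differences)."""
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        rows.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    return rows


def matvec(M, v):
    """Matrix-vector product M v for a nested-list matrix."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def U(x):
    # Hypothetical vector function: u1 = x1^2, u2 = x2
    return [x[0] ** 2, x[1]]


def V(x):
    # Hypothetical vector function: v1 = x2, v2 = x1*x2
    return [x[1], x[0] * x[1]]


def utv(x):
    # The scalar U^T V, wrapped as a one-component vector function
    return [sum(u * v for u, v in zip(U(x), V(x)))]


x = [1.0, 2.0]

# Left-hand side: gradient of the scalar U^T V (single column of the Jacobian)
lhs = [row[0] for row in jacobian_denominator_layout(utv, x)]

# Right-hand side: (dU/dX) V + (dV/dX) U
term_u = matvec(jacobian_denominator_layout(U, x), V(x))
term_v = matvec(jacobian_denominator_layout(V, x), U(x))
rhs = [a + b for a, b in zip(term_u, term_v)]

print(lhs)  # close to [8, 5]
print(rhs)  # close to [8, 5]
```

Here $U^TV=x_1^2x_2+x_1x_2^2$, whose gradient at $(1,2)$ is $[2x_1x_2+x_2^2,\ x_1^2+2x_1x_2]^T=[8,5]^T$, matching both sides.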
