Matrix and Vector Derivatives: The Differential Method


Scalar differential

$$df = \frac{\partial f}{\partial x}\,dx$$

Vector differential

$$
df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\,dx_i
= \begin{bmatrix}
\frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \frac{\partial f}{\partial x_3} & \cdots & \frac{\partial f}{\partial x_n}
\end{bmatrix}
\begin{bmatrix}
dx_1 \\ dx_2 \\ dx_3 \\ \vdots \\ dx_n
\end{bmatrix}
= \left(\frac{\partial f}{\partial \mathbf{x}}\right)^T d\mathbf{x}
$$

Here $\frac{\partial f}{\partial \mathbf{x}}$ is the column vector of partial derivatives, so its transpose is the row vector that multiplies the column vector $d\mathbf{x}$.
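The vector differential can be checked numerically. The sketch below is only an illustration: the function `f`, its hand-derived gradient `grad_f`, and the sample values are assumptions chosen for the example, not part of the original article. It compares the actual change in `f` with the prediction $(\partial f / \partial \mathbf{x})^T d\mathbf{x}$ for a small step.

```python
# Hypothetical example function: f(x) = sum(x_i^2) + x_1 * x_2
def f(x):
    return sum(v * v for v in x) + x[0] * x[1]


def grad_f(x):
    # Hand-derived gradient of the f above (column vector of partials).
    g = [2.0 * v for v in x]
    g[0] += x[1]
    g[1] += x[0]
    return g


x = [1.0, -2.0, 0.5]
dx = [1e-6, -2e-6, 3e-6]  # a small perturbation of x

# Actual change in f versus the first-order prediction (grad f)^T dx.
actual_change = f([xi + di for xi, di in zip(x, dx)]) - f(x)
predicted_df = sum(gi * di for gi, di in zip(grad_f(x), dx))

print(actual_change, predicted_df)
```

The two numbers agree up to terms of second order in `dx`, which is what the differential is meant to capture.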

Matrix differential

For a scalar function $f$ of an $m \times n$ matrix $X$:

$$df = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\partial f}{\partial x_{ij}}\,dx_{ij}$$
Now consider the product of the $n \times m$ matrix $\left(\frac{\partial f}{\partial X}\right)^T$ with the $m \times n$ matrix $dX$:

$$
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{21}} & \cdots & \frac{\partial f}{\partial x_{m1}} \\
\frac{\partial f}{\partial x_{12}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{m2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f}{\partial x_{1n}} & \frac{\partial f}{\partial x_{2n}} & \cdots & \frac{\partial f}{\partial x_{mn}}
\end{bmatrix}
\begin{bmatrix}
dx_{11} & dx_{12} & \cdots & dx_{1n} \\
dx_{21} & dx_{22} & \cdots & dx_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
dx_{m1} & dx_{m2} & \cdots & dx_{mn}
\end{bmatrix}
=
\begin{bmatrix}
\sum_{i=1}^{m} \frac{\partial f}{\partial x_{i1}}\,dx_{i1} & \sum_{i=1}^{m} \frac{\partial f}{\partial x_{i1}}\,dx_{i2} & \cdots & \sum_{i=1}^{m} \frac{\partial f}{\partial x_{i1}}\,dx_{in} \\
\sum_{i=1}^{m} \frac{\partial f}{\partial x_{i2}}\,dx_{i1} & \sum_{i=1}^{m} \frac{\partial f}{\partial x_{i2}}\,dx_{i2} & \cdots & \sum_{i=1}^{m} \frac{\partial f}{\partial x_{i2}}\,dx_{in} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{m} \frac{\partial f}{\partial x_{in}}\,dx_{i1} & \sum_{i=1}^{m} \frac{\partial f}{\partial x_{in}}\,dx_{i2} & \cdots & \sum_{i=1}^{m} \frac{\partial f}{\partial x_{in}}\,dx_{in}
\end{bmatrix}
$$

The $j$-th diagonal entry of this $n \times n$ product is $\sum_{i=1}^{m} \frac{\partial f}{\partial x_{ij}}\,dx_{ij}$, so the diagonal entries add up to $\sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\partial f}{\partial x_{ij}}\,dx_{ij} = df$.
Therefore the matrix differential can be written as a trace:

$$df = \operatorname{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T dX\right)$$
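The identity behind the trace form, $\operatorname{tr}(G^T\,dX) = \sum_{i}\sum_{j} G_{ij}\,dX_{ij}$, is easy to confirm on a small case. In the sketch below, `G` and `dX` are hypothetical $2 \times 3$ integer matrices standing in for $\frac{\partial f}{\partial X}$ and $dX$; the helper functions are written out by hand so that the example needs nothing beyond the standard library.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    # Plain triple-loop matrix product for lists of lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


# Hypothetical 2x3 stand-ins for the gradient matrix and dX.
G = [[1, 2, 3], [4, 5, 6]]
dX = [[7, 8, 9], [10, 11, 12]]

# Elementwise double sum versus the trace of the 3x3 product G^T dX.
double_sum = sum(G[i][j] * dX[i][j] for i in range(2) for j in range(3))
trace_form = trace(matmul(transpose(G), dX))

print(double_sum, trace_form)  # prints "217 217"
```

Only the diagonal of the product contributes, which is why the off-diagonal entries never need to be computed in practice.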
The vector differential can be written in the same form, since the trace of a scalar is the scalar itself:

$$df = \operatorname{tr}\left(\left(\frac{\partial f}{\partial \mathbf{x}}\right)^T d\mathbf{x}\right)$$

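Properties of the matrix differential

Before discussing how to use matrix differentials to compute derivatives, let us look at their properties.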
$$d(X + Y) = dX + dY, \qquad d(X - Y) = dX - dY$$

$$d(XY) = (dX)\,Y + X\,(dY)$$

The order of the factors in the product rule must be preserved, because matrix multiplication does not commute.
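A small numerical experiment illustrates the product rule $d(XY) = (dX)\,Y + X\,(dY)$ and why the factor order matters. The matrices below are arbitrary values chosen for illustration, and the helper functions are hand-written assumptions rather than a library API.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def max_abs(a):
    return max(abs(v) for row in a for v in row)


# Arbitrary sample matrices and small perturbations.
X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[0.0, 1.0], [-1.0, 2.0]]
dX = [[1e-6, 2e-6], [0.0, -1e-6]]
dY = [[3e-6, 0.0], [1e-6, 1e-6]]

# Actual change of the product XY under the perturbation.
actual = sub(matmul(add(X, dX), add(Y, dY)), matmul(X, Y))

ordered = add(matmul(dX, Y), matmul(X, dY))   # (dX) Y + X (dY)
swapped = add(matmul(X, dY), matmul(Y, dX))   # X (dY) + Y (dX): factors reordered

err_ordered = max_abs(sub(actual, ordered))
err_swapped = max_abs(sub(actual, swapped))

print(err_ordered, err_swapped)
```

With the factors in the right order the error is of second order (it equals $dX\,dY$ up to rounding), while the reordered version is off by a first-order amount.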

$$d(X^T) = (dX)^T$$

$$d\,\operatorname{tr}(X) = \operatorname{tr}(dX)$$

$$d(X \odot Y) = X \odot dY + Y \odot dX$$

$$d\,\delta(X) = \delta'(X) \odot dX$$

Here $\delta$ is a scalar function applied to $X$ elementwise, $\delta'$ is its derivative, and $\odot$ denotes the elementwise (Hadamard) product.
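The elementwise rule can be checked with any smooth scalar function. The sketch below uses `math.tanh` as a hypothetical choice for $\delta$ (the article does not fix a particular function), with its known derivative $1 - \tanh^2$.

```python
import math


def delta(v):
    # Hypothetical choice of elementwise function for this example.
    return math.tanh(v)


def delta_prime(v):
    # Derivative of tanh: 1 - tanh(v)^2.
    return 1.0 - math.tanh(v) ** 2


X = [[0.5, -1.0], [2.0, 0.0]]
dX = [[1e-6, -2e-6], [3e-6, 4e-6]]

# Actual elementwise change versus the prediction delta'(X) ⊙ dX.
actual = [[delta(x + d) - delta(x) for x, d in zip(rx, rd)]
          for rx, rd in zip(X, dX)]
predicted = [[delta_prime(x) * d for x, d in zip(rx, rd)]
             for rx, rd in zip(X, dX)]

max_err = max(abs(a - p)
              for ra, rp in zip(actual, predicted)
              for a, p in zip(ra, rp))

print(max_err)
```

The remaining error is of second order in `dX`, as expected for a first-order differential.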

$$d(X^{-1}) = -X^{-1}\,(dX)\,X^{-1}$$
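The rule for the inverse is stated without derivation. One way to see where it comes from, assuming $X$ is invertible and using the product rule $d(XY) = (dX)\,Y + X\,(dY)$ together with the fact that the constant matrix $I$ has zero differential:

```latex
\begin{aligned}
X X^{-1} &= I \\
(dX)\,X^{-1} + X\,d(X^{-1}) &= 0 \\
X\,d(X^{-1}) &= -(dX)\,X^{-1} \\
d(X^{-1}) &= -X^{-1}\,(dX)\,X^{-1}
\end{aligned}
```

The last step multiplies both sides on the left by $X^{-1}$; the factors stay in this order because matrix multiplication does not commute.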
