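Matrix Differentiation Notes (3)

Preface

These notes are based on the article listed in the references. The formulas largely follow the original, with my own thoughts and annotations added. If you find my write-up lacking, please consult the original article.

This post covers a more advanced technique for differentiating real-valued scalar functions of a vector variable and of a matrix variable: the trace of a matrix and the first-order real matrix differential. (The derivations make use of real matrix functions of a matrix variable, but differentiating such functions is outside the scope of these notes.) To follow this post you need the material from the previous posts, plus matrix multiplication and vector inner products from linear algebra.

The following website computes matrix derivatives and can be used to check a result: Matrix Calculus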

Matrix Differentiation

1. The Trace of a Matrix

1.1 Definition

The sum of the main-diagonal elements of an $n\times n$ square matrix $\boldsymbol{A}_{n\times n}$ is called the trace of $\boldsymbol{A}$, written $\operatorname{tr}(\boldsymbol{A})$:

$$\operatorname{tr}(\boldsymbol{A})=a_{11}+a_{22}+\cdots+a_{nn}=\sum_{i=1}^n a_{ii}$$

Note: by this definition, only square matrices have a trace.

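1.2 Properties of the Trace

The following properties of the trace are stated without proof. They are written for two matrices but extend to any number of matrices.

(1): Trace of a scalar

A scalar $x$ can be regarded as a $1\times 1$ matrix, so the trace of a scalar is the scalar itself: $\operatorname{tr}(x)=x$.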

(2): Linearity

The trace is linear: sums and scalar factors inside tr can be moved outside it,

$$\operatorname{tr}(c_1\boldsymbol{A}+c_2\boldsymbol{B}) = c_1\operatorname{tr}(\boldsymbol{A})+c_2\operatorname{tr}(\boldsymbol{B}) \tag{3}$$

where $c_1$ and $c_2$ are scalars.

(3): Transpose rule

Transposing a matrix does not change its trace: $\operatorname{tr}(\boldsymbol{A})=\operatorname{tr}(\boldsymbol{A}^T)$.

(4): Commutation law

For two matrices $\boldsymbol{A}_{m\times n}$ and $\boldsymbol{B}_{m\times n}$ of the same size $m\times n$, the trace of one matrix multiplied by the transpose of the other (on either the left or the right) equals the sum of the products of their corresponding elements. The trace can therefore be understood as the generalization of the vector dot product to matrices:

$$
\begin{aligned}
\operatorname{tr}(\boldsymbol{A}\boldsymbol{B}^T) &= a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n}\\
&\quad+ a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n}\\
&\quad+ \cdots \\
&\quad+ a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn}
\end{aligned}
\tag{6}
$$

This is because:

$$
\begin{aligned}
\operatorname{tr}(\boldsymbol{A}\boldsymbol{B}^T)
&=\operatorname{tr}\left(
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{21} & \cdots & b_{m1} \\
b_{12} & b_{22} & \cdots & b_{m2} \\
\vdots & \vdots & \ddots & \vdots \\
b_{1n} & b_{2n} & \cdots & b_{mn}
\end{bmatrix}
\right) \\
&= \operatorname{tr}
\begin{bmatrix}
a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n} & * & \cdots & * \\
* & a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n} & \cdots & *\\
\vdots & \vdots & \ddots & \vdots \\
* & * & \cdots & a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn}
\end{bmatrix}_{m \times m} \\
&= a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n}\\
&\quad+ a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n}\\
&\quad+ \cdots \\
&\quad+ a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn}
\end{aligned}
\tag{7}
$$

Applying the same computation to the products $\boldsymbol{A}\boldsymbol{B}$ and $\boldsymbol{B}\boldsymbol{A}$, where $\boldsymbol{A}$ is $m\times n$ and $\boldsymbol{B}$ is $n\times m$, both traces come out as the same double sum $\sum_{i=1}^m\sum_{k=1}^n a_{ik}b_{ki}$. The trace therefore obeys the commutation law:

$$\operatorname{tr}(\boldsymbol{A}\boldsymbol{B})= \operatorname{tr}(\boldsymbol{B}\boldsymbol{A})$$
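The trace properties above are easy to check numerically. The following is a minimal NumPy sketch, not part of the original article; the matrix sizes, the seed, and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
m, n = 3, 4                      # hypothetical sizes for illustration
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
C = rng.standard_normal((n, m))

# tr(A B^T) equals the sum of the element-wise products of A and B
lhs = np.trace(A @ B.T)
rhs = np.sum(A * B)

# Commutation law: A C is m x m and C A is n x n, yet the traces agree
tr_AC = np.trace(A @ C)
tr_CA = np.trace(C @ A)

print(np.isclose(lhs, rhs), np.isclose(tr_AC, tr_CA))
```

Note that `A @ C` and `C @ A` do not even have the same shape here, so the commutation law is a statement about the traces only, not about the products.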

2. Cases of the Matrix Differential

2.1 Real-valued scalar functions of a vector variable

Consider $f(\vec{x})$ with $\vec{x}=[x_1,x_2,\cdots,x_n]^T$. This is simply a multivariate function. If $f(\vec{x})$ is differentiable, its total differential is

$$
\begin{aligned}
\mathrm{d}f(\vec{x})
&=\frac{\partial f}{\partial x_{1}}\mathrm{d}x_{1}+\frac{\partial f}{\partial x_{2}}\mathrm{d}x_{2}+\cdots+\frac{\partial f}{\partial x_{n}}\mathrm{d}x_{n} \\
&=\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right)
\begin{bmatrix}\mathrm{d}x_1\\\mathrm{d}x_2\\\vdots\\\mathrm{d}x_n\end{bmatrix}\\
&\overset{\text{trace of a scalar}}{=}
\operatorname{tr}\left(\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right)
\begin{bmatrix}\mathrm{d}x_1\\\mathrm{d}x_2\\\vdots\\\mathrm{d}x_n\end{bmatrix}\right)
\end{aligned}
$$

2.2 Real-valued scalar functions of a matrix variable

Consider $f(\boldsymbol{X})$ with $\boldsymbol{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}$. This is still a multivariate function. Assuming it is differentiable, its total differential is

$$
\begin{aligned}
\mathrm{d}f(\boldsymbol{X})
&=\frac{\partial f}{\partial x_{11}}\mathrm{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathrm{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathrm{d}x_{1n}\\
&\quad+\frac{\partial f}{\partial x_{21}}\mathrm{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathrm{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathrm{d}x_{2n}\\
&\quad+\cdots\\
&\quad+\frac{\partial f}{\partial x_{m1}}\mathrm{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathrm{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathrm{d}x_{mn}
\end{aligned}
\tag{19}
$$

This is the sum of the products of corresponding elements of the matrix $\left(\frac{\partial f}{\partial x_{ij}}\right)_{i=1,j=1}^{m,n}$ and the matrix $(\mathrm{d}x_{ij})_{i=1,j=1}^{m,n}$, so by the trace property (6) it can be written as a trace:

$$
\begin{aligned}
\mathrm{d}f(\boldsymbol{X})
&=\frac{\partial f}{\partial x_{11}}\mathrm{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathrm{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathrm{d}x_{1n}\\
&\quad+\frac{\partial f}{\partial x_{21}}\mathrm{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathrm{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathrm{d}x_{2n}\\
&\quad+\cdots\\
&\quad+\frac{\partial f}{\partial x_{m1}}\mathrm{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathrm{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathrm{d}x_{mn} \\
&=\operatorname{tr}\left(
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\
\frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\
\vdots&\vdots&\ddots&\vdots\\
\frac{\partial f}{\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}}
\end{bmatrix}_{n\times m}
\begin{bmatrix}
\mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1n} \\
\mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2n} \\
\vdots&\vdots&\ddots&\vdots\\
\mathrm{d}x_{m1} & \mathrm{d}x_{m2} & \cdots & \mathrm{d}x_{mn}
\end{bmatrix}_{m \times n}
\right)
\end{aligned}
\tag{20}
$$

2.3 Real matrix functions of a matrix variable

Each element of a real matrix function of a matrix variable is itself a real-valued scalar function $f_{ij}(\boldsymbol{X})$ of a matrix variable.

We define the differential of such a function as follows. Let every $f_{ij}(\boldsymbol{X})$ be differentiable. The matrix differential of the function is then the total differential taken element by element, with the layout of the result unchanged:

$$
\mathrm{d}\boldsymbol{F}_{p \times q}(\boldsymbol{X}) =
\begin{bmatrix}
\mathrm{d}f_{11}(\boldsymbol{X})& \mathrm{d}f_{12}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{1q}(\boldsymbol{X}) \\
\mathrm{d}f_{21}(\boldsymbol{X})& \mathrm{d}f_{22}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{2q}(\boldsymbol{X}) \\
\vdots&\vdots&\ddots&\vdots \\
\mathrm{d}f_{p1}(\boldsymbol{X})& \mathrm{d}f_{p2}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{pq}(\boldsymbol{X})
\end{bmatrix}_{p \times q}
\tag{21}
$$

As with real-valued scalar functions of a matrix variable or a vector variable, the differential of a real matrix function of a matrix variable obeys four rules:

Differential of a constant matrix

$$\mathrm{d}\boldsymbol{A}_{m\times n}=\mathbf{0}_{m\times n}$$

Linearity

Differentiating a sum equals summing the differentials, and constant scalars can be moved outside the differential:

$$\mathrm{d}(c_1\boldsymbol{F}(\boldsymbol{X})+c_2\boldsymbol{G}(\boldsymbol{X}))=c_1\mathrm{d}\boldsymbol{F}(\boldsymbol{X})+c_2\mathrm{d}\boldsymbol{G}(\boldsymbol{X})$$

where $c_1$ and $c_2$ are constant scalars.

Product rule

$$\mathrm{d}(\boldsymbol{F}(\boldsymbol{X})\boldsymbol{G}(\boldsymbol{X}))=\mathrm{d}(\boldsymbol{F}(\boldsymbol{X}))\boldsymbol{G}(\boldsymbol{X})+\boldsymbol{F}(\boldsymbol{X})\mathrm{d}\boldsymbol{G}(\boldsymbol{X})$$

where $\boldsymbol{F}(\boldsymbol{X})$ is $p\times q$ and $\boldsymbol{G}(\boldsymbol{X})$ is $q\times s$.

Note: the differentials here are matrices, so as in ordinary matrix multiplication the left and right factors of a product cannot be swapped.

Proof: the $(i,j)$ element of $\boldsymbol{F}(\boldsymbol{X})\boldsymbol{G}(\boldsymbol{X})$ is $\sum_{k=1}^q[f_{ik}(\boldsymbol{X})g_{kj}(\boldsymbol{X})]$, and the total differential of that element is

$$
\begin{aligned}
\mathrm{d}\left( \sum_{k=1}^q[f_{ik}(\boldsymbol{X})g_{kj}(\boldsymbol{X})] \right)
&=\sum_{k=1}^q \mathrm{d}(f_{ik}(\boldsymbol{X})g_{kj}(\boldsymbol{X})) \\
&= \sum_{k=1}^q[\mathrm{d}(f_{ik}(\boldsymbol{X}))g_{kj}(\boldsymbol{X})+f_{ik}(\boldsymbol{X})\mathrm{d}g_{kj}(\boldsymbol{X})] \\
&= \sum_{k=1}^q[\mathrm{d}(f_{ik}(\boldsymbol{X}))g_{kj}(\boldsymbol{X})]+ \sum_{k=1}^q[f_{ik}(\boldsymbol{X})\mathrm{d}g_{kj}(\boldsymbol{X})]
\end{aligned}
$$

In the last line, the first sum is the $(i,j)$ element of $\mathrm{d}(\boldsymbol{F}(\boldsymbol{X}))\boldsymbol{G}(\boldsymbol{X})$ and the second sum is the $(i,j)$ element of $\boldsymbol{F}(\boldsymbol{X})\mathrm{d}\boldsymbol{G}(\boldsymbol{X})$.

Transpose rule

The matrix differential of a transpose equals the transpose of the matrix differential:

$$\mathrm{d}\boldsymbol{F}_{p\times q}^T(\boldsymbol{X})=(\mathrm{d}\boldsymbol{F}_{p\times q}(\boldsymbol{X}))^T$$

Proof:

$$
\begin{aligned}
\mathrm{d}\boldsymbol{F}^T_{p \times q}(\boldsymbol{X})
&= \mathrm{d}
\begin{bmatrix}
f_{11}(\boldsymbol{X})& f_{21}(\boldsymbol{X}) & \cdots & f_{p1}(\boldsymbol{X}) \\
f_{12}(\boldsymbol{X})& f_{22}(\boldsymbol{X}) & \cdots & f_{p2}(\boldsymbol{X}) \\
\vdots&\vdots&\ddots&\vdots \\
f_{1q}(\boldsymbol{X})&f_{2q}(\boldsymbol{X}) & \cdots & f_{pq}(\boldsymbol{X})
\end{bmatrix}_{q \times p} \\
&=
\begin{bmatrix}
\mathrm{d}f_{11}(\boldsymbol{X})& \mathrm{d}f_{21}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{p1}(\boldsymbol{X}) \\
\mathrm{d}f_{12}(\boldsymbol{X})& \mathrm{d}f_{22}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{p2}(\boldsymbol{X}) \\
\vdots&\vdots&\ddots&\vdots \\
\mathrm{d}f_{1q}(\boldsymbol{X})&\mathrm{d}f_{2q}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{pq}(\boldsymbol{X})
\end{bmatrix}_{q \times p} \\
&=
\begin{bmatrix}
\mathrm{d}f_{11}(\boldsymbol{X})& \mathrm{d}f_{12}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{1q}(\boldsymbol{X}) \\
\mathrm{d}f_{21}(\boldsymbol{X})& \mathrm{d}f_{22}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{2q}(\boldsymbol{X}) \\
\vdots&\vdots&\ddots&\vdots \\
\mathrm{d}f_{p1}(\boldsymbol{X})& \mathrm{d}f_{p2}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{pq}(\boldsymbol{X})
\end{bmatrix}_{p \times q}^T \\
&= (\mathrm{d}\boldsymbol{F}_{p \times q}(\boldsymbol{X}))^T
\end{aligned}
$$
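The product rule and the transpose rule can be checked together on the product $\boldsymbol{X}\boldsymbol{X}^T$. The NumPy sketch below is only an illustration under stated assumptions: a small random perturbation `dX` stands in for the differential, and the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4))
dX = 1e-6 * rng.standard_normal((3, 4))   # small perturbation standing in for dX

def F(X):
    # Product of the two matrix functions F1(X) = X and F2(X) = X^T
    return X @ X.T

# Actual change of the product under the perturbation
actual = F(X + dX) - F(X)

# Product rule plus transpose rule: d(X X^T) = dX X^T + X (dX)^T
predicted = dX @ X.T + X @ dX.T

# What is left over is the second-order term dX dX^T (plus rounding error)
residual = actual - predicted
print(np.max(np.abs(residual)))
```

The two terms `dX @ X.T` and `X @ dX.T` are kept in this order on purpose, since matrix factors cannot be swapped.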

3. The Matrix Differential

3.1 The meaning of the matrix differential

$\boldsymbol{X}_{m\times n}$ can be viewed as a real matrix function of itself, with elements $x_{ij}$. The total differential of each element is $\mathrm{d}x_{ij}$, so the total differential of $\boldsymbol{X}_{m\times n}$ is:

$$
\mathrm{d}\boldsymbol{X}_{m\times n}=
\begin{bmatrix}
\mathrm{d}x_{11}&\mathrm{d}x_{12}&\cdots&\mathrm{d}x_{1n}\\
\mathrm{d}x_{21}&\mathrm{d}x_{22}&\cdots&\mathrm{d}x_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
\mathrm{d}x_{m1}&\mathrm{d}x_{m2}&\cdots&\mathrm{d}x_{mn}
\end{bmatrix}_{m\times n}
$$

As noted earlier, a vector can be regarded as a special matrix, so the matrix differential of the vector $\vec{x}=[x_1,x_2,\cdots,x_n]^T$ is

$$
\mathrm{d}\vec{x}=\begin{bmatrix}\mathrm{d}x_1\\\mathrm{d}x_2\\\vdots\\\mathrm{d}x_n\end{bmatrix}_{n\times1}
$$

The basic rules of matrix differentials given earlier also apply to the differentials of the matrix $\boldsymbol{X}_{m\times n}$ and the vector $\vec{x}$.

Now return to the total differential of a real-valued scalar function of a matrix variable:

$$
\begin{aligned}
\mathrm{d}f(\boldsymbol{X})
&=\frac{\partial f}{\partial x_{11}}\mathrm{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathrm{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathrm{d}x_{1n}\\
&\quad+\frac{\partial f}{\partial x_{21}}\mathrm{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathrm{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathrm{d}x_{2n}\\
&\quad+\cdots\\
&\quad+\frac{\partial f}{\partial x_{m1}}\mathrm{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathrm{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathrm{d}x_{mn} \\
&=\operatorname{tr}\left(
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\
\frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\
\vdots&\vdots&\ddots&\vdots\\
\frac{\partial f}{\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}}
\end{bmatrix}_{n\times m}
\begin{bmatrix}
\mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1n} \\
\mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2n} \\
\vdots&\vdots&\ddots&\vdots\\
\mathrm{d}x_{m1} & \mathrm{d}x_{m2} & \cdots & \mathrm{d}x_{mn}
\end{bmatrix}_{m \times n}
\right)
\end{aligned}
\tag{20}
$$

The first factor inside tr is exactly the Jacobian matrix form introduced earlier:

$$
\begin{aligned}
\mathrm{D}_{\boldsymbol{X}}f(\boldsymbol{X})
&= \frac{\partial f(\boldsymbol{X})}{\partial \boldsymbol{X}^T_{m\times n}} \\
&=
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\
\frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\
\vdots&\vdots&\ddots&\vdots\\
\frac{\partial f}{\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}}
\end{bmatrix}_{n\times m}
\end{aligned}
$$

and the second factor is the total differential $\mathrm{d}\boldsymbol{X}_{m\times n}$ of the matrix $\boldsymbol{X}_{m\times n}$ given at the start of this subsection. The total differential of a real-valued scalar function of a matrix variable can therefore be written as:

$$\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$$

For a real-valued scalar function of a matrix variable, finding the derivative thus amounts to finding $\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}$: once $\mathrm{d}f(\boldsymbol{X})$ has been brought into the form above, the computation is finished. You may wonder whether this decomposition is unique. It is: if $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}(\boldsymbol{A}_1\mathrm{d}\boldsymbol{X})=\operatorname{tr}(\boldsymbol{A}_2\mathrm{d}\boldsymbol{X})$, then $\boldsymbol{A}_1=\boldsymbol{A}_2$.

Since a vector can be regarded as a special matrix, the total differential of a real-valued scalar function of a vector variable can be written as:

$$\mathrm{d}f(\vec{x})=\operatorname{tr}\left(\frac{\partial f(\vec{x})}{\partial\vec{x}^T}\mathrm{d}\vec{x}\right)$$

When the matrix variable $\boldsymbol{X}$ degenerates to a column vector $\vec{x}$, we have

$$\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}=\frac{\partial f(\vec{x})}{\partial\vec{x}^T},\qquad \mathrm{d}\boldsymbol{X}=\mathrm{d}\vec{x}$$

So for a real-valued scalar function, whether the variable is a vector or a matrix, the differential can always be found in the form:

$$\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$$
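To make the trace form concrete, here is a small NumPy sketch that compares $\operatorname{tr}\left(\frac{\partial f}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$ with the actual change of a function. The test function $f(\boldsymbol{X})=\sum_{i,j}\sin x_{ij}$ is a hypothetical choice made only because its partial derivatives, $\cos x_{ij}$, are known in closed form. It does not come from the original article.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 4))            # m x n matrix variable
dX = 1e-6 * rng.standard_normal((3, 4))    # small perturbation standing in for dX

def f(X):
    # Hypothetical scalar test function: sum of sin over all elements
    return np.sum(np.sin(X))

grad = np.cos(X)    # matrix of partial derivatives, same shape as X (m x n)
jac = grad.T        # Jacobian form, i.e. the derivative w.r.t. X^T (n x m)

df_trace = np.trace(jac @ dX)       # tr(Jacobian times dX)
df_actual = f(X + dX) - f(X)        # actual first-order change
print(df_trace, df_actual)
```

The two printed numbers should agree up to second-order terms in `dX`. The Jacobian is $n\times m$, which is what makes the product with the $m\times n$ matrix `dX` square.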

Below are a few commonly used formulas. They are worth memorizing, since they appear frequently in papers.

Sandwich formula

$$\mathrm{d}(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B})=\boldsymbol{A}\mathrm{d}(\boldsymbol{X})\boldsymbol{B}$$

where $\boldsymbol{A}_{p\times m}$ and $\boldsymbol{B}_{n\times q}$ are constant matrices.

Proof: by the product rule,

$$\mathrm{d}(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B})=\mathrm{d}(\boldsymbol{A})\boldsymbol{X}\boldsymbol{B}+\boldsymbol{A}\mathrm{d}(\boldsymbol{X})\boldsymbol{B}+\boldsymbol{A}\boldsymbol{X}\mathrm{d}\boldsymbol{B}$$

Since the differential of a constant matrix is zero,

$$\mathrm{d}\boldsymbol{A}=\mathbf{0}_{p\times m},\qquad \mathrm{d}\boldsymbol{B}=\mathbf{0}_{n\times q}$$

Substituting these into $\mathrm{d}(\boldsymbol{A})\boldsymbol{X}\boldsymbol{B}+\boldsymbol{A}\mathrm{d}(\boldsymbol{X})\boldsymbol{B}+\boldsymbol{A}\boldsymbol{X}\mathrm{d}\boldsymbol{B}$ leaves only the middle term, which proves the formula.

Determinant

$$\mathrm{d}|\boldsymbol{X}|=|\boldsymbol{X}|\operatorname{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})=\operatorname{tr}(|\boldsymbol{X}|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})$$

where $\boldsymbol{X}_{n\times n}$ is $n\times n$ and invertible.

Proof: the determinant is a real-valued scalar function, so the formula for real-valued scalar functions of a matrix variable applies:

$$\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$$

From linear algebra, a determinant can be expanded along a row: each element of the row is multiplied by its cofactor and the products are summed. Expanding along row $i$ of $\boldsymbol{X}$:

$$|\boldsymbol{X}|=x_{i1}A_{i1}+x_{i2}A_{i2}+\cdots+x_{in}A_{in}$$

None of the cofactors $A_{i1},\cdots,A_{in}$ contains an element of row $i$, so the partial derivative of the determinant with respect to $x_{ij}$ is the cofactor of that element:

$$\frac{\partial|\boldsymbol{X}|}{\partial x_{ij}}=A_{ij}$$

The derivative of the determinant with respect to the matrix is therefore

$$
\frac{\partial|\boldsymbol{X}|}{\partial\boldsymbol{X}^T}=
\begin{bmatrix}
A_{11}&A_{21}&\cdots&A_{n1}\\
A_{12}&A_{22}&\cdots&A_{n2}\\
\vdots&\vdots&\ddots&\vdots\\
A_{1n}&A_{2n}&\cdots&A_{nn}
\end{bmatrix}
$$

This is exactly the adjugate matrix $\boldsymbol{X}^*$. The adjugate, the determinant, and the inverse are related by

$$\boldsymbol{X}^{-1}=\frac{\boldsymbol{X}^*}{|\boldsymbol{X}|}$$

so $\frac{\partial|\boldsymbol{X}|}{\partial\boldsymbol{X}^T}=|\boldsymbol{X}|\boldsymbol{X}^{-1}$. Substituting this into $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$ gives

$$
\begin{aligned}
\mathrm{d}|\boldsymbol{X}|
&=\operatorname{tr}\left(\frac{\partial|\boldsymbol{X}|}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right) \\
&=\operatorname{tr}(|\boldsymbol{X}|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})
\end{aligned}
$$

Since the determinant is a scalar, the linearity of the trace allows it to be moved outside the trace:

$$\mathrm{d}|\boldsymbol{X}|=|\boldsymbol{X}|\operatorname{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})=\operatorname{tr}(|\boldsymbol{X}|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})$$

This completes the proof.
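A quick numerical sanity check of the determinant formula is sketched below in NumPy. The construction of `X` (a multiple of the identity plus a small random part) is an assumption made here only to keep the matrix comfortably invertible. It is not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
# Assumed construction: identity-dominated, so X stays well away from singular
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))   # small perturbation standing in for dX

det_X = np.linalg.det(X)

# Formula from the text: d|X| = |X| tr(X^{-1} dX)
predicted = det_X * np.trace(np.linalg.inv(X) @ dX)

# Actual change of the determinant under the perturbation
actual = np.linalg.det(X + dX) - det_X
print(predicted, actual)
```

The two values should agree up to terms of second order in `dX`.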

Inverse matrix

$$\mathrm{d}(\boldsymbol{X}^{-1})=-\boldsymbol{X}^{-1}\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^{-1}$$

where $\boldsymbol{X}_{n\times n}$ is $n\times n$ and invertible.

Proof: start from

$$\boldsymbol{X}\boldsymbol{X}^{-1}=\boldsymbol{E}$$

where $\boldsymbol{E}$ is the identity matrix. The differential of a constant matrix is zero, so taking the matrix differential of both sides and applying the product rule gives:

$$\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^{-1}+\boldsymbol{X}\mathrm{d}(\boldsymbol{X}^{-1})=\mathbf{0}$$

Left-multiplying both sides by $\boldsymbol{X}^{-1}$ and rearranging yields the result.
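The inverse formula can be checked the same way. This is a minimal NumPy sketch under the same assumptions as before (an identity-dominated `X` chosen for illustration, and a small perturbation `dX` in place of the differential).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
# Assumed construction: identity-dominated, so X is safely invertible
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))   # small perturbation standing in for dX

X_inv = np.linalg.inv(X)

# Formula from the text: d(X^{-1}) = -X^{-1} dX X^{-1}
predicted = -X_inv @ dX @ X_inv

# Actual change of the inverse under the perturbation
actual = np.linalg.inv(X + dX) - X_inv
print(np.max(np.abs(actual - predicted)))
```

The printed value is the largest element-wise mismatch, which should be of second order in `dX`.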

3.2 Worked examples of the matrix differential

A real-valued scalar function $f(\boldsymbol{X})$ satisfies $\operatorname{tr}(f(\boldsymbol{X}))=f(\boldsymbol{X})$ and $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}(\mathrm{d}f(\boldsymbol{X}))$, so

$$\mathrm{d}f(\boldsymbol{X})=\mathrm{d}(\operatorname{tr}f(\boldsymbol{X}))=\operatorname{tr}(\mathrm{d}f(\boldsymbol{X}))$$

If the real-valued scalar function is itself the trace of a matrix function $\boldsymbol{F}_{p\times p}(\boldsymbol{X})$, that is $\operatorname{tr}\boldsymbol{F}(\boldsymbol{X})$, then by the linearity of the total differential

$$\mathrm{d}(\operatorname{tr}\boldsymbol{F}_{p\times p}(\boldsymbol{X}))=\mathrm{d}\left(\sum_{i=1}^pf_{ii}(\boldsymbol{X})\right)=\sum_{i=1}^p\mathrm{d}(f_{ii}(\boldsymbol{X}))=\operatorname{tr}(\mathrm{d}\boldsymbol{F}_{p\times p}(\boldsymbol{X}))$$

The examples below show how to use the matrix differential to compute derivatives.

Example 1

$$\frac{\partial(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b})}{\partial\boldsymbol{X}}=\vec{a}\vec{b}^T\boldsymbol{X}+\vec{b}\vec{a}^T\boldsymbol{X}$$

where $\vec{a}_{m\times1}$ and $\vec{b}_{m\times1}$ are constant vectors, $\vec{a}=(a_1,a_2,\cdots,a_m)^T$, $\vec{b}=(b_1,b_2,\cdots,b_m)^T$, and the matrix $\boldsymbol{X}$ is $m \times n$.

Proof: from Notes (1), this is the case of a scalar numerator and a matrix denominator, so the result has dimension $m \times n$.

$$
\begin{aligned}
\mathrm{d}(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b})
&\overset{\text{tr of a scalar}}{=}\operatorname{tr}(\mathrm{d}(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b}))\\
&\overset{\text{sandwich formula}}{=}\operatorname{tr}(\vec{a}^T\mathrm{d}(\boldsymbol{X}\boldsymbol{X}^T)\vec{b})\\
&\overset{\text{product rule}}{=}\operatorname{tr}[\vec{a}^T(\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T+\boldsymbol{X}\mathrm{d}\boldsymbol{X}^T)\vec{b}]\\
&\overset{\text{linearity of tr}}{=}\operatorname{tr}(\vec{a}^T\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T\vec{b})+\operatorname{tr}(\vec{a}^T\boldsymbol{X}\mathrm{d}(\boldsymbol{X}^T)\vec{b})\\
&\overset{\text{transpose rule for differentials}}{=}\operatorname{tr}(\vec{a}^T\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T\vec{b})+\operatorname{tr}(\vec{a}^T\boldsymbol{X}(\mathrm{d}\boldsymbol{X})^T\vec{b})\\
&\overset{\text{transpose and commutation rules of tr}}{=}\operatorname{tr}(\boldsymbol{X}^T\vec{b}\vec{a}^T\mathrm{d}\boldsymbol{X})+\operatorname{tr}(\boldsymbol{X}^T\vec{a}\vec{b}^T\mathrm{d}\boldsymbol{X})\\
&\overset{\text{linearity of tr}}{=}\operatorname{tr}((\boldsymbol{X}^T\vec{b}\vec{a}^T+\boldsymbol{X}^T\vec{a}\vec{b}^T)\mathrm{d}\boldsymbol{X})
\end{aligned}
$$

Comparing this with $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$ for $f(\boldsymbol{X})=\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b}$ gives

$$
\begin{aligned}
&\frac{\partial(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b})}{\partial\boldsymbol{X}^T}=\boldsymbol{X}^T\vec{b}\vec{a}^T+\boldsymbol{X}^T\vec{a}\vec{b}^T\\
&\frac{\partial(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b})}{\partial\boldsymbol{X}}=\vec{a}\vec{b}^T\boldsymbol{X}+\vec{b}\vec{a}^T\boldsymbol{X}
\end{aligned}
$$

This completes the proof.
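The result $\frac{\partial(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b})}{\partial\boldsymbol{X}}=\vec{a}\vec{b}^T\boldsymbol{X}+\vec{b}\vec{a}^T\boldsymbol{X}$ can be compared against a finite-difference gradient. The NumPy sketch below is an illustration only: the sizes, the seed, and the step `h` are arbitrary choices, and central differences are a generic checking technique rather than something used in the original article.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 4
a = rng.standard_normal((m, 1))   # constant column vector, m x 1
b = rng.standard_normal((m, 1))   # constant column vector, m x 1
X = rng.standard_normal((m, n))

def f(X):
    # Scalar function a^T X X^T b; .item() extracts the single entry of the 1 x 1 result
    return (a.T @ X @ X.T @ b).item()

# Closed-form gradient derived in Example 1
analytic = a @ b.T @ X + b @ a.T @ X

# Central finite differences, perturbing one element of X at a time
h = 1e-6
numeric = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = h
        numeric[i, j] = (f(X + E) - f(X - E)) / (2 * h)

print(np.max(np.abs(numeric - analytic)))
```

Because $f$ is quadratic in $\boldsymbol{X}$, the central difference has no truncation error, so the printed mismatch reflects floating-point rounding only.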

Example 2

$$\frac{\partial\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})}{\partial\boldsymbol{X}}=2\boldsymbol{X}$$

Proof:

$$
\begin{aligned}
\mathrm{d}(\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X}))
&\overset{\text{differential of a trace}}{=}\operatorname{tr}(\mathrm{d}(\boldsymbol{X}^T\boldsymbol{X}))\\
&\overset{\text{product rule}}{=}\operatorname{tr}(\mathrm{d}(\boldsymbol{X}^T)\boldsymbol{X}+\boldsymbol{X}^T\mathrm{d}(\boldsymbol{X}))\\
&\overset{\text{linearity of tr}}{=}\operatorname{tr}(\mathrm{d}(\boldsymbol{X}^T)\boldsymbol{X})+\operatorname{tr}(\boldsymbol{X}^T\mathrm{d}(\boldsymbol{X}))\\
&\overset{\text{transpose rule for differentials}}{=}\operatorname{tr}((\mathrm{d}(\boldsymbol{X}))^T\boldsymbol{X})+\operatorname{tr}(\boldsymbol{X}^T\mathrm{d}(\boldsymbol{X}))\\
&\overset{\text{transpose rule of tr}}{=}2\operatorname{tr}(\boldsymbol{X}^T\mathrm{d}\boldsymbol{X})\\
&\overset{\text{linearity of tr}}{=}\operatorname{tr}(2\boldsymbol{X}^T\mathrm{d}\boldsymbol{X})
\end{aligned}
$$

Comparing this with $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$ for $f(\boldsymbol{X})=\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})$ gives

$$
\begin{aligned}
&\frac{\partial\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})}{\partial\boldsymbol{X}^T}=2\boldsymbol{X}^T\\
&\frac{\partial\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})}{\partial\boldsymbol{X}}=2\boldsymbol{X}
\end{aligned}
$$

This completes the proof.
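For this example the first-order term can be separated from the remainder exactly, because $\operatorname{tr}((\boldsymbol{X}+\boldsymbol{D})^T(\boldsymbol{X}+\boldsymbol{D}))=\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})+\operatorname{tr}(2\boldsymbol{X}^T\boldsymbol{D})+\operatorname{tr}(\boldsymbol{D}^T\boldsymbol{D})$. The NumPy sketch below checks that identity. The increment `D` is a hypothetical name and is deliberately not small.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((3, 4))
D = rng.standard_normal((3, 4))   # arbitrary increment, not necessarily small

def f(X):
    # tr(X^T X), which is also the sum of squared elements of X
    return np.trace(X.T @ X)

# First-order term predicted by the gradient 2X: tr((2X)^T D)
first_order = np.trace((2 * X).T @ D)

# Everything beyond first order should be exactly tr(D^T D)
remainder = f(X + D) - f(X) - first_order
print(np.isclose(remainder, np.trace(D.T @ D)))
```

The remainder is purely quadratic in `D`, which is why the linear part $\operatorname{tr}(2\boldsymbol{X}^T\mathrm{d}\boldsymbol{X})$ is the differential.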

Example 3

$$\frac{\partial\log|\boldsymbol{X}|}{\partial\boldsymbol{X}}=(\boldsymbol{X}^{-1})^T$$

where $\boldsymbol{X}$ is $n \times n$.

I will skip the proof of this one (please allow me a little laziness here).
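Although the proof is skipped, one way to see the result is to combine the chain rule with the determinant formula given earlier: $\mathrm{d}\log|\boldsymbol{X}|=|\boldsymbol{X}|^{-1}\mathrm{d}|\boldsymbol{X}|=\operatorname{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})$, which matches the trace form with $\frac{\partial\log|\boldsymbol{X}|}{\partial\boldsymbol{X}^T}=\boldsymbol{X}^{-1}$. The NumPy sketch below checks the stated result numerically. The specific matrix is a hypothetical example chosen here so that its determinant is positive, which the logarithm requires.

```python
import numpy as np

# Hypothetical example matrix with det(X) = 25 > 0, so log|X| is defined
X = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])
n = X.shape[0]

def f(X):
    return np.log(np.linalg.det(X))

# Stated result of Example 3: the gradient is the transposed inverse
analytic = np.linalg.inv(X).T

# Central finite differences, perturbing one element of X at a time
h = 1e-6
numeric = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = h
        numeric[i, j] = (f(X + E) - f(X - E)) / (2 * h)

print(np.max(np.abs(numeric - analytic)))
```

The transpose matters here: `X` is not symmetric, so $(\boldsymbol{X}^{-1})^T$ and $\boldsymbol{X}^{-1}$ differ.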

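With this, the first-order derivative of any real-valued scalar function of a matrix variable or a vector variable can be computed by the method in this post. We have only considered real-valued functions and first-order derivatives. Higher-order matrix differentials can also be defined, and the number field can be extended to the complex numbers, but that is a topic for later. I will study it if the need arises.

That is all for now. For more examples, see the original article in the references. After all, I am only here to learn as well.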

References

矩阵求导公式的数学推导(矩阵求导——进阶篇)

Zhang Xianda, 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd edition)
