Matrix Derivatives (III): The First-Order Differential Method

1 The Trace of a Matrix

1.1 Definition

The sum of the main-diagonal elements of an $n \times n$ square matrix $\boldsymbol{A}_{n \times n}$ is called the trace of $\boldsymbol{A}$, written $\mathrm{tr}(\boldsymbol{A})$:
$$\mathrm{tr}(\boldsymbol{A})=a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^n a_{ii} \tag{1-1}$$
Note: the trace is not defined for non-square matrices.

1.2 Common Properties

1. Trace of a scalar
A scalar $x$ can be regarded as a $1 \times 1$ matrix, so its trace is the scalar itself:
$$x=\mathrm{tr}(x) \tag{1-2}$$

2. Linearity
The trace of a sum equals the sum of the traces, and scalar factors can be pulled out:
$$\mathrm{tr}(c_1\boldsymbol{A}+c_2\boldsymbol{B}) = c_1\mathrm{tr}(\boldsymbol{A})+c_2\mathrm{tr}(\boldsymbol{B}) \tag{1-3}$$
where $c_1, c_2$ are scalars.

3. Transpose
Transposition does not change the main-diagonal elements, so a matrix and its transpose have the same trace:
$$\mathrm{tr}(\boldsymbol{A})=\mathrm{tr}(\boldsymbol{A}^T) \tag{1-4}$$

4. What the trace of a product computes
For two matrices of the same size, $\boldsymbol{A}_{m\times n}$ and $\boldsymbol{B}_{m\times n}$, the trace of one matrix multiplied by the transpose of the other (with the transpose on either side, e.g. $\mathrm{tr}(\boldsymbol{A}\boldsymbol{B}^T) = \mathrm{tr}(\boldsymbol{B}^T\boldsymbol{A})$) is simply the sum of the products of corresponding elements of $\boldsymbol{A}$ and $\boldsymbol{B}$. It can be viewed as the dot product of vectors generalized to matrices:
$$\begin{aligned} \mathrm{tr}(\boldsymbol{A}\boldsymbol{B}^T) &= a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n}\\ &+ a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n}\\ &+ \cdots \\ &+ a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn} \end{aligned} \tag{1-5}$$

5. Commutativity (cyclic property)
Swapping the two factors of a matrix product leaves the trace unchanged:
$$\mathrm{tr}(\boldsymbol{A}\boldsymbol{B})= \mathrm{tr}(\boldsymbol{B}\boldsymbol{A}) \tag{1-6}$$
where $\boldsymbol{A}_{m \times n}$ and $\boldsymbol{B}_{n \times m}$; both sides equal $\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}b_{ji}$. More generally, the factors may be permuted cyclically:
$$\mathrm{tr}(\boldsymbol{A}\boldsymbol{B}\boldsymbol{C})=\mathrm{tr}(\boldsymbol{C}\boldsymbol{A}\boldsymbol{B})=\mathrm{tr}(\boldsymbol{B}\boldsymbol{C}\boldsymbol{A}) \tag{1-7}$$
where $\boldsymbol{A}_{m \times n}$, $\boldsymbol{B}_{n \times p}$, $\boldsymbol{C}_{p \times m}$. Only cyclic permutations are allowed: in general $\mathrm{tr}(\boldsymbol{A}\boldsymbol{B}\boldsymbol{C}) \neq \mathrm{tr}(\boldsymbol{B}\boldsymbol{A}\boldsymbol{C})$.
6. Swapping matrix multiplication and elementwise multiplication
$$\mathrm{tr}(\boldsymbol{A}^T(\boldsymbol{B}\odot \boldsymbol{C})) = \mathrm{tr}((\boldsymbol{A}\odot \boldsymbol{B})^T\boldsymbol{C})\tag{1-8}$$
where $\boldsymbol{A}$, $\boldsymbol{B}$, $\boldsymbol{C}$ are matrices of the same size $m \times n$ (square is not required) and $\odot$ is the elementwise (Hadamard) product; both sides equal $\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}b_{ij}c_{ij}$.
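The properties above are easy to spot-check numerically. The sketch below assumes NumPy; the shapes and the random seed are arbitrary, and NumPy's `*` operator plays the role of $\odot$. It checks (1-5), the cyclic property (1-7) with non-square factors, and (1-8).

```python
import numpy as np

rng = np.random.default_rng(0)

# Property 4, (1-5): tr(A B^T) = tr(A^T B) = sum of elementwise products
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
elementwise_sum = np.sum(A * B)
prop4 = (np.isclose(np.trace(A @ B.T), elementwise_sum)
         and np.isclose(np.trace(A.T @ B), elementwise_sum))

# Property 5, (1-7): cyclic permutations with non-square factors
P = rng.standard_normal((2, 3))  # m x n
Q = rng.standard_normal((3, 4))  # n x p
R = rng.standard_normal((4, 2))  # p x m
t_pqr = np.trace(P @ Q @ R)      # trace of a 2 x 2 matrix
t_rpq = np.trace(R @ P @ Q)      # trace of a 4 x 4 matrix
t_qrp = np.trace(Q @ R @ P)      # trace of a 3 x 3 matrix
prop5 = np.isclose(t_pqr, t_rpq) and np.isclose(t_pqr, t_qrp)

# Property 6, (1-8): swap matrix product and Hadamard product inside a trace
C = rng.standard_normal((3, 4))
lhs = np.trace(A.T @ (B * C))
rhs = np.trace((A * B).T @ C)
prop6 = np.isclose(lhs, rhs) and np.isclose(lhs, np.sum(A * B * C))

print(prop4, prop5, prop6)
```

Note that the three cyclic products in property 5 have different sizes (2×2, 4×4, 3×3), yet their traces agree.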

2 Matrix Differentials

2.1 Differential of a Scalar with Respect to a Vector

Let $f(\boldsymbol{x})$ be a scalar function of $\boldsymbol{x}=[x_1,x_2,\cdots,x_n]^T$; it can be viewed as a function of $n$ variables. Assuming it is differentiable, its total differential is:
$$\begin{aligned} \mathrm{d}f(\boldsymbol{x}) &=\frac{\partial f}{\partial x_1}\mathrm{d}x_1+\frac{\partial f}{\partial x_2}\mathrm{d}x_2 + \cdots+\frac{\partial f}{\partial x_n}\mathrm{d}x_n\\ &= \left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right) \begin{bmatrix} \mathrm{d}x_1 \\ \mathrm{d}x_2\\ \vdots \\ \mathrm{d}x_n \end{bmatrix} \end{aligned} \tag{2-1}$$
The result is a scalar, so by (1-2), (2-1) can be written in trace form:
$$\mathrm{d}f(\boldsymbol{x}) =\mathrm{tr}\left(\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right) \begin{bmatrix} \mathrm{d}x_1 \\ \mathrm{d}x_2\\ \vdots \\ \mathrm{d}x_n \end{bmatrix}\right) \tag{2-2}$$
In shorthand:
$$\mathrm{d}f(\boldsymbol{x}) = \frac{\partial f(\boldsymbol{x})}{\partial\boldsymbol{x}^T}\mathrm{d}\boldsymbol{x} = (\mathrm{d}\boldsymbol{x})^T\frac{\partial f(\boldsymbol{x})}{\partial\boldsymbol{x}} \tag{2-3}$$
where
$$\frac{\partial f(\boldsymbol{x})}{\partial\boldsymbol{x}^T} = \left[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right], \qquad \mathrm{d}\boldsymbol{x} = [\mathrm{d}x_1, \mathrm{d}x_2, \cdots, \mathrm{d}x_n]^T \tag{2-4}$$
Combining (2-2) and (2-3), the total differential of a real scalar function of a vector variable can be written as:
$$\mathrm{d}f(\boldsymbol{x}) = \frac{\partial f(\boldsymbol{x})}{\partial\boldsymbol{x}^T}\mathrm{d}\boldsymbol{x} =\mathrm{tr}\left(\frac{\partial f(\boldsymbol{x})}{\partial\boldsymbol{x}^T} \mathrm{d}\boldsymbol{x}\right) \tag{2-5}$$

Therefore, the matrix differential yields both the Jacobian matrix and the gradient:
$$\mathrm{d}f(\boldsymbol{x}) = \mathrm{tr}\left(\frac{\partial f(\boldsymbol{x})}{\partial\boldsymbol{x}^T} \mathrm{d}\boldsymbol{x}\right) \iff \mathrm{D}_{\boldsymbol{x}}f(\boldsymbol{x}) = \frac{\partial f(\boldsymbol{x})}{\partial\boldsymbol{x}^T} = (\nabla_{\boldsymbol{x}}f(\boldsymbol{x}))^T \tag{2-6}$$
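A small numerical sketch of (2-3) and (2-6), assuming NumPy. The test function $f(\boldsymbol{x}) = \boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x}$, whose gradient is $(\boldsymbol{A}+\boldsymbol{A}^T)\boldsymbol{x}$, and the helper `num_grad` are illustrative choices, not part of the original text. The code compares that gradient with central finite differences and checks that the row vector $\partial f/\partial\boldsymbol{x}^T$ times a small $\mathrm{d}\boldsymbol{x}$ predicts the change in $f$ to first order.

```python
import numpy as np


def num_grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of scalar f at x (same shape as x)."""
    g = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        e = np.zeros_like(x, dtype=float)
        e[idx] = h
        g[idx] = (f(x + e) - f(x - e)) / (2 * h)
    return g


rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)


def f(v):
    """Illustrative scalar function f(x) = x^T A x."""
    return v @ A @ v


grad = (A + A.T) @ x           # candidate gradient df/dx (a column vector)
jac = grad.reshape(1, -1)      # D_x f = df/dx^T, a 1 x n row vector

dx = 1e-6 * rng.standard_normal(n)
df_exact = f(x + dx) - f(x)
df_linear = (jac @ dx).item()  # (2-3): df = (df/dx^T) dx

print(np.allclose(num_grad(f, x), grad, atol=1e-6), abs(df_exact - df_linear))
```

The remaining gap `df_exact - df_linear` is the second-order term $\mathrm{d}\boldsymbol{x}^T\boldsymbol{A}\,\mathrm{d}\boldsymbol{x}$, which is what the differential discards.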

2.2 Differential of a Scalar with Respect to a Matrix

Let $f(\boldsymbol{X})$ be a scalar function of $\boldsymbol{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}$. It is again a multivariate function; assuming it is differentiable, its total differential is:
$$\begin{aligned} \mathrm{d}f(\boldsymbol{X}) &=\frac{\partial f}{\partial x_{11}}\mathrm{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathrm{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathrm{d}x_{1n}\\ &+\frac{\partial f}{\partial x_{21}}\mathrm{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathrm{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathrm{d}x_{2n}\\ &+\cdots\\ &+\frac{\partial f}{\partial x_{m1}}\mathrm{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathrm{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathrm{d}x_{mn} \end{aligned} \tag{2-7}$$
This is exactly the sum of the products of corresponding elements of the matrix $\left(\frac{\partial f}{\partial x_{ij}}\right)_{i=1,j=1}^{m,n}$ and the matrix $(\mathrm{d}x_{ij})_{i=1,j=1}^{m,n}$. Hence, by (1-5), (2-7) can also be written in trace form:
$$\mathrm{d}f(\boldsymbol{X}) =\mathrm{tr}\left( \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f} {\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{n\times m} \begin{bmatrix} \mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1n} \\ \mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2n} \\ \vdots&\vdots&\ddots&\vdots\\ \mathrm{d}x_{m1} & \mathrm{d}x_{m2} & \cdots & \mathrm{d}x_{mn} \end{bmatrix}_{m \times n} \right) \tag{2-8}$$
Inside $\mathrm{tr}(\cdot)$, the left matrix is exactly the Jacobian form for a matrix variable, $\mathrm{D}_{\boldsymbol{X}}f(\boldsymbol{X}) = \frac{\partial f(\boldsymbol{X})}{\partial \boldsymbol{X}^T_{m\times n}}$ (an $n \times m$ matrix), and the right matrix is $\mathrm{d}\boldsymbol{X}_{m \times n}$, so (2-8) can be written as:
$$\mathrm{d}f(\boldsymbol{X}) =\mathrm{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T} \mathrm{d}\boldsymbol{X}\right) \tag{2-9}$$

Therefore, the matrix differential yields both the Jacobian matrix and the gradient matrix:
$$\mathrm{d}f(\boldsymbol{X}) = \mathrm{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T} \mathrm{d}\boldsymbol{X}\right) \iff \mathrm{D}_{\boldsymbol{X}}f(\boldsymbol{X}) = \frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T} = (\nabla_{\boldsymbol{X}}f(\boldsymbol{X}))^T \tag{2-10}$$

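So, as soon as the total differential of a real scalar function of a matrix variable has been written in the form (2-9), the matrix derivative can be read off. (This representation is unique: if $\mathrm{d}f(\boldsymbol{X}) =\mathrm{tr}(\boldsymbol{A}_1\mathrm{d}\boldsymbol{X}) = \mathrm{tr}(\boldsymbol{A}_2\mathrm{d}\boldsymbol{X})$ for every $\mathrm{d}\boldsymbol{X}$, then $\boldsymbol{A}_1=\boldsymbol{A}_2$. Choosing a $\mathrm{d}\boldsymbol{X}$ whose only nonzero entry is at position $(i, j)$ shows that the $(j, i)$ entries of $\boldsymbol{A}_1$ and $\boldsymbol{A}_2$ agree.)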
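The layout convention in (2-9) and (2-10) is the easiest thing to get wrong, so here is a minimal check. It assumes NumPy and an illustrative $f(\boldsymbol{X}) = \mathrm{tr}(\boldsymbol{A}\boldsymbol{X})$ with $\boldsymbol{A}$ of size $n\times m$. Since $\mathrm{d}f = \mathrm{tr}(\boldsymbol{A}\,\mathrm{d}\boldsymbol{X})$, (2-10) gives $\partial f/\partial\boldsymbol{X}^T = \boldsymbol{A}$, so the finite-difference matrix of $\partial f/\partial x_{ij}$ (same shape as $\boldsymbol{X}$) should equal $\boldsymbol{A}^T$. The helper `num_grad` is hypothetical, written only for this check.

```python
import numpy as np


def num_grad(f, X, h=1e-6):
    """Central differences: entry (i, j) approximates df/dx_ij (same shape as X)."""
    G = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X, dtype=float)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G


rng = np.random.default_rng(2)
m, n = 3, 2
A = rng.standard_normal((n, m))  # n x m, so that A @ X is square
X = rng.standard_normal((m, n))  # m x n matrix variable


def f(Y):
    """df = tr(A dX), so df/dX^T = A by (2-10)."""
    return np.trace(A @ Y)


G = num_grad(f, X)               # m x n matrix of partial derivatives, i.e. df/dX
print(G.shape, np.allclose(G, A.T), np.allclose(G.T, A))
```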

2.3 Common Properties

2.3.1 Four Rules

  • Differential of a constant matrix
    $$\mathrm{d}\boldsymbol{A}_{m \times n} = \boldsymbol{0}_{m \times n} \tag{2-11}$$
  • Linearity
    $$\mathrm{d}(c_1\boldsymbol{F}(\boldsymbol{X})+c_2\boldsymbol{G}(\boldsymbol{X})) = c_1\mathrm{d}\boldsymbol{F}(\boldsymbol{X})+c_2\mathrm{d}\boldsymbol{G}(\boldsymbol{X}) \quad (c_1, c_2 \text{ constants}) \tag{2-12}$$
  • Product rule
    $$\mathrm{d}(\boldsymbol{F}(\boldsymbol{X})\boldsymbol{G}(\boldsymbol{X}))=\mathrm{d}(\boldsymbol{F}(\boldsymbol{X}))\boldsymbol{G}(\boldsymbol{X}) + \boldsymbol{F}(\boldsymbol{X})\mathrm{d}\boldsymbol{G}(\boldsymbol{X}) \quad (\boldsymbol{F}_{p \times q}(\boldsymbol{X}),\ \boldsymbol{G}_{q \times s}(\boldsymbol{X})) \tag{2-13}$$
    For a product of more factors:
    $$\mathrm{d}(\boldsymbol{F}(\boldsymbol{X})\boldsymbol{G}(\boldsymbol{X})\boldsymbol{H}(\boldsymbol{X}))=\mathrm{d}(\boldsymbol{F}(\boldsymbol{X}))\boldsymbol{G}(\boldsymbol{X})\boldsymbol{H}(\boldsymbol{X}) + \boldsymbol{F}(\boldsymbol{X})\mathrm{d}(\boldsymbol{G}(\boldsymbol{X}))\boldsymbol{H}(\boldsymbol{X})+ \boldsymbol{F}(\boldsymbol{X})\boldsymbol{G}(\boldsymbol{X})\mathrm{d}\boldsymbol{H}(\boldsymbol{X}) \tag{2-14}$$

Note: these differentials are matrices, so the left-to-right order of the factors in a product must not be changed.

  • Transpose rule
    The differential of a transpose equals the transpose of the differential:
    $$\mathrm{d}(\boldsymbol{X}^T) = (\mathrm{d}\boldsymbol{X})^T \tag{2-15}$$
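The product and transpose rules can be checked on $\boldsymbol{F}(\boldsymbol{X}) = \boldsymbol{X}\boldsymbol{X}^T$, an illustrative choice. For a small perturbation standing in for $\mathrm{d}\boldsymbol{X}$, the actual increment minus $\mathrm{d}\boldsymbol{X}\,\boldsymbol{X}^T + \boldsymbol{X}(\mathrm{d}\boldsymbol{X})^T$ should be exactly the second-order term $\mathrm{d}\boldsymbol{X}(\mathrm{d}\boldsymbol{X})^T$. NumPy and the chosen sizes are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 2))
dX = 1e-4 * rng.standard_normal((3, 2))  # small perturbation playing the role of dX


def F(Y):
    """Matrix function F(X) = X X^T (3 x 3)."""
    return Y @ Y.T


increment = F(X + dX) - F(X)
# Product rule (2-13) + transpose rule (2-15); factor order kept exactly as written
differential = dX @ X.T + X @ dX.T
remainder = increment - differential      # should equal dX dX^T (second order)

print(np.abs(remainder).max(), np.allclose(remainder, dX @ dX.T, atol=1e-12))
```

Writing `X.T @ dX` instead of `dX @ X.T` would not even produce a 3 × 3 matrix here, which is one concrete reason the factor order must be preserved.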

2.3.2 Common Formulas

(1) Differential of a product with constant matrices
$$\mathrm{d}(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B})=\boldsymbol{A}\,\mathrm{d}(\boldsymbol{X})\,\boldsymbol{B} \tag{2-16}$$
Here $\boldsymbol{A}$ and $\boldsymbol{B}$ are constant matrices, and $\boldsymbol{X}_{m\times n}$ may be replaced by any matrix function, e.g. $\mathrm{d}(\boldsymbol{A}\boldsymbol{F}(\boldsymbol{X})\boldsymbol{B})=\boldsymbol{A}\,\mathrm{d}(\boldsymbol{F}(\boldsymbol{X}))\,\boldsymbol{B}$.
(2) The differential of the trace of $\boldsymbol{X}$, $\mathrm{d}(\mathrm{tr}(\boldsymbol{X}))$, equals the trace of the differential, $\mathrm{tr}(\mathrm{d}\boldsymbol{X})$:
$$\mathrm{d}(\mathrm{tr}(\boldsymbol{X})) = \mathrm{tr}(\mathrm{d}\boldsymbol{X}) \tag{2-17}$$
In particular, $\boldsymbol{X}$ (square, so that the trace exists) may be replaced by any square matrix function $\boldsymbol{F}(\boldsymbol{X})$: $\mathrm{d}(\mathrm{tr}(\boldsymbol{F}(\boldsymbol{X}))) = \mathrm{tr}(\mathrm{d}(\boldsymbol{F}(\boldsymbol{X})))$.
(3) Determinant
For an invertible $n \times n$ matrix $\boldsymbol{X}$:
$$\mathrm{d}|\boldsymbol{X}|= |\boldsymbol{X}|\,\mathrm{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) = \mathrm{tr}(|\boldsymbol{X}|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) \tag{2-18}$$
Proof:
A determinant can be expanded along a row: multiply each element of the row by its cofactor and sum. Expanding along row $i$, the row that contains $x_{ij}$:
$$|\boldsymbol{X}|=x_{i1}A_{i1}+x_{i2}A_{i2}+\cdots+x_{in}A_{in} \tag{2-19}$$
None of the cofactors $A_{i1}, \dots, A_{in}$ involves the elements of row $i$, so the partial derivative of the determinant with respect to $x_{ij}$ is just the corresponding cofactor:
$$\frac{\partial |\boldsymbol{X}|}{\partial x_{ij}} = A_{ij} \tag{2-20}$$
Hence the derivative of the determinant with respect to the matrix is:
$$\frac{\partial |\boldsymbol{X}|}{\partial \boldsymbol{X}^T} = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix} \tag{2-21}$$
This is exactly the adjugate matrix $\boldsymbol{X}^*$. Using the relation between the adjugate and the inverse,
$$\boldsymbol{X}^{-1}=\frac{\boldsymbol{X}^*}{|\boldsymbol{X}|} \tag{2-22}$$
and substituting into (2-10) gives:
$$\begin{aligned} \mathrm{d}|\boldsymbol{X}| &=\mathrm{tr}\left(\frac{\partial |\boldsymbol{X}|}{\partial\boldsymbol{X}^T} \mathrm{d}\boldsymbol{X}\right) \\ &=\mathrm{tr}(|\boldsymbol{X}|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) \end{aligned} \tag{2-23}$$
Since the determinant is a scalar, linearity of the trace (1-3) lets it be moved outside the trace:
$$\mathrm{d}|\boldsymbol{X}|= |\boldsymbol{X}|\,\mathrm{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) = \mathrm{tr}(|\boldsymbol{X}|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) \tag{2-24}$$

In particular, $\boldsymbol{X}$ may be replaced by any square, invertible matrix function $\boldsymbol{F}(\boldsymbol{X})$; the differential of its determinant is $\mathrm{d}|\boldsymbol{F}(\boldsymbol{X})|= |\boldsymbol{F}(\boldsymbol{X})|\,\mathrm{tr}(\boldsymbol{F}(\boldsymbol{X})^{-1}\mathrm{d}\boldsymbol{F}(\boldsymbol{X})) = \mathrm{tr}(|\boldsymbol{F}(\boldsymbol{X})|\boldsymbol{F}(\boldsymbol{X})^{-1}\mathrm{d}\boldsymbol{F}(\boldsymbol{X}))$.
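A numerical sketch of the determinant result, assuming NumPy. It builds the cofactor matrix from (2-19) and (2-20) explicitly, checks that its transpose, the adjugate in (2-21), equals $|\boldsymbol{X}|\boldsymbol{X}^{-1}$ as in (2-22), and checks (2-18) against the actual change of the determinant for a small perturbation. The test matrix (identity plus a small random matrix) is an arbitrary, well-conditioned choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # well-conditioned test matrix

# Cofactors A_ij = (-1)^(i+j) * det(X with row i and column j deleted), cf. (2-19)
cof = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        minor = np.delete(np.delete(X, i, axis=0), j, axis=1)
        cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

adj = cof.T                      # (2-21): d|X|/dX^T is the adjugate X*
det = np.linalg.det(X)
Xinv = np.linalg.inv(X)

# (2-18): d|X| = |X| tr(X^{-1} dX), compared with the true change for a small dX
dX = 1e-6 * rng.standard_normal((n, n))
d_det_exact = np.linalg.det(X + dX) - det
d_det_linear = det * np.trace(Xinv @ dX)

print(np.allclose(adj, det * Xinv), abs(d_det_exact - d_det_linear))
```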

(4) Inverse
$$\mathrm{d}(\boldsymbol{X}^{-1})=-\boldsymbol{X}^{-1}\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^{-1} \tag{2-25}$$
Proof:
Since
$$\boldsymbol{X}\boldsymbol{X}^{-1}=\boldsymbol{I} \tag{2-26}$$
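and the differential of a constant matrix is $\boldsymbol{0}$, taking the differential of both sides with the product rule (2-13) gives:
$$\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^{-1}+\boldsymbol{X}\mathrm{d}(\boldsymbol{X}^{-1}) =\boldsymbol{0} \tag{2-27}$$
Left-multiplying both sides by $\boldsymbol{X}^{-1}$ and moving the first term to the right-hand side yields (2-25).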

In particular, $\boldsymbol{X}$ may be replaced by any invertible matrix function $\boldsymbol{F}(\boldsymbol{X})$; the differential of its inverse is $\mathrm{d}(\boldsymbol{F}(\boldsymbol{X})^{-1})=-\boldsymbol{F}(\boldsymbol{X})^{-1}\mathrm{d}(\boldsymbol{F}(\boldsymbol{X}))\boldsymbol{F}(\boldsymbol{X})^{-1}$.
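The inverse rule, and the warning that factors may not be reordered, can both be seen numerically. In this sketch (NumPy assumed, matrices arbitrary) the differential $-\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}\,\boldsymbol{X}^{-1}$ matches the true increment up to second-order terms, while the reordered product $-\boldsymbol{X}^{-1}\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}$ generally differs already at first order.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # comfortably invertible
dX = 1e-6 * rng.standard_normal((n, n))            # small perturbation standing in for dX

Xinv = np.linalg.inv(X)
increment = np.linalg.inv(X + dX) - Xinv
differential = -Xinv @ dX @ Xinv    # (2-25)
reordered = -Xinv @ Xinv @ dX       # same factors, wrong order: not the differential

err_correct = np.abs(increment - differential).max()   # second-order small
err_reordered = np.abs(increment - reordered).max()    # first-order mismatch
print(err_correct, err_reordered)
```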
(5) The differential of the Kronecker product of matrix functions is
$$\mathrm{d}(\boldsymbol{U} \otimes \boldsymbol{V}) = \mathrm{d}(\boldsymbol{U}) \otimes \boldsymbol{V} + \boldsymbol{U} \otimes \mathrm{d}(\boldsymbol{V}) \tag{2-28}$$
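Because the Kronecker product is bilinear, (2-28) can be verified exactly: the increment minus the differential is precisely $\mathrm{d}\boldsymbol{U}\otimes\mathrm{d}\boldsymbol{V}$. A sketch assuming NumPy's `np.kron`, with arbitrary shapes:

```python
import numpy as np

rng = np.random.default_rng(6)
U = rng.standard_normal((2, 3))
V = rng.standard_normal((3, 2))
dU = 1e-6 * rng.standard_normal(U.shape)
dV = 1e-6 * rng.standard_normal(V.shape)

increment = np.kron(U + dU, V + dV) - np.kron(U, V)
differential = np.kron(dU, V) + np.kron(U, dV)     # (2-28)
leftover = increment - differential                # bilinearity: exactly kron(dU, dV)

print(np.abs(leftover).max(), np.abs(leftover - np.kron(dU, dV)).max())
```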
(6) The differential of the Hadamard (elementwise) product of matrix functions is
$$\mathrm{d}(\boldsymbol{U} \odot \boldsymbol{V})= \mathrm{d}(\boldsymbol{U}) \odot \boldsymbol{V} + \boldsymbol{U} \odot \mathrm{d}(\boldsymbol{V}) \tag{2-29}$$
Elementwise functions: if $\sigma(\boldsymbol{X}) = [\sigma(x_{ij})]$ applies a scalar function to each element, then $\mathrm{d}\sigma(\boldsymbol{X}) = \sigma'(\boldsymbol{X}) \odot \mathrm{d}\boldsymbol{X}$, where $\sigma'(\boldsymbol{X})=[\sigma'(x_{ij})]$ is the elementwise derivative. For example:
$$\boldsymbol{X}=\begin{bmatrix}x_{11} & x_{12} \\ x_{21} & x_{22}\end{bmatrix}, \quad \mathrm{d} \sin(\boldsymbol{X}) = \begin{bmatrix}\cos x_{11}\, \mathrm{d}x_{11} & \cos x_{12}\, \mathrm{d} x_{12}\\ \cos x_{21}\, \mathrm{d} x_{21}& \cos x_{22}\, \mathrm{d}x_{22}\end{bmatrix} = \cos(\boldsymbol{X})\odot \mathrm{d}\boldsymbol{X} \tag{2-30}$$
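A numerical sketch of (2-29) and (2-30), assuming NumPy, where `*` is the elementwise product. As an illustrative case it takes $\boldsymbol{U} = \boldsymbol{X}$ and $\boldsymbol{V} = \sin(\boldsymbol{X})$, so $\mathrm{d}\boldsymbol{V} = \cos(\boldsymbol{X})\odot\mathrm{d}\boldsymbol{X}$; in both cases the first-order differential should match the true increment up to second-order terms.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((2, 2))
dX = 1e-6 * rng.standard_normal((2, 2))

# (2-30): d sin(X) = cos(X) ⊙ dX, with NumPy's `*` as the Hadamard product
inc_sin = np.sin(X + dX) - np.sin(X)
diff_sin = np.cos(X) * dX

# (2-29) with U = X and V = sin(X): d(U ⊙ V) = dU ⊙ V + U ⊙ dV
inc_prod = (X + dX) * np.sin(X + dX) - X * np.sin(X)
diff_prod = dX * np.sin(X) + X * diff_sin

print(np.abs(inc_sin - diff_sin).max(), np.abs(inc_prod - diff_prod).max())
```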
(7) Composite functions
Suppose the dependency is $\boldsymbol{X}\to \boldsymbol{Y} \to f$. Scalar calculus has the chain rule $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} \frac{\partial y}{\partial x}$, but it cannot simply be reused here: the variables are now matrices, and the dimensions of the factors must be compatible. Instead, we build the composition rule directly from differentials. First write $\mathrm{d}f = \mathrm{tr}\left(\frac{\partial f}{\partial\boldsymbol{Y}^T} \mathrm{d}\boldsymbol{Y}\right)$, then express $\mathrm{d}\boldsymbol{Y}$ in terms of $\mathrm{d}\boldsymbol{X}$ and substitute it, and finally use the trace identities to move all other factors to the left of $\mathrm{d}\boldsymbol{X}$. This yields $\frac{\partial f}{\partial \boldsymbol{X}^T}$, and transposing it gives $\frac{\partial f}{\partial \boldsymbol{X}}$.

Supplement: the derivations rely on a few concepts worth studying separately, namely the Hadamard product and the Kronecker product.

In summary, if a scalar function $f$ is built from a matrix $\boldsymbol{X}$ through addition, subtraction, multiplication, inversion, determinants, elementwise functions and similar operations, apply the corresponding rules to take the differential of $f$, then use the trace tricks: wrap $\mathrm{d}f$ in a trace and move all other factors to the left of $\mathrm{d}\boldsymbol{X}$. Comparing the result with the relation between derivative and differential, $\mathrm{d}f(\boldsymbol{X}) =\mathrm{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T} \mathrm{d}\boldsymbol{X}\right)$, gives the derivative. In particular, if the matrix degenerates to a vector, compare with $\mathrm{d}f(\boldsymbol{x}) =\mathrm{tr}\left(\frac{\partial f(\boldsymbol{x})}{\partial\boldsymbol{x}^T} \mathrm{d}\boldsymbol{x}\right)$ instead.
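When applying this recipe, a finite-difference check is a cheap way to catch layout or transpose mistakes. The sketch below assumes NumPy; `num_grad` and `check_grad` are hypothetical helper names. As an illustrative target it uses $f(\boldsymbol{X}) = \mathrm{tr}(\boldsymbol{X}^{-1})$, for which the recipe gives $\mathrm{d}f = \mathrm{tr}(-\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}\,\boldsymbol{X}^{-1}) = \mathrm{tr}(-\boldsymbol{X}^{-2}\mathrm{d}\boldsymbol{X})$ by (2-17), (2-25) and (1-6), hence $\partial f/\partial\boldsymbol{X}^T = -\boldsymbol{X}^{-2}$ and $\partial f/\partial\boldsymbol{X} = -(\boldsymbol{X}^{-2})^T$. Forgetting the final transpose makes the check fail.

```python
import numpy as np


def num_grad(f, X, h=1e-6):
    """Central differences: entry (i, j) approximates df/dx_ij."""
    G = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X, dtype=float)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G


def check_grad(f, grad, X, tol=1e-6):
    """True if the analytic df/dX (same shape as X) matches finite differences."""
    return bool(np.allclose(num_grad(f, X), grad(X), atol=tol))


def f(X):
    """Illustrative target f(X) = tr(X^{-1})."""
    return np.trace(np.linalg.inv(X))


def grad(X):
    """From the recipe: df/dX = -(X^{-2})^T."""
    return -np.linalg.matrix_power(np.linalg.inv(X), 2).T


def grad_missing_transpose(X):
    """Deliberately wrong: this is df/dX^T, not df/dX."""
    return -np.linalg.matrix_power(np.linalg.inv(X), 2)


rng = np.random.default_rng(8)
X = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
print(check_grad(f, grad, X), check_grad(f, grad_missing_transpose, X))
```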

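3 Practice Problems

3.1 A Basic Problem

In the previous article we used the definition to prove $\frac{\partial( \boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b})}{\partial\boldsymbol{X}} = \boldsymbol{a}\boldsymbol{b}^T\boldsymbol{X}+\boldsymbol{b}\boldsymbol{a}^T\boldsymbol{X}$, where $\boldsymbol{a}, \boldsymbol{b}$ are $m \times 1$ vectors and $\boldsymbol{X}$ is $m \times n$. Below we prove it again with matrix differentials. Since this is the first example, every step is written out in detail.
Proof: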
Step 1: by the trace of a scalar (1-2), write the differential in trace form
$$\mathrm{d}(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b})= \mathrm{tr}(\mathrm{d}(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b}))\tag{3-1}$$
Step 2: apply the rules of matrix differentials to reduce it to the canonical trace form
By the rule for products with constant matrices (2-16):
$$\begin{aligned} \mathrm{d}(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b}) &= \mathrm{tr}(\mathrm{d}(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b})) \\ &= \mathrm{tr}(\boldsymbol{a}^T\mathrm{d}(\boldsymbol{X}\boldsymbol{X}^T)\boldsymbol{b}) \end{aligned} \tag{3-2}$$
By the product rule (2-13):
$$\begin{aligned} \mathrm{d}(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b}) &= \mathrm{tr}(\boldsymbol{a}^T\mathrm{d}(\boldsymbol{X}\boldsymbol{X}^T)\boldsymbol{b}) \\ &= \mathrm{tr}[\boldsymbol{a}^T(\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T+\boldsymbol{X}\,\mathrm{d}(\boldsymbol{X}^T))\boldsymbol{b}] \end{aligned} \tag{3-3}$$
By linearity of the trace (1-3):
$$\begin{aligned} \mathrm{d}(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b}) &= \mathrm{tr}[\boldsymbol{a}^T(\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T+\boldsymbol{X}\,\mathrm{d}(\boldsymbol{X}^T))\boldsymbol{b}] \\ &= \mathrm{tr}(\boldsymbol{a}^T\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T\boldsymbol{b})+\mathrm{tr}(\boldsymbol{a}^T\boldsymbol{X}\,\mathrm{d}(\boldsymbol{X}^T)\boldsymbol{b}) \end{aligned} \tag{3-4}$$
By the transpose rule (2-15):
$$\begin{aligned} \mathrm{d}(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b}) &= \mathrm{tr}(\boldsymbol{a}^T\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T\boldsymbol{b})+\mathrm{tr}(\boldsymbol{a}^T\boldsymbol{X}\,\mathrm{d}(\boldsymbol{X}^T)\boldsymbol{b}) \\ &= \mathrm{tr}(\boldsymbol{a}^T\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T\boldsymbol{b})+\mathrm{tr}(\boldsymbol{a}^T\boldsymbol{X}(\mathrm{d}\boldsymbol{X})^T\boldsymbol{b}) \end{aligned} \tag{3-5}$$
By the cyclic property of the trace (1-6) (grouping several factors into one block) and, for the second term, the transpose property (1-4), since $\mathrm{tr}(\boldsymbol{M}(\mathrm{d}\boldsymbol{X})^T) = \mathrm{tr}(\mathrm{d}\boldsymbol{X}\,\boldsymbol{M}^T) = \mathrm{tr}(\boldsymbol{M}^T\mathrm{d}\boldsymbol{X})$:
$$\begin{aligned} \mathrm{d}(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b}) &= \mathrm{tr}(\boldsymbol{a}^T\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T\boldsymbol{b})+\mathrm{tr}(\boldsymbol{a}^T\boldsymbol{X}(\mathrm{d}\boldsymbol{X})^T\boldsymbol{b}) \\ &= \mathrm{tr}(\boldsymbol{X}^T\boldsymbol{b}\boldsymbol{a}^T\mathrm{d}\boldsymbol{X}) + \mathrm{tr}(\boldsymbol{b}\boldsymbol{a}^T\boldsymbol{X}(\mathrm{d}\boldsymbol{X})^T)\\ &= \mathrm{tr}(\boldsymbol{X}^T\boldsymbol{b}\boldsymbol{a}^T\mathrm{d}\boldsymbol{X}) + \mathrm{tr}((\boldsymbol{b}\boldsymbol{a}^T\boldsymbol{X})^T\mathrm{d}\boldsymbol{X})\\ &= \mathrm{tr}(\boldsymbol{X}^T\boldsymbol{b}\boldsymbol{a}^T\mathrm{d}\boldsymbol{X}) + \mathrm{tr}(\boldsymbol{X}^T\boldsymbol{a}\boldsymbol{b}^T\mathrm{d}\boldsymbol{X}) \end{aligned} \tag{3-6}$$
By linearity of the trace (1-3):
$$\begin{aligned} \mathrm{d}(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b}) &= \mathrm{tr}(\boldsymbol{X}^T\boldsymbol{b}\boldsymbol{a}^T\mathrm{d}\boldsymbol{X}) + \mathrm{tr}(\boldsymbol{X}^T\boldsymbol{a}\boldsymbol{b}^T\mathrm{d}\boldsymbol{X}) \\ &= \mathrm{tr}((\boldsymbol{X}^T\boldsymbol{b}\boldsymbol{a}^T+\boldsymbol{X}^T\boldsymbol{a}\boldsymbol{b}^T)\mathrm{d}\boldsymbol{X}) \end{aligned} \tag{3-7}$$
Step 3: read off the result from the relation between derivative and differential (2-10); the second line is the transpose of the first
$$\begin{aligned} \frac{\partial( \boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b})}{\partial\boldsymbol{X}^T} &=\boldsymbol{X}^T\boldsymbol{b}\boldsymbol{a}^T+\boldsymbol{X}^T\boldsymbol{a}\boldsymbol{b}^T \\ \frac{\partial( \boldsymbol{a}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{b})}{\partial\boldsymbol{X}} &= \boldsymbol{a}\boldsymbol{b}^T\boldsymbol{X}+\boldsymbol{b}\boldsymbol{a}^T\boldsymbol{X} \end{aligned} \tag{3-8}$$
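The result (3-8) can be confirmed numerically. This sketch assumes NumPy, arbitrary sizes for the $m \times 1$ vectors $\boldsymbol{a}, \boldsymbol{b}$ and the $m \times n$ matrix $\boldsymbol{X}$, and a hypothetical central-difference helper `num_grad`; it also confirms that the two lines of (3-8) are transposes of each other.

```python
import numpy as np


def num_grad(f, X, h=1e-6):
    """Central differences: entry (i, j) approximates df/dx_ij."""
    G = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X, dtype=float)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G


rng = np.random.default_rng(9)
m, n = 4, 3
a = rng.standard_normal((m, 1))
b = rng.standard_normal((m, 1))
X = rng.standard_normal((m, n))


def f(Y):
    """Scalar a^T Y Y^T b."""
    return (a.T @ Y @ Y.T @ b).item()


grad_X = a @ b.T @ X + b @ a.T @ X       # (3-8): df/dX, m x n
grad_XT = X.T @ b @ a.T + X.T @ a @ b.T  # (3-8): df/dX^T, n x m

print(np.allclose(num_grad(f, X), grad_X, atol=1e-6), np.allclose(grad_XT, grad_X.T))
```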

3.2 Scalar Function of a Matrix: Trace

Example: let $f = \mathrm{tr}(\boldsymbol{Y}^T \boldsymbol{M}\boldsymbol{Y})$ with $\boldsymbol{Y} = \sigma(\boldsymbol{W}\boldsymbol{X})$; find $\frac{\partial f}{\partial \boldsymbol{X}}$. Here $\boldsymbol{W}$ is an $l \times m$ matrix, $\boldsymbol{X}$ is $m \times n$, $\boldsymbol{Y}$ is $l \times n$, $\boldsymbol{M}$ is an $l \times l$ symmetric matrix, $\sigma$ is an elementwise function, and $f$ is a scalar.
Solution:
Step 1: first find $\frac{\partial f}{\partial \boldsymbol{Y}}$. By the product rule, the transpose rule, and the trace properties (1-4) and (1-3):
$$\begin{aligned} \mathrm{d}f &= \mathrm{tr}((\mathrm{d}\boldsymbol{Y})^T\boldsymbol{M}\boldsymbol{Y}) + \mathrm{tr}(\boldsymbol{Y}^T\boldsymbol{M}\,\mathrm{d}\boldsymbol{Y}) \\ &= \mathrm{tr}(\boldsymbol{Y}^T\boldsymbol{M}^T\mathrm{d}\boldsymbol{Y}) + \mathrm{tr}(\boldsymbol{Y}^T\boldsymbol{M}\,\mathrm{d}\boldsymbol{Y}) \\ &= \mathrm{tr}(\boldsymbol{Y}^T(\boldsymbol{M}+\boldsymbol{M}^T)\mathrm{d}\boldsymbol{Y}) \end{aligned} \tag{3-9}$$
From the relation between derivative and differential, and since $\boldsymbol{M}$ is an $l \times l$ symmetric matrix:
$$\frac{\partial f}{\partial \boldsymbol{Y}}=(\boldsymbol{M}+\boldsymbol{M}^T)\boldsymbol{Y} = 2\boldsymbol{M}\boldsymbol{Y} \tag{3-10}$$
Step 2: express $\mathrm{d}\boldsymbol{Y}$ in terms of $\mathrm{d}\boldsymbol{X}$. By the elementwise-function rule and (2-16), $\mathrm{d}\boldsymbol{Y} = \sigma'(\boldsymbol{W}\boldsymbol{X})\odot(\boldsymbol{W}\,\mathrm{d}\boldsymbol{X})$. Substituting this and swapping matrix and elementwise multiplication (1-8) gives:
$$\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial \boldsymbol{Y}}\right)^T (\sigma'(\boldsymbol{W}\boldsymbol{X})\odot (\boldsymbol{W}\,\mathrm{d}\boldsymbol{X}))\right) = \mathrm{tr}\left(\left(\frac{\partial f}{\partial \boldsymbol{Y}} \odot \sigma'(\boldsymbol{W}\boldsymbol{X})\right)^T \boldsymbol{W}\, \mathrm{d}\boldsymbol{X}\right)\tag{3-11}$$
Step 3: from the relation between derivative and differential:
$$\frac{\partial f}{\partial \boldsymbol{X}}=\boldsymbol{W}^T \left(\frac{\partial f}{\partial \boldsymbol{Y}}\odot \sigma'(\boldsymbol{W}\boldsymbol{X})\right)=\boldsymbol{W}^T\left((2\boldsymbol{M}\sigma(\boldsymbol{W}\boldsymbol{X}))\odot\sigma'(\boldsymbol{W}\boldsymbol{X})\right)\tag{3-12}$$
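A numerical check of (3-12), under the assumption $\sigma = \tanh$ (so $\sigma'(z) = 1 - \tanh^2 z$); any differentiable elementwise function would do. NumPy, the sizes, and the symmetric $\boldsymbol{M}$ built as $\boldsymbol{S}+\boldsymbol{S}^T$ are illustrative choices, and `num_grad` is a hypothetical helper.

```python
import numpy as np


def num_grad(f, X, h=1e-6):
    """Central differences: entry (i, j) approximates df/dx_ij."""
    G = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X, dtype=float)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G


rng = np.random.default_rng(10)
l, m, n = 3, 4, 2
W = rng.standard_normal((l, m))
X = rng.standard_normal((m, n))
S = rng.standard_normal((l, l))
M = S + S.T                      # symmetric l x l matrix, as the example requires


def sigma(Z):
    """Assumed elementwise function: tanh."""
    return np.tanh(Z)


def dsigma(Z):
    """Elementwise derivative of tanh: 1 - tanh^2."""
    return 1.0 - np.tanh(Z) ** 2


def f(Xv):
    """f = tr(Y^T M Y) with Y = sigma(W X)."""
    Y = sigma(W @ Xv)
    return np.trace(Y.T @ M @ Y)


Z = W @ X
grad = W.T @ ((2 * M @ sigma(Z)) * dsigma(Z))   # (3-12)
print(np.allclose(num_grad(f, X), grad, atol=1e-6))
```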
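The figure below summarizes the differential matrices and gradient matrices of several typical trace functions; it can be used as a lookup table to save effort.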

Figure 1: Differential matrices and Jacobian matrices of several trace functions

3.3 Scalar Function of a Matrix: Determinant

Example: for an invertible square matrix $\boldsymbol{X}$, show that
$$\frac{\partial|\boldsymbol{X}^3|}{\partial \boldsymbol{X}} =\frac{\partial|\boldsymbol{X}|^3}{\partial \boldsymbol{X}} =3|\boldsymbol{X}|^3(\boldsymbol{X}^{-1})^T = 3|\boldsymbol{X}^3|(\boldsymbol{X}^{-1})^T \tag{3-13}$$
Step 1: write the differential in trace form
For square matrices $\boldsymbol{A}, \boldsymbol{B}$ of order $n$, $|\boldsymbol{A}\boldsymbol{B}|=|\boldsymbol{A}|\,|\boldsymbol{B}|$, so $|\boldsymbol{X}^3| = |\boldsymbol{X}|^3$ and
$$\mathrm{d}|\boldsymbol{X}^3| =\mathrm{d}(|\boldsymbol{X}|^3)= \mathrm{tr}(\mathrm{d}(|\boldsymbol{X}|^3)) \tag{3-14}$$
Step 2: reduce it to the canonical trace form
This is the total differential of a composite function; let $z=|\boldsymbol{X}|^3$ and $u=|\boldsymbol{X}|$. Then
$$\begin{aligned} \mathrm{d}(|\boldsymbol{X}|^3) &= \mathrm{tr}(\mathrm{d}(|\boldsymbol{X}|^3)) \\ &= \mathrm{tr}(\mathrm{d}z) \\ &= \mathrm{tr}(\mathrm{d}(u^3)) \\ &= \mathrm{tr}(3u^2\mathrm{d}u) \\ &= \mathrm{tr}(3|\boldsymbol{X}|^2\mathrm{d}|\boldsymbol{X}|) \end{aligned} \tag{3-15}$$
By the differential of a determinant (2-18):
$$\begin{aligned} \mathrm{d}(|\boldsymbol{X}|^3) &= \mathrm{tr}(3|\boldsymbol{X}|^2\mathrm{d}|\boldsymbol{X}|) \\ &= \mathrm{tr}(3|\boldsymbol{X}|^2|\boldsymbol{X}|\,\mathrm{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) ) \\ &= \mathrm{tr}(3|\boldsymbol{X}|^3\,\mathrm{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) ) \end{aligned} \tag{3-16}$$
The argument of the outer trace is a scalar, so by (1-2) the outer trace can be dropped; then, using $|\boldsymbol{X}|^3 = |\boldsymbol{X}^3|$ and linearity (1-3), the scalar factor can be moved back inside a single trace:
$$\begin{aligned} \mathrm{d}(|\boldsymbol{X}|^3) &= \mathrm{tr}(3|\boldsymbol{X}|^3\,\mathrm{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) ) \\ &= 3|\boldsymbol{X}|^3\,\mathrm{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) \\ &= \mathrm{tr}(3|\boldsymbol{X}^3|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X}) \end{aligned} \tag{3-17}$$
Step 3: from the relation between derivative and differential:
$$\begin{aligned} \frac{\partial|\boldsymbol{X}^3|}{\partial \boldsymbol{X}^T} &=\frac{\partial|\boldsymbol{X}|^3}{\partial \boldsymbol{X}^T} =3|\boldsymbol{X}|^3\boldsymbol{X}^{-1} = 3|\boldsymbol{X}^3|\boldsymbol{X}^{-1} \\ \frac{\partial|\boldsymbol{X}^3|}{\partial \boldsymbol{X}} &=\frac{\partial|\boldsymbol{X}|^3}{\partial \boldsymbol{X}} =3|\boldsymbol{X}|^3(\boldsymbol{X}^{-1})^T = 3|\boldsymbol{X}^3|(\boldsymbol{X}^{-1})^T \end{aligned} \tag{3-18}$$
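The result (3-18) can also be checked numerically. This sketch assumes NumPy, an arbitrary well-conditioned $3\times 3$ test matrix and a hypothetical helper `num_grad`; it differentiates $|\boldsymbol{X}^3|$ by finite differences and compares with $3|\boldsymbol{X}|^3(\boldsymbol{X}^{-1})^T$.

```python
import numpy as np


def num_grad(f, X, h=1e-6):
    """Central differences: entry (i, j) approximates df/dx_ij."""
    G = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X, dtype=float)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G


rng = np.random.default_rng(11)
X = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # well-conditioned, invertible


def f(Y):
    """|Y^3|, the determinant of the matrix cube."""
    return np.linalg.det(np.linalg.matrix_power(Y, 3))


det = np.linalg.det(X)
grad = 3 * det ** 3 * np.linalg.inv(X).T   # (3-18): 3|X|^3 (X^{-1})^T

print(np.isclose(f(X), det ** 3), np.allclose(num_grad(f, X), grad, atol=1e-6))
```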

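The figure below summarizes the differential matrices and gradient matrices of some typical determinant functions; it can likewise be used as a lookup table.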

Figure 2: Real differential matrices and Jacobian matrices of several determinant functions

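Matrix differentials avoid differentiating with respect to each element of a vector or matrix separately and then assembling the results, which makes them quite convenient. It is worth working through more exercises to become fluent with the properties of matrix differentials and trace functions listed above.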

Author: 长路漫漫2021
