1 Matrix Trace
1.1 Definition
The sum of the main-diagonal elements of an $n \times n$ square matrix $\pmb{A}_{n \times n}$ is called the trace of $\pmb{A}$, written $\mathrm{tr}(\pmb{A})$:

$$\mathrm{tr}(\pmb{A}) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^n a_{ii} \tag{1-1}$$
Note: the trace is not defined for non-square matrices.
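As a quick illustration of Eq. (1-1), here is a minimal plain-Python sketch. Matrices are stored as lists of rows, and `trace` is our own helper name, not a library function. It also enforces the square-matrix requirement from the note above:

```python
def trace(A):
    """Return a_11 + a_22 + ... + a_nn for a square matrix given as a list of rows."""
    n = len(A)
    # The trace is only defined for square matrices.
    if any(len(row) != n for row in A):
        raise ValueError("trace is only defined for square matrices")
    return sum(A[i][i] for i in range(n))


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(trace(A))  # 15  (1 + 5 + 9)

try:
    trace([[1, 2, 3],
           [4, 5, 6]])  # 2 x 3, not square
except ValueError as err:
    print(err)  # trace is only defined for square matrices
```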
1.2 Common Properties
1. Trace of a scalar
A scalar $x$ can be regarded as a $1 \times 1$ matrix, so its trace is the scalar itself:

$$x = \mathrm{tr}(x) \tag{1-2}$$
2. Linearity
The trace of a sum equals the sum of the traces, and scalar factors can be pulled out:

$$\mathrm{tr}(c_1\pmb{A}+c_2\pmb{B}) = c_1\mathrm{tr}(\pmb{A})+c_2\mathrm{tr}(\pmb{B}) \tag{1-3}$$

where $c_1, c_2$ are scalars and $\pmb{A}, \pmb{B}$ are square matrices of the same size.
3. Transpose
Transposition does not change the main-diagonal elements, so the trace of the transpose equals the trace of the original matrix:

$$\mathrm{tr}(\pmb{A})=\mathrm{tr}(\pmb{A}^T) \tag{1-4}$$
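Linearity (1-3) and the transpose property (1-4) can be checked numerically. The sketch below is plain Python on random $4 \times 4$ matrices. The seed, the scalars and the $10^{-12}$ tolerance are arbitrary choices, and `lin_comb` is a hypothetical helper for $c_1\pmb{A}+c_2\pmb{B}$:

```python
import random


def trace(A):
    # Eq. (1-1): sum of the main-diagonal entries
    return sum(A[i][i] for i in range(len(A)))


def transpose(A):
    return [list(row) for row in zip(*A)]


def lin_comb(c1, A, c2, B):
    # Entry-wise c1*A + c2*B for two matrices of the same size
    return [[c1 * a + c2 * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


rng = random.Random(0)
n = 4
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
B = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
c1, c2 = 2.5, -0.7

# Eq. (1-3): the two sides agree up to floating-point rounding
lhs = trace(lin_comb(c1, A, c2, B))
rhs = c1 * trace(A) + c2 * trace(B)
print(abs(lhs - rhs) < 1e-12)  # True

# Eq. (1-4): transposing keeps the diagonal, so the sums are identical
print(trace(A) == trace(transpose(A)))  # True
```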
4. What the trace of a product computes
Take two matrices $\pmb{A}_{m\times n}$ and $\pmb{B}_{m\times n}$ of the same size $m \times n$. Multiply one of them (on the left or on the right) by the transpose of the other and take the trace. The result is the sum of the products of the entries of $\pmb{A}$ and $\pmb{B}$ in the same positions. It can be understood as the vector dot product generalized to matrices:

$$\begin{aligned} \mathrm{tr}(\pmb{A}\pmb{B}^T) &= a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n}\\ &+ a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n}\\ &+ \cdots \\ &+ a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn} \end{aligned} \tag{1-5}$$
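Eq. (1-5) can be checked with a minimal plain-Python sketch. It compares $\mathrm{tr}(\pmb{A}\pmb{B}^T)$ (an $m \times m$ product) and $\mathrm{tr}(\pmb{B}^T\pmb{A})$ (an $n \times n$ product) against the entry-wise sum of products. The helper names, the random $3 \times 5$ matrices and the tolerance are our own choices:

```python
import random


def matmul(A, B):
    # (p x q) times (q x r) -> (p x r)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def entrywise_sum(A, B):
    # Right-hand side of Eq. (1-5): sum of products of entries in the same position
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


rng = random.Random(1)
m, n = 3, 5
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
B = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]

s = entrywise_sum(A, B)
print(abs(trace(matmul(A, transpose(B))) - s) < 1e-12)  # tr(A B^T), an m x m product
print(abs(trace(matmul(transpose(B), A)) - s) < 1e-12)  # tr(B^T A), an n x n product
print(trace(matmul([[1, 2], [3, 4]], transpose([[5, 6], [7, 8]]))))  # 70
```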
5. Commutativity (cyclic property)
Swapping the order of the two factors in a matrix product leaves the trace unchanged:

$$\mathrm{tr}(\pmb{A}\pmb{B}) = \mathrm{tr}(\pmb{B}\pmb{A}) \tag{1-6}$$

where $\pmb{A}_{m \times n},\pmb{B}_{n \times m}$. Both sides equal $\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}b_{ji}$. For three factors, the trace is invariant under cyclic permutations:

$$\mathrm{tr}(\pmb{A}\pmb{B}\pmb{C})=\mathrm{tr}(\pmb{C}\pmb{A}\pmb{B})=\mathrm{tr}(\pmb{B}\pmb{C}\pmb{A}) \tag{1-7}$$
where $\pmb{A}_{m \times n},\pmb{B}_{n \times p},\pmb{C}_{p \times m}$. Only cyclic shifts are allowed. In general $\mathrm{tr}(\pmb{A}\pmb{B}\pmb{C}) \neq \mathrm{tr}(\pmb{A}\pmb{C}\pmb{B})$.
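The following plain-Python sketch checks Eqs. (1-6) and (1-7) on random rectangular matrices, where the products do not even have the same size. It then shows a non-cyclic reordering that changes the trace. `P`, `Q`, `R` are hand-picked $2 \times 2$ matrices chosen for illustration, and the seed and tolerance are arbitrary:

```python
import random


def matmul(A, B):
    # (p x q) times (q x r) -> (p x r)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(2)
m, n, p = 2, 3, 4

# Eq. (1-6): AB is m x m and BA is n x n, yet both traces equal sum_ij a_ij b_ji
A, B = rand_matrix(m, n, rng), rand_matrix(n, m, rng)
s = sum(A[i][j] * B[j][i] for i in range(m) for j in range(n))
print(abs(trace(matmul(A, B)) - s) < 1e-12, abs(trace(matmul(B, A)) - s) < 1e-12)

# Eq. (1-7): cyclic shifts of A (m x n), B (n x p), C (p x m)
A, B, C = rand_matrix(m, n, rng), rand_matrix(n, p, rng), rand_matrix(p, m, rng)
t_abc = trace(matmul(matmul(A, B), C))
t_cab = trace(matmul(matmul(C, A), B))
t_bca = trace(matmul(matmul(B, C), A))
print(max(abs(t_abc - t_cab), abs(t_abc - t_bca)) < 1e-12)

# A non-cyclic reordering generally changes the trace
P = [[0, 1], [0, 0]]
Q = [[0, 0], [1, 0]]
R = [[1, 0], [0, 0]]
print(trace(matmul(matmul(P, Q), R)), trace(matmul(matmul(P, R), Q)))  # 1 0
```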
6. Swapping matrix multiplication and element-wise multiplication
$$\mathrm{tr}(\pmb{A}^T(\pmb{B}\odot \pmb{C})) = \mathrm{tr}((\pmb{A}\odot \pmb{B})^T\pmb{C}) \tag{1-8}$$

where $\pmb{A}_{n \times n},\pmb{B}_{n \times n},\pmb{C}_{n \times n}$ and $\odot$ denotes element-wise (Hadamard) multiplication. The identity also holds for any three matrices of the same size $m \times n$, which is how it is used in Section 3.2. Both sides equal $\sum_{i,j} a_{ij}b_{ij}c_{ij}$.
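A minimal plain-Python check of Eq. (1-8) follows. It deliberately uses non-square $3 \times 4$ matrices to illustrate the same-size generalization. `hadamard` is our own helper for the element-wise product, and the seed and tolerance are arbitrary:

```python
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def hadamard(A, B):
    # Element-wise (Hadamard) product of two same-size matrices
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


rng = random.Random(5)
m, n = 3, 4  # any common size works, square or not
A, B, C = ([[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)] for _ in range(3))

lhs = trace(matmul(transpose(A), hadamard(B, C)))  # tr(A^T (B o C))
rhs = trace(matmul(transpose(hadamard(A, B)), C))  # tr((A o B)^T C)
s = sum(A[i][j] * B[i][j] * C[i][j] for i in range(m) for j in range(n))
print(abs(lhs - s) < 1e-12, abs(rhs - s) < 1e-12)  # True True
```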
2 Matrix Differentials
2.1 Differential of a scalar function of a vector
Let $f(\pmb{x})$ with $\pmb{x}=[x_1,x_2,\cdots,x_n]^T$. It can be viewed as a multivariate function. Assuming it is differentiable, its total differential is:

$$\begin{aligned} \mathrm{d}f(\pmb{x}) &= \frac{\partial f}{\partial x_1}\mathrm{d}x_1+\frac{\partial f}{\partial x_2}\mathrm{d}x_2 + \cdots+\frac{\partial f}{\partial x_n}\mathrm{d}x_n \\ &= \left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right) \begin{bmatrix} \mathrm{d}x_1 \\ \mathrm{d}x_2 \\ \vdots \\ \mathrm{d}x_n \end{bmatrix} \end{aligned} \tag{2-1}$$
The result is a scalar, so by Eq. (1-2), Eq. (2-1) can be written in trace form:

$$\mathrm{d}f(\pmb{x}) = \mathrm{tr}\left(\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right) \begin{bmatrix} \mathrm{d}x_1 \\ \mathrm{d}x_2 \\ \vdots \\ \mathrm{d}x_n \end{bmatrix}\right) \tag{2-2}$$
In short:

$$\mathrm{d}f(\pmb{x}) = \frac{\partial f(\pmb{x})}{\partial\pmb{x}^T}\mathrm{d}\pmb{x} = (\mathrm{d}\pmb{x})^T\frac{\partial f(\pmb{x})}{\partial\pmb{x}} \tag{2-3}$$
where

$$\frac{\partial f(\pmb{x})}{\partial\pmb{x}^T} = \left[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right], \qquad \mathrm{d}\pmb{x} = [\mathrm{d}x_1 \quad \mathrm{d}x_2 \quad \cdots \quad \mathrm{d}x_n]^T \tag{2-4}$$
Eq. (2-2) is the total differential of a real-valued scalar function of a vector variable. Reading it as in Eq. (1-5), a sum of products of matching entries, it can be written as:

$$\mathrm{d}f(\pmb{x}) = \frac{\partial f(\pmb{x})}{\partial\pmb{x}^T}\mathrm{d}\pmb{x} = \mathrm{tr}\left(\frac{\partial f(\pmb{x})}{\partial\pmb{x}^T}\mathrm{d}\pmb{x}\right) \tag{2-5}$$
Therefore the matrix differential gives both the Jacobian matrix and the gradient:

$$\mathrm{d}f(\pmb{x}) = \mathrm{tr}\left(\frac{\partial f(\pmb{x})}{\partial\pmb{x}^T}\mathrm{d}\pmb{x}\right) \iff \mathrm{D}_{\pmb{x}}f(\pmb{x}) = \frac{\partial f(\pmb{x})}{\partial\pmb{x}^T} = (\nabla_{\pmb{x}}f(\pmb{x}))^T \tag{2-6}$$
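To illustrate Eqs. (2-3) and (2-6), the sketch below uses a hypothetical test function $f(\pmb{x}) = \pmb{x}^T\pmb{Q}\pmb{x} + \pmb{c}^T\pmb{x}$, whose gradient $(\pmb{Q}+\pmb{Q}^T)\pmb{x}+\pmb{c}$ is standard. It first compares that gradient with central finite differences. It then checks that $\mathrm{d}f \approx \frac{\partial f}{\partial \pmb{x}^T}\mathrm{d}\pmb{x}$ for a small $\mathrm{d}\pmb{x}$. The step sizes and tolerances are our own choices:

```python
import random


def f(x, Q, c):
    # Hypothetical test function f(x) = x^T Q x + c^T x
    n = len(x)
    quad = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + sum(ci * xi for ci, xi in zip(c, x))


def grad_f(x, Q, c):
    # Known gradient of f: (Q + Q^T) x + c
    n = len(x)
    return [sum((Q[i][j] + Q[j][i]) * x[j] for j in range(n)) + c[i] for i in range(n)]


def numeric_grad(func, x, h=1e-6):
    # Central differences, one partial derivative df/dx_i at a time
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((func(xp) - func(xm)) / (2 * h))
    return g


rng = random.Random(3)
n = 4
Q = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
c = [rng.uniform(-1, 1) for _ in range(n)]
x = [rng.uniform(-1, 1) for _ in range(n)]

g = grad_f(x, Q, c)
g_num = numeric_grad(lambda v: f(v, Q, c), x)
print(max(abs(a - b) for a, b in zip(g, g_num)) < 1e-6)  # True

# Eq. (2-3): df ~= (df/dx^T) dx, i.e. a row vector times a column vector
dx = [1e-6 * rng.uniform(-1, 1) for _ in range(n)]
df_exact = f([xi + di for xi, di in zip(x, dx)], Q, c) - f(x, Q, c)
df_linear = sum(gi * di for gi, di in zip(g, dx))
print(abs(df_exact - df_linear) < 1e-9)  # True: the gap is second order in dx
```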
2.2 Differential of a scalar function of a matrix
Let $f(\pmb{X})$ with $\pmb{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}$. This is also a multivariate function. Assuming it is differentiable, its total differential is:

$$\begin{aligned} \mathrm{d}f(\pmb{X}) &=\frac{\partial f}{\partial x_{11}}\mathrm{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathrm{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathrm{d}x_{1n}\\ &+\frac{\partial f}{\partial x_{21}}\mathrm{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathrm{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathrm{d}x_{2n}\\ &+\cdots\\ &+\frac{\partial f}{\partial x_{m1}}\mathrm{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathrm{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathrm{d}x_{mn} \end{aligned} \tag{2-7}$$
This result is exactly the sum of the products of the matching entries of the matrix $\left(\frac{\partial f}{\partial x_{ij}}\right)_{i=1,j=1}^{m,n}$ and the matrix $(\mathrm{d}x_{ij})_{i=1,j=1}^{m,n}$. So by Eq. (1-5), Eq. (2-7) can also be written in trace form:

$$\mathrm{d}f(\pmb{X}) = \mathrm{tr}\left( \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{21}} & \cdots & \frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{m2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{1n}} & \frac{\partial f}{\partial x_{2n}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{n\times m} \begin{bmatrix} \mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1n} \\ \mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{d}x_{m1} & \mathrm{d}x_{m2} & \cdots & \mathrm{d}x_{mn} \end{bmatrix}_{m \times n} \right) \tag{2-8}$$
In the result above, the left matrix inside $\mathrm{tr}(\cdot)$ is exactly the Jacobian matrix of the matrix variable, $\mathrm{D}_{\pmb{X}}f(\pmb{X}) = \frac{\partial f(\pmb{X})}{\partial \pmb{X}^T_{m\times n}}$ (an $n \times m$ matrix). The right matrix is $\mathrm{d}\pmb{X}_{m \times n}$. Therefore Eq. (2-8) can be written as:

$$\mathrm{d}f(\pmb{X}) = \mathrm{tr}\left(\frac{\partial f(\pmb{X})}{\partial\pmb{X}^T}\mathrm{d}\pmb{X}\right) \tag{2-9}$$
Therefore the matrix differential gives both the Jacobian matrix and the gradient matrix:

$$\mathrm{d}f(\pmb{X}) = \mathrm{tr}\left(\frac{\partial f(\pmb{X})}{\partial\pmb{X}^T}\mathrm{d}\pmb{X}\right) \iff \mathrm{D}_{\pmb{X}}f(\pmb{X}) = \frac{\partial f(\pmb{X})}{\partial\pmb{X}^T} = (\nabla_{\pmb{X}}f(\pmb{X}))^T \tag{2-10}$$
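So if we can write the total differential of a real-valued scalar function of a matrix variable in the form of Eq. (2-9), we have found the matrix derivative. This representation is unique: if $\mathrm{d}f(\pmb{X}) = \mathrm{tr}(\pmb{A}_1\mathrm{d}\pmb{X}) = \mathrm{tr}(\pmb{A}_2\mathrm{d}\pmb{X})$ for every $\mathrm{d}\pmb{X}$, then $\pmb{A}_1=\pmb{A}_2$. To see this, take $\mathrm{d}\pmb{X}$ to be each matrix unit $\pmb{E}_{ij}$ in turn; this shows that every entry agrees.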
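The recipe "write $\mathrm{d}f$ as $\mathrm{tr}(\cdot\,\mathrm{d}\pmb{X})$ and read off the derivative" is checked below on a hypothetical example, $f(\pmb{X}) = \mathrm{tr}(\pmb{A}\pmb{X}\pmb{B})$. By Eq. (2-16) below and the cyclic property (1-7), $\mathrm{d}f = \mathrm{tr}(\pmb{A}\,\mathrm{d}\pmb{X}\,\pmb{B}) = \mathrm{tr}(\pmb{B}\pmb{A}\,\mathrm{d}\pmb{X})$. Therefore $\frac{\partial f}{\partial \pmb{X}^T} = \pmb{B}\pmb{A}$ and $\frac{\partial f}{\partial \pmb{X}} = (\pmb{B}\pmb{A})^T$. The plain-Python sketch compares this with entry-by-entry central differences. The sizes, seed and tolerance are arbitrary:

```python
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def f(X, A, B):
    # Hypothetical example f(X) = tr(A X B)
    return trace(matmul(matmul(A, X), B))


def numeric_grad(func, X, h=1e-6):
    # Matrix of central differences df/dx_ij, same shape as X
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((func(Xp) - func(Xm)) / (2 * h))
        G.append(row)
    return G


rng = random.Random(6)
p, m, n = 2, 3, 4
A = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(p)]  # p x m
X = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]  # m x n
B = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]  # n x p

# df = tr(B A dX), so df/dX^T = BA (n x m) and df/dX = (BA)^T (m x n)
grad = transpose(matmul(B, A))
grad_num = numeric_grad(lambda Z: f(Z, A, B), X)
err = max(abs(a - b) for ra, rb in zip(grad, grad_num) for a, b in zip(ra, rb))
print(err < 1e-6)  # True
```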
2.3 Common Properties
2.3.1 Four rules
- Matrix differential of a constant matrix

$$\mathrm{d}\pmb{A}_{m \times n} = \pmb{0}_{m \times n} \tag{2-11}$$

- Linearity

$$\mathrm{d}(c_1\pmb{F}(\pmb{X})+c_2\pmb{G}(\pmb{X})) = c_1\mathrm{d}\pmb{F}(\pmb{X})+c_2\mathrm{d}\pmb{G}(\pmb{X}) \quad (c_1, c_2 \text{ constants}) \tag{2-12}$$

- Product rule

$$\mathrm{d}(\pmb{F}(\pmb{X})\pmb{G}(\pmb{X}))=\mathrm{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X}) + \pmb{F}(\pmb{X})\mathrm{d}\pmb{G}(\pmb{X}) \quad (\pmb{F}_{p \times q}(\pmb{X}),\ \pmb{G}_{q \times s}(\pmb{X})) \tag{2-13}$$

With more factors:

$$\mathrm{d}(\pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}))=\mathrm{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}) + \pmb{F}(\pmb{X})\mathrm{d}(\pmb{G}(\pmb{X}))\pmb{H}(\pmb{X})+ \pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\mathrm{d}\pmb{H}(\pmb{X}) \tag{2-14}$$

Note: these differentials are matrices, so the left-to-right order of the factors must not be changed.

- Transpose rule

The differential of the transpose equals the transpose of the differential:

$$\mathrm{d}(\pmb{X}^T) = (\mathrm{d}\pmb{X})^T \tag{2-15}$$
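The product rule (2-13) and the transpose rule (2-15) together give $\mathrm{d}(\pmb{X}\pmb{X}^T) = \mathrm{d}\pmb{X}\,\pmb{X}^T + \pmb{X}(\mathrm{d}\pmb{X})^T$. The plain-Python sketch below compares this first-order prediction with the exact change for a small random $\mathrm{d}\pmb{X}$. The leftover is exactly the second-order term $\mathrm{d}\pmb{X}(\mathrm{d}\pmb{X})^T$. The example $\pmb{F}(\pmb{X}) = \pmb{X}\pmb{X}^T$, the seed and the tolerances are our own choices:

```python
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


rng = random.Random(7)
m, n = 3, 2
X = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
dX = [[1e-6 * rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]

# Exact change of F(X) = X X^T when X moves to X + dX
X_plus = add(X, dX)
change = sub(matmul(X_plus, transpose(X_plus)), matmul(X, transpose(X)))

# Product rule (2-13) with the transpose rule (2-15): d(X X^T) = dX X^T + X (dX)^T.
# The order of the factors is kept exactly as in the product.
linear = add(matmul(dX, transpose(X)), matmul(X, transpose(dX)))

# The remainder is the second-order term dX (dX)^T
residual = sub(change, linear)
second_order = matmul(dX, transpose(dX))
print(max(abs(v) for row in residual for v in row) < 1e-10)  # True
```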
2.3.2 Common formulas
(1) Matrix differential of a product with constant matrices $\pmb{A}$, $\pmb{B}$

$$\mathrm{d}(\pmb{A}\pmb{X}\pmb{B})=\pmb{A}\,\mathrm{d}(\pmb{X})\,\pmb{B} \tag{2-16}$$
$\pmb{X}_{m\times n}$ can be replaced by any other matrix function, e.g. $\mathrm{d}(\pmb{A}\pmb{F}(\pmb{X})\pmb{B})=\pmb{A}\,\mathrm{d}(\pmb{F}(\pmb{X}))\,\pmb{B}$.
(2) The matrix differential of the trace of $\pmb{X}$, $\mathrm{d}(\mathrm{tr}(\pmb{X}))$, equals the trace of the matrix differential $\mathrm{d}\pmb{X}$, that is, $\mathrm{tr}(\mathrm{d}\pmb{X})$:

$$\mathrm{d}(\mathrm{tr}(\pmb{X})) = \mathrm{tr}(\mathrm{d}\pmb{X}) \tag{2-17}$$
In particular, $\pmb{X}$ can be replaced by any square matrix function $\pmb{F}(\pmb{X})$. The matrix differential of its trace is $\mathrm{d}(\mathrm{tr}(\pmb{F}(\pmb{X}))) = \mathrm{tr}(\mathrm{d}(\pmb{F}(\pmb{X})))$.
(3) Determinant (for invertible $\pmb{X}$)

$$\mathrm{d}|\pmb{X}|= |\pmb{X}|\,\mathrm{tr}(\pmb{X}^{-1}\mathrm{d}\pmb{X}) = \mathrm{tr}(|\pmb{X}|\pmb{X}^{-1}\mathrm{d}\pmb{X}) \tag{2-18}$$
Proof:

A determinant can be expanded along a row: multiply each element of the row by its cofactor and sum. Expand along row $i$, the row that contains the element $x_{ij}$:

$$|\pmb{X}|=x_{i1}A_{i1}+x_{i2}A_{i2}+\cdots+x_{in}A_{in} \tag{2-19}$$

None of the cofactors $A_{i1},\ldots,A_{in}$ involves the entries of row $i$. Hence the partial derivative of the determinant with respect to $x_{ij}$ is the cofactor of that element:

$$\frac{\partial |\pmb{X}|}{\partial x_{ij}} = A_{ij} \tag{2-20}$$
Hence the derivative of the determinant with respect to the matrix is:

$$\frac{\partial |\pmb{X}|}{\partial \pmb{X}^T} = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix} \tag{2-21}$$
This result is exactly the adjugate matrix $\pmb{X}^*$. The adjugate and the inverse are related by

$$\pmb{X}^{-1}=\frac{\pmb{X}^*}{|\pmb{X}|} \tag{2-22}$$
Substituting into Eq. (2-10) gives:

$$\begin{aligned} \mathrm{d}|\pmb{X}| &= \mathrm{tr}\left(\frac{\partial |\pmb{X}|}{\partial\pmb{X}^T}\mathrm{d}\pmb{X}\right) \\ &= \mathrm{tr}(|\pmb{X}|\pmb{X}^{-1}\mathrm{d}\pmb{X}) \end{aligned} \tag{2-23}$$
The determinant is a scalar, so by the linearity rule (Eq. 1-3) it can be pulled out of the trace:

$$\mathrm{d}|\pmb{X}|= |\pmb{X}|\,\mathrm{tr}(\pmb{X}^{-1}\mathrm{d}\pmb{X}) = \mathrm{tr}(|\pmb{X}|\pmb{X}^{-1}\mathrm{d}\pmb{X}) \tag{2-24}$$
In particular, $\pmb{X}$ can be replaced by any invertible square matrix function $\pmb{F}(\pmb{X})$. The matrix differential of its determinant is $\mathrm{d}|\pmb{F}(\pmb{X})|= |\pmb{F}(\pmb{X})|\,\mathrm{tr}(\pmb{F}(\pmb{X})^{-1}\mathrm{d}\pmb{F}(\pmb{X})) = \mathrm{tr}(|\pmb{F}(\pmb{X})|\pmb{F}(\pmb{X})^{-1}\mathrm{d}\pmb{F}(\pmb{X}))$.
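The proof above can be mirrored in code. The plain-Python sketch computes the determinant by cofactor expansion and builds the adjugate from the cofactors $A_{ij}$ as in Eq. (2-21). It gets $\pmb{X}^{-1}$ from Eq. (2-22), then checks Eq. (2-18) against the exact change $|\pmb{X}+\mathrm{d}\pmb{X}|-|\pmb{X}|$. The recursive `det` is only sensible for small matrices. The test matrix (identity plus small noise), the seed and the tolerance are our own assumptions:

```python
import random


def minor(X, i, j):
    # Delete row i and column j
    return [row[:j] + row[j + 1:] for k, row in enumerate(X) if k != i]


def det(X):
    # Laplace (cofactor) expansion along the first row; only sensible for small n
    if len(X) == 1:
        return X[0][0]
    return sum((-1) ** j * X[0][j] * det(minor(X, 0, j)) for j in range(len(X)))


def adjugate(X):
    # Entry (j, i) is the cofactor A_ij = (-1)^(i+j) * minor determinant, as in Eq. (2-21)
    n = len(X)
    return [[(-1) ** (i + j) * det(minor(X, i, j)) for i in range(n)] for j in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


rng = random.Random(4)
n = 3
# Identity plus small noise: strictly diagonally dominant, hence invertible
X = [[(1.0 if i == j else 0.0) + 0.3 * rng.uniform(-1, 1) for j in range(n)] for i in range(n)]
dX = [[1e-6 * rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

det_X = det(X)
X_inv = [[v / det_X for v in row] for row in adjugate(X)]  # Eq. (2-22)

X_plus = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(X, dX)]
d_exact = det(X_plus) - det_X
d_linear = det_X * trace(matmul(X_inv, dX))  # Eq. (2-18)
print(abs(d_exact - d_linear) < 1e-9)  # True: the gap is second order in dX
```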
(4) Inverse matrix

$$\mathrm{d}(\pmb{X}^{-1})=-\pmb{X}^{-1}\mathrm{d}(\pmb{X})\pmb{X}^{-1} \tag{2-25}$$
Proof:

Start from

$$\pmb{X}\pmb{X}^{-1}=\pmb{I} \tag{2-26}$$

The differential of a constant matrix is $\pmb{0}$, so taking the matrix differential of both sides gives:

$$\mathrm{d}(\pmb{X})\pmb{X}^{-1}+\pmb{X}\mathrm{d}(\pmb{X}^{-1}) =\pmb{0} \tag{2-27}$$

Left-multiplying both sides by $\pmb{X}^{-1}$ gives $\pmb{X}^{-1}\mathrm{d}(\pmb{X})\pmb{X}^{-1}+\mathrm{d}(\pmb{X}^{-1})=\pmb{0}$, which is the result.
In particular, $\pmb{X}$ can be replaced by any invertible matrix function $\pmb{F}(\pmb{X})$. The matrix differential of its inverse is $\mathrm{d}(\pmb{F}(\pmb{X})^{-1})=-\pmb{F}(\pmb{X})^{-1}\mathrm{d}(\pmb{F}(\pmb{X}))\pmb{F}(\pmb{X})^{-1}$.
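Eq. (2-25) can be checked numerically with the closed-form inverse of a $2 \times 2$ matrix. This plain-Python sketch uses $2 \times 2$ only to keep it short (an assumption, not a restriction of the formula). It compares the exact change $(\pmb{X}+\mathrm{d}\pmb{X})^{-1}-\pmb{X}^{-1}$ with $-\pmb{X}^{-1}\mathrm{d}\pmb{X}\,\pmb{X}^{-1}$. The seed and tolerance are arbitrary:

```python
import random


def inv2(X):
    # Closed-form inverse of a 2 x 2 matrix [[a, b], [c, d]]
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


rng = random.Random(8)
# Diagonal entries near 2 keep X comfortably invertible
X = [[2.0 + rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)],
     [rng.uniform(-0.5, 0.5), 2.0 + rng.uniform(-0.5, 0.5)]]
dX = [[1e-6 * rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

X_inv = inv2(X)
X_plus = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(X, dX)]
change = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(inv2(X_plus), X_inv)]

# Eq. (2-25): d(X^{-1}) = -X^{-1} dX X^{-1}
linear = [[-v for v in row] for row in matmul(matmul(X_inv, dX), X_inv)]
gap = max(abs(a - b) for ra, rb in zip(change, linear) for a, b in zip(ra, rb))
print(gap < 1e-10)  # True: only second-order terms remain
```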
(5) The matrix differential of the Kronecker product of matrix functions is

$$\mathrm{d}(\pmb{U} \otimes \pmb{V}) = \mathrm{d}(\pmb{U}) \otimes \pmb{V} + \pmb{U} \otimes \mathrm{d}(\pmb{V}) \tag{2-28}$$
(6) The matrix differential of the Hadamard product (element-wise multiplication) of matrix functions is

$$\mathrm{d}(\pmb{U} \odot \pmb{V})= \mathrm{d}(\pmb{U}) \odot \pmb{V} + \pmb{U} \odot \mathrm{d}(\pmb{V}) \tag{2-29}$$
Element-wise functions: suppose $\sigma(\pmb{X}) = [\sigma(x_{ij})]$ applies a scalar function to each element. Then $\mathrm{d}\sigma(\pmb{X}) = \sigma'(\pmb{X}) \odot \mathrm{d}\pmb{X}$, where $\sigma'(\pmb{X})=[\sigma'(x_{ij})]$ is the element-wise derivative. For example:

$$\pmb{X}=\begin{bmatrix}x_{11} & x_{12} \\ x_{21} & x_{22}\end{bmatrix}, \quad \mathrm{d} \sin(\pmb{X}) = \begin{bmatrix}\cos x_{11}\, \mathrm{d}x_{11} & \cos x_{12}\, \mathrm{d}x_{12}\\ \cos x_{21}\, \mathrm{d}x_{21} & \cos x_{22}\, \mathrm{d}x_{22}\end{bmatrix} = \cos(\pmb{X})\odot \mathrm{d}\pmb{X} \tag{2-30}$$
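Eq. (2-30) can also be checked numerically. The plain-Python sketch compares $\sin(\pmb{X}+\mathrm{d}\pmb{X})-\sin(\pmb{X})$ with $\cos(\pmb{X})\odot\mathrm{d}\pmb{X}$ entry by entry on a random $2 \times 3$ matrix. `elementwise` and `hadamard` are our own helper names, and the tolerance is an arbitrary choice:

```python
import math
import random


def elementwise(func, A):
    # sigma(X) = [sigma(x_ij)]: apply a scalar function to every entry
    return [[func(a) for a in row] for row in A]


def hadamard(A, B):
    # Element-wise (Hadamard) product
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


rng = random.Random(9)
X = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(2)]
dX = [[1e-6 * rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
X_plus = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(X, dX)]

# Exact change sin(X + dX) - sin(X), entry by entry
change = [[a - b for a, b in zip(ra, rb)]
          for ra, rb in zip(elementwise(math.sin, X_plus), elementwise(math.sin, X))]
# Eq. (2-30): d sin(X) = cos(X) (Hadamard product) dX
linear = hadamard(elementwise(math.cos, X), dX)
gap = max(abs(a - b) for ra, rb in zip(change, linear) for a, b in zip(ra, rb))
print(gap < 1e-10)  # True
```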
(7) Composite functions

Suppose the dependency is $\pmb{X}\to \pmb{Y} \to f$. Calculus has the scalar chain rule $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} \frac{\partial y}{\partial x}$, but we cannot simply reuse it here. The independent and dependent variables are now matrices, so the dimensions must be compatible. Instead, we build the composition rule directly from differentials:

1. Write $\mathrm{d}f = \mathrm{tr}\left(\frac{\partial f}{\partial\pmb{Y}^T}\mathrm{d}\pmb{Y}\right)$.
2. Express $\mathrm{d}\pmb{Y}$ in terms of $\mathrm{d}\pmb{X}$ and substitute it.
3. Use the trace tricks to move all other factors to the left of $\mathrm{d}\pmb{X}$.

This yields $\frac{\partial f}{\partial \pmb{X}^T}$, and transposing it gives $\frac{\partial f}{\partial \pmb{X}}$.
Supplement: the derivations use two concepts worth studying on your own, the Hadamard product and the Kronecker product.
Suppose a scalar function $f$ is built from the matrix $\pmb{X}$ through addition, subtraction, multiplication, inversion, determinants, element-wise functions and similar operations. Then:

1. Apply the corresponding rules to compute the differential of $f$.
2. Use the trace tricks to wrap $\mathrm{d}f$ in a trace and move all other factors to the left of $\mathrm{d}\pmb{X}$.
3. Compare with the relation between derivative and differential, $\mathrm{d}f(\pmb{X}) = \mathrm{tr}\left(\frac{\partial f(\pmb{X})}{\partial\pmb{X}^T}\mathrm{d}\pmb{X}\right)$, to read off the derivative.

If the matrix degenerates to a vector, compare with $\mathrm{d}f(\pmb{x}) = \mathrm{tr}\left(\frac{\partial f(\pmb{x})}{\partial\pmb{x}^T}\mathrm{d}\pmb{x}\right)$ instead.
3 Worked Exercises
3.1 A basic problem
In the previous post we proved from the definition that $\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} = \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X}$. Below we prove it again with matrix differentials. Since this is the first example, every step is written out in detail.
Proof:
Step 1: The expression is a scalar, so by the trace of a scalar (Eq. 1-2) it can be written in trace form:

$$\mathrm{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})= \mathrm{tr}(\mathrm{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})) \tag{3-1}$$
Step 2: Use the rules of matrix differentials to reduce the expression to the canonical trace form.

By the differential of a product with constant matrices (Eq. 2-16):

$$\begin{aligned} \mathrm{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathrm{tr}(\mathrm{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})) \\ &= \mathrm{tr}(\pmb{a}^T\mathrm{d}(\pmb{X}\pmb{X}^T)\pmb{b}) \end{aligned} \tag{3-2}$$
By the product rule of matrix differentials (Eq. 2-13):

$$\begin{aligned} \mathrm{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathrm{tr}(\pmb{a}^T\mathrm{d}(\pmb{X}\pmb{X}^T)\pmb{b}) \\ &= \mathrm{tr}[\pmb{a}^T(\mathrm{d}(\pmb{X})\pmb{X}^T+\pmb{X}\mathrm{d}\pmb{X}^T)\pmb{b}] \end{aligned} \tag{3-3}$$
By linearity of the trace (Eq. 1-3):

$$\begin{aligned} \mathrm{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathrm{tr}[\pmb{a}^T(\mathrm{d}(\pmb{X})\pmb{X}^T+\pmb{X}\mathrm{d}\pmb{X}^T)\pmb{b}] \\ &= \mathrm{tr}(\pmb{a}^T\mathrm{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathrm{tr}(\pmb{a}^T\pmb{X}\mathrm{d}(\pmb{X}^T)\pmb{b}) \end{aligned} \tag{3-4}$$
By the transpose rule of matrix differentials (Eq. 2-15):

$$\begin{aligned} \mathrm{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathrm{tr}(\pmb{a}^T\mathrm{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathrm{tr}(\pmb{a}^T\pmb{X}\mathrm{d}(\pmb{X}^T)\pmb{b}) \\ &= \mathrm{tr}(\pmb{a}^T\mathrm{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathrm{tr}(\pmb{a}^T\pmb{X}(\mathrm{d}\pmb{X})^T\pmb{b}) \end{aligned} \tag{3-5}$$
By the cyclic property of the trace (Eqs. 1-6 and 1-7), plus the transpose property (Eq. 1-4) in the third line:

$$\begin{aligned} \mathrm{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathrm{tr}(\pmb{a}^T\mathrm{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathrm{tr}(\pmb{a}^T\pmb{X}(\mathrm{d}\pmb{X})^T\pmb{b}) \\ &= \mathrm{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathrm{d}\pmb{X}) + \mathrm{tr}(\pmb{b}\pmb{a}^T\pmb{X}(\mathrm{d}\pmb{X})^T)\\ &= \mathrm{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathrm{d}\pmb{X}) + \mathrm{tr}((\pmb{b}\pmb{a}^T\pmb{X})^T\mathrm{d}\pmb{X})\\ &= \mathrm{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathrm{d}\pmb{X}) + \mathrm{tr}(\pmb{X}^T\pmb{a}\pmb{b}^T\mathrm{d}\pmb{X}) \end{aligned} \tag{3-6}$$
By linearity of the trace (Eq. 1-3):

$$\begin{aligned} \mathrm{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathrm{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathrm{d}\pmb{X}) + \mathrm{tr}(\pmb{X}^T\pmb{a}\pmb{b}^T\mathrm{d}\pmb{X}) \\ &= \mathrm{tr}((\pmb{X}^T\pmb{b}\pmb{a}^T+\pmb{X}^T\pmb{a}\pmb{b}^T)\mathrm{d}\pmb{X}) \end{aligned} \tag{3-7}$$
Step 3: Use the relation between derivative and differential (Eq. 2-10) to write down the final result. The second line is the transpose of the first:

$$\begin{aligned} \frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}^T}} &=\pmb{X}^T\pmb{b}\pmb{a}^T+\pmb{X}^T\pmb{a}\pmb{b}^T \\ \frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} &= \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \end{aligned} \tag{3-8}$$
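Result (3-8) can be sanity-checked numerically. The plain-Python sketch below stores $\pmb{a}$ and $\pmb{b}$ as $m \times 1$ column matrices (our own convention). It compares $\pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X}$ with central finite differences of $f(\pmb{X}) = \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}$. The sizes, seed and tolerance are arbitrary:

```python
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def f(X, a, b):
    # a^T X X^T b, with a and b stored as m x 1 column matrices; the result is 1 x 1
    return matmul(matmul(matmul(transpose(a), X), transpose(X)), b)[0][0]


def numeric_grad(func, X, h=1e-6):
    # Central differences df/dx_ij, returned with the same shape as X
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((func(Xp) - func(Xm)) / (2 * h))
        G.append(row)
    return G


rng = random.Random(10)
m, n = 3, 2
X = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
a = [[rng.uniform(-1, 1)] for _ in range(m)]
b = [[rng.uniform(-1, 1)] for _ in range(m)]

# Eq. (3-8): df/dX = a b^T X + b a^T X (an m x n matrix, same shape as X)
G1 = matmul(matmul(a, transpose(b)), X)
G2 = matmul(matmul(b, transpose(a)), X)
grad = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(G1, G2)]

grad_num = numeric_grad(lambda Z: f(Z, a, b), X)
err = max(abs(u - v) for r1, r2 in zip(grad, grad_num) for u, v in zip(r1, r2))
print(err < 1e-6)  # True
```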
3.2 Scalar function of a matrix: trace
Example: let $f = \mathrm{tr}(\boldsymbol{Y}^T \boldsymbol{M}\boldsymbol{Y})$ with $\boldsymbol{Y} = \sigma(\boldsymbol{W}\boldsymbol{X})$; find $\frac{\partial f}{\partial \pmb{X}}$. The objects are:

- $\pmb{W}$ is an $l \times m$ matrix.
- $\pmb{X}$ is an $m \times n$ matrix.
- $\pmb{Y}$ is an $l \times n$ matrix.
- $\pmb{M}$ is an $l \times l$ symmetric matrix.
- $\sigma$ is an element-wise function.
- $f$ is a scalar.
Solution:

Step 1: First find $\frac{\partial f}{\partial \pmb{Y}}$. Apply the product rule, then transpose the first term using Eq. (1-4):

$$\mathrm{d}f = \mathrm{tr}((\mathrm{d}\boldsymbol{Y})^T\boldsymbol{M}\boldsymbol{Y}) + \mathrm{tr}(\boldsymbol{Y}^T\boldsymbol{M}\mathrm{d}\boldsymbol{Y}) = \mathrm{tr}(\boldsymbol{Y}^T\boldsymbol{M}^T\mathrm{d}\boldsymbol{Y}) + \mathrm{tr}(\boldsymbol{Y}^T\boldsymbol{M}\mathrm{d}\boldsymbol{Y}) = \mathrm{tr}(\boldsymbol{Y}^T(\boldsymbol{M}+\boldsymbol{M}^T)\mathrm{d}\boldsymbol{Y}) \tag{3-9}$$
By the relation between derivative and differential, and since $\pmb{M}$ is an $l \times l$ symmetric matrix:

$$\frac{\partial f}{\partial \boldsymbol{Y}}=(\boldsymbol{M}+\boldsymbol{M}^T)\boldsymbol{Y} = 2\boldsymbol{M}\boldsymbol{Y} \tag{3-10}$$
Step 2: Express $\mathrm{d}\boldsymbol{Y}$ in terms of $\mathrm{d}\boldsymbol{X}$. By the element-wise function rule and Eq. (2-16), $\mathrm{d}\boldsymbol{Y} = \sigma'(\boldsymbol{W}\boldsymbol{X})\odot(\boldsymbol{W}\mathrm{d}\boldsymbol{X})$. Substitute it, then swap matrix multiplication and element-wise multiplication (Eq. 1-8):

$$\mathrm{d}f = \mathrm{tr}\left(\left(\frac{\partial f}{\partial \boldsymbol{Y}}\right)^T (\sigma'(\boldsymbol{W}\boldsymbol{X})\odot (\boldsymbol{W}\mathrm{d}\boldsymbol{X}))\right) = \mathrm{tr}\left(\left(\frac{\partial f}{\partial \boldsymbol{Y}} \odot \sigma'(\boldsymbol{W}\boldsymbol{X})\right)^T \boldsymbol{W} \mathrm{d}\boldsymbol{X}\right) \tag{3-11}$$
Step 3: By the relation between derivative and differential, $\frac{\partial f}{\partial \pmb{X}^T} = \left(\frac{\partial f}{\partial \boldsymbol{Y}} \odot \sigma'(\boldsymbol{W}\boldsymbol{X})\right)^T\boldsymbol{W}$. Transposing gives:

$$\frac{\partial f}{\partial \boldsymbol{X}}=\boldsymbol{W}^T \left(\frac{\partial f}{\partial \boldsymbol{Y}}\odot \sigma'(\boldsymbol{W}\boldsymbol{X})\right)=\boldsymbol{W}^T\left((2\boldsymbol{M}\sigma(\boldsymbol{W}\boldsymbol{X}))\odot\sigma'(\boldsymbol{W}\boldsymbol{X})\right) \tag{3-12}$$
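To check (3-12) numerically, the example needs a concrete $\sigma$. The plain-Python sketch below assumes $\sigma = \tanh$ (so $\sigma' = 1-\tanh^2$). It builds a random symmetric $\pmb{M} = \pmb{S}+\pmb{S}^T$ and compares $\pmb{W}^T\left((2\pmb{M}\pmb{Y})\odot\sigma'(\pmb{W}\pmb{X})\right)$ with central finite differences of $f$. The choice of $\tanh$, the sizes, the seed and the tolerance are all our own assumptions:

```python
import math
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def hadamard(A, B):
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def f(X, W, M):
    # f = tr(Y^T M Y) with Y = sigma(W X); sigma = tanh is an assumption of this sketch
    Y = [[math.tanh(v) for v in row] for row in matmul(W, X)]
    return trace(matmul(matmul(transpose(Y), M), Y))


def numeric_grad(func, X, h=1e-5):
    # Central differences df/dx_ij, returned with the same shape as X
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((func(Xp) - func(Xm)) / (2 * h))
        G.append(row)
    return G


rng = random.Random(11)
l, m, n = 3, 2, 4  # W is l x m, X is m x n, Y is l x n
W = rand_matrix(l, m, rng)
X = rand_matrix(m, n, rng)
S = rand_matrix(l, l, rng)
M = [[S[i][j] + S[j][i] for j in range(l)] for i in range(l)]  # symmetric l x l

WX = matmul(W, X)
Y = [[math.tanh(v) for v in row] for row in WX]
sigma_prime = [[1.0 - math.tanh(v) ** 2 for v in row] for row in WX]  # tanh' = 1 - tanh^2
dfdY = [[2.0 * v for v in row] for row in matmul(M, Y)]  # Eq. (3-10): 2 M Y
grad = matmul(transpose(W), hadamard(dfdY, sigma_prime))  # Eq. (3-12)

grad_num = numeric_grad(lambda Z: f(Z, W, M), X)
err = max(abs(u - v) for r1, r2 in zip(grad, grad_num) for u, v in zip(r1, r2))
print(err < 1e-6)  # True
```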
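The figure below summarizes the correspondence between the matrix differentials and the gradient matrices of several typical trace functions. To save effort, you can look results up there.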
3.3 Scalar function of a matrix: determinant
We want to show that, for an invertible $n \times n$ matrix $\pmb{X}$,

$$\frac{\partial|\pmb{X}^3|}{\partial \pmb{X}} =\frac{\partial|\pmb{X}|^3}{\partial \pmb{X}} =3|\pmb{X}|^3(\pmb{X}^{-1})^T = 3|\pmb{X}^3|(\pmb{X}^{-1})^T \tag{3-13}$$
Step 1: Write the expression in trace form.

For $n \times n$ square matrices $\pmb{A}, \pmb{B}$ we have $|\pmb{A}\pmb{B}|=|\pmb{A}||\pmb{B}|$, so $|\pmb{X}^3| = |\pmb{X}|^3$ and

$$\mathrm{d}|\pmb{X}^3| =\mathrm{d}(|\pmb{X}|^3)= \mathrm{tr}(\mathrm{d}(|\pmb{X}|^3)) \tag{3-14}$$
Step 2: Reduce it to the canonical trace form.

This is the total differential of a composite function, so let $z=|\pmb{X}|^3$ and $u=|\pmb{X}|$. Then

$$\begin{aligned} \mathrm{d}(|\pmb{X}|^3) &= \mathrm{tr}(\mathrm{d}(|\pmb{X}|^3)) \\ &= \mathrm{tr}(\mathrm{d}z) \\ &= \mathrm{tr}(\mathrm{d}(u^3)) \\ &= \mathrm{tr}(3u^2\mathrm{d}u) \\ &= \mathrm{tr}(3|\pmb{X}|^2\mathrm{d}|\pmb{X}|) \end{aligned} \tag{3-15}$$
By the differential of the determinant (Eq. 2-18):

$$\begin{aligned} \mathrm{d}(|\pmb{X}|^3) &= \mathrm{tr}(3|\pmb{X}|^2\mathrm{d}|\pmb{X}|) \\ &= \mathrm{tr}(3|\pmb{X}|^2|\pmb{X}|\,\mathrm{tr}(\pmb{X}^{-1}\mathrm{d}\pmb{X})) \\ &= \mathrm{tr}(3|\pmb{X}|^3\,\mathrm{tr}(\pmb{X}^{-1}\mathrm{d}\pmb{X})) \end{aligned} \tag{3-16}$$
By linearity of the trace (Eq. 1-3):

$$\begin{aligned} \mathrm{d}(|\pmb{X}|^3) &= \mathrm{tr}(3|\pmb{X}|^3\,\mathrm{tr}(\pmb{X}^{-1}\mathrm{d}\pmb{X})) \\ &= 3|\pmb{X}|^3\,\mathrm{tr}(\pmb{X}^{-1}\mathrm{d}\pmb{X}) \\ &= \mathrm{tr}(3|\pmb{X}|^3\pmb{X}^{-1}\mathrm{d}\pmb{X}) \end{aligned} \tag{3-17}$$
Step 3: By the relation between derivative and differential:

$$\begin{aligned} \frac{\partial|\pmb{X}^3|}{\partial \pmb{X}^T} &=\frac{\partial|\pmb{X}|^3}{\partial \pmb{X}^T} =3|\pmb{X}|^3\pmb{X}^{-1} = 3|\pmb{X}^3|\pmb{X}^{-1} \\ \frac{\partial|\pmb{X}^3|}{\partial \pmb{X}} &=\frac{\partial|\pmb{X}|^3}{\partial \pmb{X}} =3|\pmb{X}|^3(\pmb{X}^{-1})^T = 3|\pmb{X}^3|(\pmb{X}^{-1})^T \end{aligned} \tag{3-18}$$
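A plain-Python sketch of (3-18) follows. It differentiates $|\pmb{X}^3|$ directly with central finite differences, computing the determinant of $\pmb{X}\pmb{X}\pmb{X}$ rather than $|\pmb{X}|^3$. It compares the result with $3|\pmb{X}|^3(\pmb{X}^{-1})^T$, where the inverse comes from the adjugate as in Eq. (2-22). Cofactor-expansion `det` is only meant for small matrices. The test matrix (identity plus small noise), the seed and the tolerance are our own assumptions:

```python
import random


def minor(X, i, j):
    # Delete row i and column j
    return [row[:j] + row[j + 1:] for k, row in enumerate(X) if k != i]


def det(X):
    # Laplace (cofactor) expansion along the first row; only sensible for small n
    if len(X) == 1:
        return X[0][0]
    return sum((-1) ** j * X[0][j] * det(minor(X, 0, j)) for j in range(len(X)))


def adjugate(X):
    # Entry (j, i) is the cofactor A_ij, as in Eq. (2-21)
    n = len(X)
    return [[(-1) ** (i + j) * det(minor(X, i, j)) for i in range(n)] for j in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def numeric_grad(func, X, h=1e-5):
    # Central differences d func / d x_ij, returned with the same shape as X
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((func(Xp) - func(Xm)) / (2 * h))
        G.append(row)
    return G


rng = random.Random(12)
n = 3
# Identity plus small noise: strictly diagonally dominant, hence invertible
X = [[(1.0 if i == j else 0.0) + 0.3 * rng.uniform(-1, 1) for j in range(n)] for i in range(n)]

det_X = det(X)
X_inv = [[v / det_X for v in row] for row in adjugate(X)]  # Eq. (2-22)

# Eq. (3-18): d|X^3|/dX = 3 |X|^3 (X^{-1})^T, so entry (i, j) uses X_inv[j][i]
grad = [[3 * det_X ** 3 * X_inv[j][i] for j in range(n)] for i in range(n)]

# Differentiate |X^3| directly, i.e. the determinant of X X X
grad_num = numeric_grad(lambda Z: det(matmul(matmul(Z, Z), Z)), X)
err = max(abs(u - v) for r1, r2 in zip(grad, grad_num) for u, v in zip(r1, r2))
print(err < 1e-6)  # True
```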
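The figure below summarizes the correspondence between the matrix differentials and the gradient matrices of some typical determinant functions. To save effort, you can look results up there.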
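With matrix differentials we no longer need to differentiate each element of a vector or matrix separately and then assemble the results, which makes derivations much more convenient. Working through more exercises is the best way to become fluent with the properties of matrix differentials and of the trace given above.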
References
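- Matrix Derivative Techniques, Part 1 (矩阵求导术(上)): https://zhuanlan.zhihu.com/p/24709748
- Mathematical Derivation of Matrix Derivative Formulas, Advanced (矩阵求导公式的数学推导(矩阵求导——进阶篇)): https://zhuanlan.zhihu.com/p/288541909
- Matrix Differential Notes (矩阵微分笔记): https://www.iteye.com/blog/cherishlc-1765932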
- Matrix Differentiation:https://atmos.washington.edu/~dennis/MatrixCalculus.pdf
- Matrix Calculus:http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html