Jacobian矩阵 梯度矩阵 矩阵偏导与微分 常见公式
矩阵求导是机器学习中常见的运算方法,研究对象包括标量矩阵,求导分为标量矩阵求导,矩阵求导。
根据个人理解和经验,机器学习中的优化目标一般是一个由向量或矩阵运算得到的标量,因此应该重点关注标量对向量和矩阵的求导。
本文总结了矩阵求导的定义和常见公式,主要内容来自张贤达《矩阵分析与应用(第二版)》的第三章。
Jacobian矩阵
矩阵导数可以理解成实值标量函数、实值向量函数、实值矩阵函数对于向量或矩阵中的每一个元素的偏导,是由一系列偏导组成的。
若有
m
m
m维列向量
x
∈
R
m
×
1
x\in \mathbb{R}^{m\times 1}
x∈Rm×1,变元为
x
x
x的实值标量函数
f
(
x
)
f(x)
f(x)在
x
x
x处的偏导向量定义为:
∂
f
(
x
)
∂
x
T
=
[
∂
f
∂
x
1
,
∂
f
∂
x
2
,
⋯
,
∂
f
∂
x
m
]
\frac{\partial f(x)}{\partial x^T} =[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots ,\frac{\partial f}{\partial x_m}]
∂xT∂f(x)=[∂x1∂f,∂x2∂f,⋯,∂xm∂f]
若有矩阵
X
∈
R
m
×
n
X\in \mathbb{R}^{m\times n}
X∈Rm×n,变元为
X
X
X的实值标量函数
f
(
X
)
f(X)
f(X)在
X
X
X处的Jacobian矩阵定义为:
∂
f
(
X
)
∂
x
T
=
[
∂
f
(
X
)
∂
X
11
⋯
∂
f
(
X
)
∂
X
m
1
⋮
⋱
⋮
∂
f
(
X
)
∂
X
1
n
⋯
∂
f
(
X
)
∂
X
m
n
]
∈
R
n
×
m
\frac{\partial f(X)}{\partial x^T} =\left[ \begin{matrix} \frac{\partial f(X)}{\partial X_{11}} & \cdots & \frac{\partial f(X)}{\partial X_{m1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f(X)}{\partial X_{1n}} & \cdots & \frac{\partial f(X)}{\partial X_{mn}} \\ \end{matrix} \right]\in \mathbb{R}^{n\times m}
∂xT∂f(X)=⎣⎢⎢⎡∂X11∂f(X)⋮∂X1n∂f(X)⋯⋱⋯∂Xm1∂f(X)⋮∂Xmn∂f(X)⎦⎥⎥⎤∈Rn×m
对于矩阵
X
∈
R
m
×
n
X\in \mathbb{R}^{m\times n}
X∈Rm×n,实值矩阵函数
f
(
X
)
∈
R
p
×
q
f(X)\in \mathbb{R}^{p\times q}
f(X)∈Rp×q在
X
X
X处的Jacobian矩阵定义为:
∂
f
(
X
)
∂
X
T
=
[
∂
f
(
X
)
11
∂
X
11
⋯
∂
f
(
X
)
11
∂
X
m
1
⋯
∂
f
(
X
)
11
∂
X
m
n
⋮
⋱
⋮
⋱
⋮
∂
f
(
X
)
p
1
∂
X
11
⋯
∂
f
(
X
)
p
1
∂
X
m
1
⋯
∂
f
(
X
)
p
1
∂
X
m
n
⋮
⋱
⋮
⋱
⋮
∂
f
(
X
)
p
q
∂
X
11
⋯
∂
f
(
X
)
p
q
∂
X
m
1
⋯
∂
f
(
X
)
p
q
∂
X
m
n
]
∈
R
p
q
×
m
n
\frac{\partial f(X)}{\partial X^T} =\left[ \begin{matrix} \frac{\partial f(X)_{11}}{\partial X_{11}} & \cdots & \frac{\partial f(X)_{11}}{\partial X_{m1}} & \cdots & \frac{\partial f(X)_{11}}{\partial X_{mn}} \\ \vdots & \ddots & \vdots &\ddots & \vdots\\ \frac{\partial f(X)_{p1}}{\partial X_{11}} & \cdots & \frac{\partial f(X)_{p1}}{\partial X_{m1}} & \cdots & \frac{\partial f(X)_{p1}}{\partial X_{mn}} \\ \vdots & \ddots & \vdots &\ddots & \vdots\\ \frac{\partial f(X)_{pq}}{\partial X_{11}} & \cdots & \frac{\partial f(X)_{pq}}{\partial X_{m1}} &\cdots & \frac{\partial f(X)_{pq}}{\partial X_{mn}} \\ \end{matrix} \right]\in \mathbb{R}^{pq\times mn}
∂XT∂f(X)=⎣⎢⎢⎢⎢⎢⎢⎢⎡∂X11∂f(X)11⋮∂X11∂f(X)p1⋮∂X11∂f(X)pq⋯⋱⋯⋱⋯∂Xm1∂f(X)11⋮∂Xm1∂f(X)p1⋮∂Xm1∂f(X)pq⋯⋱⋯⋱⋯∂Xmn∂f(X)11⋮∂Xmn∂f(X)p1⋮∂Xmn∂f(X)pq⎦⎥⎥⎥⎥⎥⎥⎥⎤∈Rpq×mn
这个Jacobian矩阵是分别对
f
(
X
)
f(X)
f(X)和
X
X
X做向量化然后逐元素求偏导得到的。这里的
f
(
X
)
f(X)
f(X)和
X
X
X都是按列展开的。
有了这个通用公式,其他关于向量的各种Jacobian矩阵也都有定义了。
梯度矩阵
实值标量函数
f
(
x
)
f(x)
f(x)在列向量变元
x
∈
R
m
×
1
x\in \mathbb{R}^{m\times 1}
x∈Rm×1处的梯度向量定义为:
∂
f
(
x
)
∂
x
=
[
∂
f
(
x
)
∂
x
1
,
⋯
,
∂
f
(
x
)
∂
x
m
]
T
\frac{\partial f(x)}{\partial x} =[\frac{\partial f(x)}{\partial x_1},\cdots,\frac{\partial f(x)}{\partial x_m}]^T
∂x∂f(x)=[∂x1∂f(x),⋯,∂xm∂f(x)]T
注意这是个列向量。
实值标量函数
f
(
X
)
f(X)
f(X)在矩阵变元
X
∈
R
m
×
n
X\in \mathbb{R}^{m\times n}
X∈Rm×n处的梯度矩阵定义为:
∂
f
(
X
)
∂
X
=
[
∂
f
(
X
)
∂
X
11
⋯
∂
f
(
X
)
∂
X
1
n
⋮
⋱
⋮
∂
f
(
X
)
∂
X
m
1
⋯
∂
f
(
X
)
∂
X
m
n
]
∈
R
m
×
n
\frac{\partial f(X)}{\partial X} =\left[ \begin{matrix} \frac{\partial f(X)}{\partial X_{11}} & \cdots & \frac{\partial f(X)}{\partial X_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f(X)}{\partial X_{m1}} & \cdots & \frac{\partial f(X)}{\partial X_{mn}} \\ \end{matrix} \right]\in \mathbb{R}^{m\times n}
∂X∂f(X)=⎣⎢⎢⎡∂X11∂f(X)⋮∂Xm1∂f(X)⋯⋱⋯∂X1n∂f(X)⋮∂Xmn∂f(X)⎦⎥⎥⎤∈Rm×n
实值矩阵函数
f
(
X
)
∈
R
p
×
q
f(X)\in \mathbb{R}^{p\times q}
f(X)∈Rp×q在矩阵变元
X
∈
R
m
×
n
X\in \mathbb{R}^{m\times n}
X∈Rm×n处的梯度矩阵定义为:
∂
f
(
X
)
∂
X
=
[
∂
f
(
X
)
11
∂
X
11
⋯
∂
f
(
X
)
p
1
∂
X
11
⋯
∂
f
(
X
)
p
q
∂
X
11
⋮
⋱
⋮
⋱
⋮
∂
f
(
X
)
11
∂
X
m
1
⋯
∂
f
(
X
)
p
1
∂
X
m
1
⋯
∂
f
(
X
)
p
q
∂
X
m
1
⋮
⋱
⋮
⋱
⋮
∂
f
(
X
)
11
∂
X
m
n
⋯
∂
f
(
X
)
p
1
∂
X
m
n
⋯
∂
f
(
X
)
p
q
∂
X
m
n
]
∈
R
m
n
×
p
q
\frac{\partial f(X)}{\partial X} =\left[ \begin{matrix} \frac{\partial f(X)_{11}}{\partial X_{11}} & \cdots & \frac{\partial f(X)_{p1}}{\partial X_{11}} & \cdots & \frac{\partial f(X)_{pq}}{\partial X_{11}} \\ \vdots & \ddots & \vdots &\ddots & \vdots\\ \frac{\partial f(X)_{11}}{\partial X_{m1}} & \cdots & \frac{\partial f(X)_{p1}}{\partial X_{m1}} & \cdots & \frac{\partial f(X)_{pq}}{\partial X_{m1}} \\ \vdots & \ddots & \vdots &\ddots & \vdots\\ \frac{\partial f(X)_{11}}{\partial X_{mn}} & \cdots & \frac{\partial f(X)_{p1}}{\partial X_{mn}} &\cdots & \frac{\partial f(X)_{pq}}{\partial X_{mn}} \\ \end{matrix} \right]\in \mathbb{R}^{mn\times pq}
∂X∂f(X)=⎣⎢⎢⎢⎢⎢⎢⎢⎡∂X11∂f(X)11⋮∂Xm1∂f(X)11⋮∂Xmn∂f(X)11⋯⋱⋯⋱⋯∂X11∂f(X)p1⋮∂Xm1∂f(X)p1⋮∂Xmn∂f(X)p1⋯⋱⋯⋱⋯∂X11∂f(X)pq⋮∂Xm1∂f(X)pq⋮∂Xmn∂f(X)pq⎦⎥⎥⎥⎥⎥⎥⎥⎤∈Rmn×pq
相同函数与变元对应的Jacobian矩阵和梯度矩阵互为转置关系。
不知道是张老的书写的不够简明扼要,还是我没认真看,这么简单的定义我看了好久才搞明白。书中指出,在流行计算、几何物理、微分几何等领域,行向量偏导向量和Jacobian矩阵是最自然的选择,在最优化和许多工程问题中,梯度向量和梯度矩阵是最自然的选择。这也符合我的一些经验,梯度矩阵看起来要比Jacobian矩阵顺眼很多。
一般见到的矩阵导数是梯度矩阵的形式。 说白了Jacobian矩阵是对 X T X^T XT求导得到的,梯度矩阵是对 X X X求导得到的。
矩阵偏导和梯度计算法则
一般说的矩阵导数就是梯度矩阵或向量。根据定义可有如下常用运算法则:
- 若 c c c为常数, ∂ c ∂ X = O m × n \frac{\partial c}{\partial X}=O_{m\times n} ∂X∂c=Om×n, O m × n O_{m\times n} Om×n是 m m m行 n n n列的0矩阵.
- ∂ [ c 1 f ( X ) + c 2 g ( X ) ] ∂ X = c 1 ∂ f ( X ) ∂ X + c 2 ∂ g ( X ) ∂ X \frac{\partial [c_1f(X)+c_2g(X)]}{\partial X}=c_1 \frac{\partial f(X)}{\partial X}+c_2 \frac{\partial g(X)}{\partial X} ∂X∂[c1f(X)+c2g(X)]=c1∂X∂f(X)+c2∂X∂g(X)
- ∂ [ f ( X ) g ( x ) ] ∂ X = g ( X ) ∂ f ( X ) ∂ X + f ( X ) ∂ g ( X ) ∂ X \frac{\partial [f(X)g(x)]}{\partial X}=g(X)\frac{\partial f(X)}{\partial X}+f(X)\frac{\partial g(X)}{\partial X} ∂X∂[f(X)g(x)]=g(X)∂X∂f(X)+f(X)∂X∂g(X)
- ∂ [ f ( X ) g ( X ) h ( X ) ] ∂ X = g ( X ) h ( X ) ∂ f ( X ) ∂ X + f ( X ) h ( X ) ∂ g ( X ) ∂ X + f ( X ) g ( X ) ∂ h ( X ) ∂ X \frac{\partial [f(X)g(X)h(X)]}{\partial X}=g(X)h(X)\frac{\partial f(X)}{\partial X}+f(X)h(X)\frac{\partial g(X)}{\partial X}+f(X)g(X)\frac{\partial h(X)}{\partial X} ∂X∂[f(X)g(X)h(X)]=g(X)h(X)∂X∂f(X)+f(X)h(X)∂X∂g(X)+f(X)g(X)∂X∂h(X)
- ∂ [ f ( X ) / g ( X ) ] ∂ X = 1 g 2 ( X ) [ g ( X ) ∂ f ( X ) ∂ X − f ( X ) ∂ g ( X ) ∂ X ] \frac{\partial [f(X)/g(X)]}{\partial X}=\frac{1}{g^2(X)}[g(X)\frac{\partial f(X)}{\partial X}-f(X)\frac{\partial g(X)}{\partial X}] ∂X∂[f(X)/g(X)]=g2(X)1[g(X)∂X∂f(X)−f(X)∂X∂g(X)]
- ∂ g ( f ( X ) ) ∂ X = d g ( f ( X ) ) d f ( X ) ∂ f ( X ) ∂ X \frac{\partial g(f(X))}{\partial X}=\frac{dg(f(X))}{df(X)}\frac{\partial f(X)}{\partial X} ∂X∂g(f(X))=df(X)dg(f(X))∂X∂f(X)
- 求导链式法则: ∂ g ( f ( X ) ) ∂ X = d g ( y ) d y ∂ f ( X ) ∂ X \frac{\partial g(f(X))}{\partial X}=\frac{dg(y)}{dy} \frac{\partial f(X)}{\partial X} ∂X∂g(f(X))=dydg(y)∂X∂f(X)
此外在计算以向量和矩阵为变元的函数的偏导时,有个重要的独立性基本假设,即向量和矩阵中的各个元素是相互独立的,用公式表示为:
∂
x
i
∂
x
j
=
{
1
,
i
f
i
=
j
0
,
e
l
s
e
\frac{\partial x_i}{\partial x_j}=\left\{ \begin{array}{l} 1,if\ i=j \\ 0,else\end{array}\right.
∂xj∂xi={1,if i=j0,else
以及:
∂
x
k
l
∂
x
i
j
=
{
1
,
i
f
k
=
i
a
n
d
l
=
j
0
,
e
l
s
e
\frac{\partial x_{kl}}{\partial x_{ij}}=\left\{ \begin{array}{l} 1,if\ k=i\ and\ l=j \\ 0,else\end{array}\right.
∂xij∂xkl={1,if k=i and l=j0,else
举个根据定义求解梯度矩阵的例子,求实值函数
f
(
X
)
=
a
T
X
X
T
b
f(X)=a^TXX^Tb
f(X)=aTXXTb在矩阵变元
X
X
X处的梯度矩阵,
a
,
b
a,b
a,b均为
n
n
n维列向量:
a
T
X
X
T
b
=
∑
k
=
1
m
∑
l
=
1
n
a
k
(
∑
p
=
1
n
x
k
p
x
l
p
)
b
l
a^TXX^Tb=\sum_{k=1}^m\sum_{l=1}^na_k(\sum_{p=1}^nx_{kp}x_{lp})b_l
aTXXTb=k=1∑ml=1∑nak(p=1∑nxkpxlp)bl
然后
根据定义就是这样求解的。
矩阵微分以及与一阶导数的关系:Jacobian矩阵的辨识
矩阵微分的定义为:
d
X
=
[
d
X
i
j
]
i
,
j
=
1
m
,
n
dX=[dX_{ij}]_{i,j=1}^{m,n}
dX=[dXij]i,j=1m,n
标量对标量的导数是用微分定义的,标量
f
f
f对标量
x
x
x的导数
f
′
(
x
)
f'(x)
f′(x)满足
d
f
=
f
′
(
x
)
d
x
df=f'(x)dx
df=f′(x)dx。而实值标量函数
f
(
x
)
f(x)
f(x)对向量
x
x
x的导数与微分的关系,可以表示为(此表示的证明书上有):
d
f
(
x
)
=
∑
i
=
1
n
∂
f
(
x
)
∂
x
i
d
x
i
=
∂
f
(
x
)
∂
x
T
d
x
df(x)=\sum_{i=1}^n\frac{\partial f(x)}{\partial x_i}dx_i=\frac{\partial f(x)}{\partial x}^Tdx
df(x)=i=1∑n∂xi∂f(x)dxi=∂x∂f(x)Tdx
即
f
(
x
)
f(x)
f(x)的微分与
x
x
x中每个元素的微分都有关,
∂
f
(
x
)
∂
x
\frac{\partial f(x)}{\partial x}
∂x∂f(x)即为标量
f
f
f对向量
x
x
x的梯度向量,是一个向量。
同样,实标量函数
f
(
X
)
f(X)
f(X)对矩阵
X
∈
R
m
×
n
X\in\mathbb{R}^{m\times n}
X∈Rm×n求导时,
f
(
X
)
f(X)
f(X)的微分也与
X
X
X中每个元素有关,表示为(此表示的证明书上有):
d
f
(
X
)
=
∑
i
=
1
m
∑
j
=
1
n
∂
f
(
X
)
∂
X
i
j
d
X
i
j
=
t
r
(
∂
f
(
X
)
∂
X
T
d
X
)
df(X)=\sum_{i=1}^m\sum_{j=1}^n\frac{\partial f(X)}{\partial X_{ij}}dX_{ij}=tr(\frac{\partial f(X)}{\partial X}^TdX)
df(X)=i=1∑mj=1∑n∂Xij∂f(X)dXij=tr(∂X∂f(X)TdX)
其中
t
r
tr
tr表示的是矩阵求迹运算,
∂
f
(
X
)
∂
X
\frac{\partial f(X)}{\partial X}
∂X∂f(X)表示
f
(
X
)
f(X)
f(X)对
X
X
X的梯度矩阵。后一个等号成立的原因是矩阵迹运算有如下性质:
t
r
(
A
T
B
)
=
∑
i
,
j
A
i
j
B
i
j
tr(A^TB)=\sum_{i,j}A_{ij}B_{ij}
tr(ATB)=i,j∑AijBij
即
A
T
B
A^TB
ATB的迹等于
A
A
A与
B
B
B中对应元素乘积的和。
这部分给出了微分矩阵与实标量函数对向量和矩阵变元的Jacobian矩阵(向量)和梯度矩阵(向量)的关系,这种关系也可以用来求实标量函数对向量和矩阵变元的Jacobian矩阵和梯度矩阵,这种关系称为Jacobian矩阵的辨识。
书上还给了实矩阵函数对矩阵变元导数与微分矩阵的辨识关系,以及二阶导数与微分矩阵的关系(Hessian矩阵的辨识,Hessian矩阵即矩阵二阶导),不过由于我不是很关注,所以没写在这里。
矩阵微分运算法则
这里给出一些求矩阵微分和迹的运算法则:
- d ( X + Y ) = d X + d Y , d ( X Y ) = ( d X ) Y + X ( d Y ) d(X+Y)=dX+dY,d(XY)=(dX)Y+X(dY) d(X+Y)=dX+dY,d(XY)=(dX)Y+X(dY)
- d ( X T ) = ( d X ) T d(X^T)=(dX)^T d(XT)=(dX)T
- d A = 0 dA=0 dA=0, A A A为常数矩阵.
- d ( a X ) = a d ( X ) d(aX)=ad(X) d(aX)=ad(X), a a a为常数.
- d ( A X B ) = A ( d X ) B d(AXB)=A(dX)B d(AXB)=A(dX)B, A , B A,B A,B为常数矩阵.
- d ( f ( X ) g ( X ) h ( X ) ) = ( d f ( X ) ) g ( X ) h ( X ) + f ( X ) ( d g ( X ) ) h ( X ) + f ( X ) g ( X ) ( d h ( X ) ) d(f(X)g(X)h(X))=(df(X))g(X)h(X)+f(X)(dg(X))h(X)+f(X)g(X)(dh(X)) d(f(X)g(X)h(X))=(df(X))g(X)h(X)+f(X)(dg(X))h(X)+f(X)g(X)(dh(X))
- d t r ( X ) = t r ( d X ) dtr(X)=tr(dX) dtr(X)=tr(dX)
- d ∣ X ∣ = ∣ X ∣ t r ( X − 1 d X ) d|X|=|X|tr(X^{-1}dX) d∣X∣=∣X∣tr(X−1dX),行列式的微分
举个用微分与梯度矩阵的关系求梯度矩阵的例子,求
f
(
X
)
=
t
r
(
X
A
X
B
)
f(X)=tr(XAXB)
f(X)=tr(XAXB)对于矩阵
X
X
X的梯度矩阵:
d
t
r
(
X
A
X
B
)
=
t
r
(
d
(
X
A
X
B
)
)
=
t
r
[
(
d
X
)
A
X
B
+
X
A
(
d
X
)
B
]
=
t
r
[
(
A
X
B
+
B
X
A
)
d
X
]
dtr(XAXB)=tr(d(XAXB))\\ =tr[(dX)AXB+XA(dX)B]\\ =tr[(AXB+BXA)dX]
dtr(XAXB)=tr(d(XAXB))=tr[(dX)AXB+XA(dX)B]=tr[(AXB+BXA)dX]
因此得梯度矩阵:
∂
t
r
(
X
A
X
B
)
∂
X
=
(
A
X
B
+
B
X
A
)
T
\frac{\partial tr(XAXB)}{\partial X}=(AXB+BXA)^T
∂X∂tr(XAXB)=(AXB+BXA)T
常用矩阵求导公式总结
在Matrix Cookbook第二章Derivatives里面有很多,这里取常见的一些总结如下。我们用 ∂ f ( X ) ∂ X \frac{\partial f(X)}{\partial X} ∂X∂f(X)表示 f ( X ) f(X) f(X)在 X X X处的导数(即梯度矩阵),大写字母为矩阵,小写字母为列向量,则有:
- ∂ a T X b ∂ X = a b T \frac{\partial a^TXb}{\partial X}=ab^T ∂X∂aTXb=abT
- ∂ a T X T b ∂ X = b a T \frac{\partial a^TX^Tb}{\partial X}=ba^T ∂X∂aTXTb=baT
- ∂ x T A x ∂ x = ( A + A T ) x \frac{\partial x^TAx}{\partial x}=(A+A^T)x ∂x∂xTAx=(A+AT)x
- ∂ a T X X T b ∂ X = ( a b T + b a T ) X \frac{\partial a^TXX^Tb}{\partial X}=(ab^T+ba^T)X ∂X∂aTXXTb=(abT+baT)X
- ∂ a T X T X b ∂ X = X ( a b T + b a T ) \frac{\partial a^TX^TXb}{\partial X}=X(ab^T+ba^T) ∂X∂aTXTXb=X(abT+baT)
- ∂ b T X T D X c ∂ X = D X b c T + D X c b T \frac{\partial b^TX^TDXc}{\partial X}=DXbc^T+DXcb^T ∂X∂bTXTDXc=DXbcT+DXcbT
- ∂ ( X b + c ) T D ( X b + c ) ∂ X = ( D + D T ) ( X b + c ) b T \frac{\partial (Xb+c)^TD(Xb+c)}{\partial X}=(D+D^T)(Xb+c)b^T ∂X∂(Xb+c)TD(Xb+c)=(D+DT)(Xb+c)bT
- ∂ ( A x + b ) T C ( D x + e ) ∂ x = D T C T ( A x + b ) + A T C ( D x + e ) \frac{\partial (Ax+b)^TC(Dx+e)}{\partial x}=D^TC^T(Ax+b)+A^TC(Dx+e) ∂x∂(Ax+b)TC(Dx+e)=DTCT(Ax+b)+ATC(Dx+e)
迹的导数:
- ∂ t r ( X ) ∂ X = I \frac{\partial tr(X)}{\partial X}=I ∂X∂tr(X)=I
- ∂ t r ( X A ) ∂ X = A T \frac{\partial tr(XA)}{\partial X}=A^T ∂X∂tr(XA)=AT
- ∂ t r ( X T A ) ∂ X = A \frac{\partial tr(X^TA)}{\partial X}=A ∂X∂tr(XTA)=A
- ∂ t r ( A X T ) ∂ X = A \frac{\partial tr(AX^T)}{\partial X}=A ∂X∂tr(AXT)=A
- ∂ t r ( A X B ) ∂ X = A T B T \frac{\partial tr(AXB)}{\partial X}=A^TB^T ∂X∂tr(AXB)=ATBT
- ∂ t r ( A X T B ) ∂ X = B A \frac{\partial tr(AX^TB)}{\partial X}=BA ∂X∂tr(AXTB)=BA
- ∂ t r ( B X T X ) ∂ X = X B T + X B \frac{\partial tr(BX^TX)}{\partial X}=XB^T+XB ∂X∂tr(BXTX)=XBT+XB
- ∂ t r ( A X B X ) ∂ X = A T X T B T + B T X T A T \frac{\partial tr(AXBX)}{\partial X}=A^TX^TB^T+B^TX^TA^T ∂X∂tr(AXBX)=ATXTBT+BTXTAT
- ∂ t r ( B T X T C X B ) ∂ X = C T X B B T + C X B B T \frac{\partial tr(B^TX^TCXB)}{\partial X}=C^TXBB^T+CXBB^T ∂X∂tr(BTXTCXB)=CTXBBT+CXBBT
- ∂ t r ( A X B X T C ) ∂ X = A T C T X B T + C A X B \frac{\partial tr(AXBX^TC)}{\partial X}=A^TC^TXB^T+CAXB ∂X∂tr(AXBXTC)=ATCTXBT+CAXB
- ∂ t r ( A X T B X C ) ∂ X = B X C A + B T X A T C T \frac{\partial tr(AX^TBXC)}{\partial X}=BXCA+B^TXA^TC^T ∂X∂tr(AXTBXC)=BXCA+BTXATCT
- ∂ t r [ ( A X B + C ) ( A X B + C ) T ] ∂ X = 2 A T ( A X B + C ) B T \frac{\partial tr[(AXB+C)(AXB+C)^T]}{\partial X}=2A^T(AXB+C)B^T ∂X∂tr[(AXB+C)(AXB+C)T]=2AT(AXB+C)BT
范数的导数:
- ∂ ∥ x ∥ 2 2 ∂ x = ∂ x T x ∂ x = 2 x \frac{\partial \|x\|_2^2}{\partial x}=\frac{\partial x^Tx}{\partial x}=2x ∂x∂∥x∥22=∂x∂xTx=2x
- ∂ ∂ x ∥ x − a ∥ 2 = x − a ∥ x − a ∥ 2 \frac{\partial }{\partial x}\|x-a\|_2=\frac{x-a}{\|x-a\|_2} ∂x∂∥x−a∥2=∥x−a∥2x−a
- ∂ ∥ X ∥ F 2 ∂ X = ∂ ∂ X t r ( X X H ) = 2 X \frac{\partial \|X\|_F^2}{\partial X}=\frac{\partial }{\partial X}tr(XX^H)=2X ∂X∂∥X∥F2=∂X∂tr(XXH)=2X
行列式的导数:
- ∂ d e t ( X ) ∂ X = d e t ( X ) ( X − 1 ) T \frac{\partial det(X)}{\partial X}=det(X)(X^{-1})^T ∂X∂det(X)=det(X)(X−1)T
- ∂ d e t ( A X B ) ∂ X = d e t ( A X B ) ( X − 1 ) T = d e t ( A X B ) ( X T ) − 1 \frac{\partial det(AXB)}{\partial X}=det(AXB)(X^{-1})^T=det(AXB)(X^{T})^{-1} ∂X∂det(AXB)=det(AXB)(X−1)T=det(AXB)(XT)−1
暂时就这些,以后还有别的再补充。Matrix Cookbook上还有许多其他公式,也有的公式没有给出。如前所述,目前关注的还是实标量函数对矩阵和向量的导数,许多实矩阵函数对矩阵的导数的公式这里没有给出。