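Preface
Everything below assumes that the operations involved are well defined (compatible dimensions, invertible matrices wherever an inverse appears, differentiable functions).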
Between vectors and scalars
The derivative of a vector with respect to a scalar, and of a scalar with respect to a vector, is in both cases a vector. Their $i$-th components are, respectively,
$$\left(\frac{\partial \vec{a}}{\partial x}\right)_i = \frac{\partial a_i}{\partial x}$$
$$\left(\frac{\partial x}{\partial \vec{a}}\right)_i = \frac{\partial x}{\partial a_i}$$
Between matrices and scalars
The derivative of a matrix with respect to a scalar, and of a scalar with respect to a matrix, is in both cases a matrix. Their entries in row $i$, column $j$ are, respectively,
$$\left(\frac{\partial \mathbf{A}}{\partial x}\right)_{ij} = \frac{\partial A_{ij}}{\partial x}$$
$$\left(\frac{\partial x}{\partial \mathbf{A}}\right)_{ij} = \frac{\partial x}{\partial A_{ij}}$$
Functions of a vector
First derivative
The first derivative (the gradient) is a vector whose $i$-th component is
$$(\nabla f(x))_i = \frac{\partial f(x)}{\partial x_i}$$
Second derivative (Hessian matrix)
The second derivative is a matrix whose entry in row $i$, column $j$ is
$$(\nabla^2 f(x))_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}$$
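The two definitions above can be checked numerically. The following is a minimal sketch, assuming an illustrative test function $f(x) = x_0^2 x_1 + x_1^3$ that is not part of the original text; the helper names (`shifted`, `numeric_grad`, `numeric_hess`) are likewise hypothetical. It compares hand-derived partial derivatives against central finite differences.

```python
# Sketch: compare an analytic gradient and Hessian with central finite differences.
# The test function and all helper names are illustrative assumptions.

def f(x):
    # f(x) = x0^2 * x1 + x1^3
    return x[0] ** 2 * x[1] + x[1] ** 3

def grad_f(x):
    # Component i is the partial derivative of f with respect to x_i.
    return [2 * x[0] * x[1], x[0] ** 2 + 3 * x[1] ** 2]

def hess_f(x):
    # Entry (i, j) is the second partial derivative with respect to x_i and x_j.
    return [[2 * x[1], 2 * x[0]],
            [2 * x[0], 6 * x[1]]]

def shifted(x, i, h):
    # Copy of x with h added to component i.
    y = list(x)
    y[i] += h
    return y

def numeric_grad(func, x, h=1e-4):
    return [(func(shifted(x, i, h)) - func(shifted(x, i, -h))) / (2 * h)
            for i in range(len(x))]

def numeric_hess(func, x, h=1e-4):
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Central difference of the whole gradient along x_j gives column j.
        g_plus = numeric_grad(func, shifted(x, j, h), h)
        g_minus = numeric_grad(func, shifted(x, j, -h), h)
        for i in range(n):
            H[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return H

x0 = [1.5, -0.5]
grad_err = max(abs(a - b) for a, b in zip(grad_f(x0), numeric_grad(f, x0)))
hess_err = max(abs(a - b)
               for ra, rb in zip(hess_f(x0), numeric_hess(f, x0))
               for a, b in zip(ra, rb))
print("max gradient error:", grad_err)
print("max Hessian error:", hess_err)
```

Both errors should be small compared with the derivative values themselves. The numeric Hessian also comes out symmetric up to rounding, as expected for a smooth function.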
Rules
Derivatives of vectors and matrices obey the product rule.
Here $a$ is constant with respect to $x$:
$$\frac{\partial x^Ta}{\partial x} = \frac{\partial a^Tx}{\partial x} = a$$
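As a quick justification, the identity can be read off component by component. Writing the inner product as a sum (the index $k$ is introduced here only for illustration):

$$x^Ta = a^Tx = \sum_k a_k x_k \quad\Longrightarrow\quad \left(\frac{\partial x^Ta}{\partial x}\right)_i = \frac{\partial}{\partial x_i}\sum_k a_k x_k = a_i$$

Only the term with $k = i$ depends on $x_i$, so the gradient is the vector $a$.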
$$\frac{\partial AB}{\partial x} = \frac{\partial A}{\partial x}B + A\frac{\partial B}{\partial x}$$
Derivative of the inverse matrix
$$\frac{\partial A^{-1}}{\partial x} = -A^{-1}\frac{\partial A}{\partial x}A^{-1}$$
This follows from applying the product rule to the identity $A^{-1}A = I$.
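The intermediate steps can be sketched as follows, assuming $A$ depends differentiably on the scalar $x$ and is invertible. Differentiate both sides of $A^{-1}A = I$ and use the fact that $I$ is constant:

$$\frac{\partial A^{-1}}{\partial x}A + A^{-1}\frac{\partial A}{\partial x} = \frac{\partial I}{\partial x} = 0$$

Multiplying on the right by $A^{-1}$ and rearranging gives

$$\frac{\partial A^{-1}}{\partial x} = -A^{-1}\frac{\partial A}{\partial x}A^{-1}$$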
Differentiating a scalar with respect to a single matrix element:
$$\frac{\partial\, \mathrm{tr}(AB)}{\partial A_{ij}} = B_{ji}$$
$$\frac{\partial\, \mathrm{tr}(AB)}{\partial A} = B^T$$
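Both trace identities can be seen by expanding the trace as a double sum. The summation indices $k$ and $l$ are introduced here only for illustration:

$$\mathrm{tr}(AB) = \sum_k (AB)_{kk} = \sum_k \sum_l A_{kl}B_{lk}$$

Only the term with $k = i$, $l = j$ contains $A_{ij}$, so the derivative with respect to $A_{ij}$ is $B_{ji}$. Collecting these entries over all $i, j$ gives a matrix whose $(i, j)$ entry is $B_{ji}$, which is $B^T$.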
It follows that
$$\frac{\partial\, \mathrm{tr}(A^TB)}{\partial A} = B$$
$$\frac{\partial\, \mathrm{tr}(A)}{\partial A} = I$$
$$\frac{\partial\, \mathrm{tr}(ABA^T)}{\partial A} = A(B + B^T)$$
$$\frac{\partial \| A \|_F^2}{\partial A} = \frac{\partial\, \mathrm{tr}(AA^T)}{\partial A} = 2A$$
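The trace identities can be spot-checked numerically. The following is a minimal sketch in plain Python, assuming small hand-picked matrices; the helper names (`matmul`, `numeric_matrix_grad`, `max_abs_diff`) and the matrices `A`, `B`, `C` are illustrative and not part of the original text. It perturbs each entry $A_{ij}$ in turn, which is exactly the entrywise definition of the derivative of a scalar with respect to a matrix.

```python
# Sketch: verify d tr(AC)/dA = C^T and d tr(A B A^T)/dA = A (B + B^T)
# by central finite differences on each entry A_ij.
# All helper names and test matrices are illustrative assumptions.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def numeric_matrix_grad(func, X, h=1e-6):
    # Entry (i, j) approximates the derivative of func with respect to X_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            G[i][j] = (func(plus) - func(minus)) / (2 * h)
    return G

def max_abs_diff(X, Y):
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, 1.0]]            # 2 x 3
B = [[2.0, 1.0, 0.0],
     [-1.0, 0.5, 3.0],
     [4.0, 0.0, 1.0]]            # 3 x 3, deliberately not symmetric
C = [[1.0, 0.0],
     [2.0, -1.0],
     [0.5, 3.0]]                 # 3 x 2, so that A C is square

B_plus_Bt = [[B[i][j] + B[j][i] for j in range(3)] for i in range(3)]

err_tr_ac = max_abs_diff(
    numeric_matrix_grad(lambda M: trace(matmul(M, C)), A),
    transpose(C))
err_tr_aba = max_abs_diff(
    numeric_matrix_grad(lambda M: trace(matmul(matmul(M, B), transpose(M))), A),
    matmul(A, B_plus_Bt))
print("tr(AC) error:", err_tr_ac)
print("tr(A B A^T) error:", err_tr_aba)
```

Using a non-symmetric `B` matters here: with a symmetric matrix the check could not distinguish $A(B + B^T)$ from $2AB$.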
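Chain rule
If the function $f$ is the composition of $g$ and $h$, that is, $f(x) = g(h(x))$, then
$$\frac{\partial f(x)}{\partial x} = \frac{\partial g(h(x))}{\partial h(x)} \cdot \frac{\partial h(x)}{\partial x}$$
When $x$ and $h(x)$ are vectors the factors are matrices and their order matters. With gradients written as column vectors, as elsewhere in this text, the product is taken as $\frac{\partial h(x)}{\partial x} \cdot \frac{\partial g(h(x))}{\partial h(x)}$, which is the order used in the example.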
Example
Treating $Ax - b$ as a single unit simplifies the computation. For a symmetric matrix $W$:
$$\frac{\partial}{\partial x}(Ax - b)^TW(Ax - b) = \frac{\partial (Ax - b)}{\partial x} \cdot 2W(Ax - b) = 2A^TW(Ax - b)$$
Here $\frac{\partial (Ax - b)}{\partial x} = A^T$. If $W$ is not symmetric, the factor $2W$ becomes $W + W^T$.
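The result of this example can be checked with finite differences. The following is a minimal sketch in plain Python; the helper names (`matvec`, `objective`, `grad_symmetric`, `grad_general`) and all numeric values are illustrative assumptions. It tests $2A^TW(Ax - b)$ for a symmetric $W$, and the more general form $A^T(W + W^T)(Ax - b)$ for a non-symmetric one.

```python
# Sketch: numerically verify the gradient of (Ax - b)^T W (Ax - b).
# Helper names and test data are illustrative assumptions.

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def residual(A, x, b):
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

def objective(A, W, b, x):
    r = residual(A, x, b)
    return sum(ri * wi for ri, wi in zip(r, matvec(W, r)))

def grad_symmetric(A, W, b, x):
    # 2 A^T W (Ax - b); valid only when W is symmetric.
    Wr = matvec(W, residual(A, x, b))
    return [2 * g for g in matvec(transpose(A), Wr)]

def grad_general(A, W, b, x):
    # A^T (W + W^T)(Ax - b); valid for any square W.
    r = residual(A, x, b)
    s = [p + q for p, q in zip(matvec(W, r), matvec(transpose(W), r))]
    return matvec(transpose(A), s)

def numeric_grad(func, x, h=1e-6):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((func(xp) - func(xm)) / (2 * h))
    return g

A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
b = [1.0, -2.0, 0.5]
x = [0.3, -0.7]
W_sym = [[2.0, 0.5, 0.0], [0.5, 1.0, -1.0], [0.0, -1.0, 3.0]]
W_gen = [[2.0, 1.0, 0.0], [-0.5, 1.0, 2.0], [0.0, -1.0, 3.0]]

num_sym = numeric_grad(lambda v: objective(A, W_sym, b, v), x)
num_gen = numeric_grad(lambda v: objective(A, W_gen, b, v), x)
err_sym = max(abs(p - q) for p, q in zip(num_sym, grad_symmetric(A, W_sym, b, x)))
err_gen = max(abs(p - q) for p, q in zip(num_gen, grad_general(A, W_gen, b, x)))
print("symmetric W error:", err_sym)
print("general W error:", err_gen)
```

For a symmetric $W$ the two formulas coincide, so `grad_general` could be used everywhere; the symmetric version is kept separate only to mirror the formula in the example.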