References
1. Matrix and Vector Derivatives in Machine Learning (机器学习中的矩阵、向量求导)
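Derivative of a matrix with respect to a scalar
The derivative has the same shape as the function (the matrix): each entry of the result is the derivative of the corresponding entry of the matrix with respect to the scalar. If the matrix-valued function $\boldsymbol f$ is an $m\times n$ matrix, then the derivative is also an $m\times n$ matrix, where
$$\left(\frac{\partial\boldsymbol f}{\partial x}\right)_{ij}=\frac{\partial f_{ij}}{\partial x}$$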
In particular, the derivative of an $n$-dimensional vector with respect to a scalar variable is
$$\boldsymbol y=(y_1,\cdots,y_n) \implies \frac{\partial\boldsymbol y}{\partial x}=\left(\frac{\partial y_1}{\partial x},\cdots,\frac{\partial y_n}{\partial x}\right)$$
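The entrywise rule can be checked numerically. The sketch below assumes NumPy is available; `matrix_function` is a made-up $2\times 2$ example, not something from the original notes. It compares the entry-by-entry analytic derivative with a central-difference estimate.

```python
import numpy as np

def matrix_function(x):
    """A hypothetical 2x2 matrix-valued function of a scalar x."""
    return np.array([[np.sin(x), x ** 2],
                     [np.exp(x), 3.0 * x]])

def analytic_derivative(x):
    """Differentiate every entry of matrix_function separately."""
    return np.array([[np.cos(x), 2.0 * x],
                     [np.exp(x), 3.0]])

def numeric_derivative(func, x, eps=1e-6):
    """Central difference applied to all entries of the matrix at once."""
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)

x0 = 0.7
dF_analytic = analytic_derivative(x0)
dF_numeric = numeric_derivative(matrix_function, x0)

# The derivative has the same shape as the function value.
shapes_match = dF_analytic.shape == matrix_function(x0).shape
values_match = np.allclose(dF_analytic, dF_numeric, atol=1e-6)
```

Both `shapes_match` and `values_match` should hold for any smooth choice of entries; the specific functions here are only for illustration.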
Derivative of a scalar with respect to a matrix
The derivative has the same shape as the variable (the matrix). If the variable $X$ is an $m\times n$ matrix, then the derivative is also an $m\times n$ matrix, where
$$\left(\frac{\partial f}{\partial X}\right)_{ij}=\frac{\partial f}{\partial x_{ij}}$$
In particular, the derivative of a scalar function $f$ with respect to an $n$-dimensional vector $\boldsymbol x$ is the column vector
$$\nabla_{\boldsymbol x} f= \left(\frac{\partial f}{\partial x_1},\cdots,\frac{\partial f}{\partial x_n}\right)^\top$$
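As a rough illustration of "the derivative has the shape of the variable", the following sketch perturbs one entry at a time. The helper `numeric_gradient` and the test function `sum_of_squares` are assumptions made for this example; NumPy is assumed to be installed.

```python
import numpy as np

def numeric_gradient(func, X, eps=1e-6):
    """Entry-by-entry central difference; the result has the shape of X."""
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (func(X + step) - func(X - step)) / (2.0 * eps)
    return grad

def sum_of_squares(X):
    """f(X) = sum of x_ij^2, whose derivative with respect to x_ij is 2 x_ij."""
    return np.sum(X ** 2)

X = np.arange(6, dtype=float).reshape(2, 3)
grad_matrix = numeric_gradient(sum_of_squares, X)   # shape (2, 3)

x = np.array([1.0, -2.0, 0.5])
grad_vector = numeric_gradient(sum_of_squares, x)   # shape (3,)
```

Note that a 1-D NumPy array carries no row or column orientation, so the column-vector convention for $\nabla_{\boldsymbol x} f$ is a matter of bookkeeping here rather than something the array shape enforces.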
Derivative of a vector with respect to a vector (Jacobian matrix)
If the function value $\boldsymbol f$ is an $m$-dimensional vector and the variable $\boldsymbol x$ is an $n$-dimensional vector, then the derivative is an $m\times n$ matrix, where
$$\frac{\partial\boldsymbol f}{\partial\boldsymbol x}= \left(\frac{\partial\boldsymbol f}{\partial x_1},\cdots,\frac{\partial\boldsymbol f}{\partial x_n}\right),\quad \left(\frac{\partial\boldsymbol f}{\partial\boldsymbol x}\right)_{ij}=\frac{\partial f_i}{\partial x_j}$$
As a special case, when the function value $f$ is a scalar, the Jacobian matrix is a $1\times n$ row vector. This differs from the scalar-by-vector derivative $\nabla_{\boldsymbol x}f$, which is a column vector; the two are transposes of each other:
$$\frac{\partial\boldsymbol f}{\partial\boldsymbol x}=(\nabla_{\boldsymbol x}f)^\top =\left(\frac{\partial f}{\partial x_1},\cdots,\frac{\partial f}{\partial x_n}\right)$$
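The layout of the Jacobian, including the row-vector special case, can be sketched with finite differences. The functions `vector_function` and `scalar_function` below are hypothetical examples, and `numeric_jacobian` is an illustrative helper, not a library routine.

```python
import numpy as np

def numeric_jacobian(func, x, eps=1e-6):
    """Column j holds the central-difference estimate of d f / d x_j."""
    x = np.asarray(x, dtype=float)
    columns = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        diff = (func(x + step) - func(x - step)) / (2.0 * eps)
        columns.append(np.atleast_1d(diff))
    return np.stack(columns, axis=1)   # shape (m, n)

def vector_function(x):
    """Hypothetical map from R^3 to R^2."""
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0]])

def scalar_function(x):
    """Hypothetical scalar function with gradient 2x."""
    return np.sum(x ** 2)

x0 = np.array([1.0, 2.0, 0.5])

J = numeric_jacobian(vector_function, x0)          # shape (2, 3)
J_expected = np.array([[x0[1], x0[0], 0.0],
                       [1.0, 0.0, np.cos(x0[2])]])

J_scalar = numeric_jacobian(scalar_function, x0)   # shape (1, 3): a row vector
gradient = J_scalar.T                              # shape (3, 1): the column gradient
```

Entry $(i, j)$ of `J` approximates $\partial f_i/\partial x_j$, and transposing the $1\times n$ Jacobian of the scalar function recovers the column gradient.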
Chain rule for vector derivatives
If all intermediate variables are vectors and the variables depend on each other as $\boldsymbol x\to\boldsymbol v\to\boldsymbol u\to\boldsymbol f$, then
$$\frac{\partial\boldsymbol f}{\partial\boldsymbol x}=\frac{\partial\boldsymbol f}{\partial\boldsymbol u}\frac{\partial\boldsymbol u}{\partial\boldsymbol v}\frac{\partial\boldsymbol v}{\partial\boldsymbol x}$$
If the result $f$ is a scalar, the Jacobian is a row vector and
$$\frac{\partial\boldsymbol f}{\partial\boldsymbol x}=\frac{\partial f}{\partial\boldsymbol x^\top}=\frac{\partial f}{\partial\boldsymbol u^\top}\frac{\partial\boldsymbol u}{\partial\boldsymbol v}\frac{\partial\boldsymbol v}{\partial\boldsymbol x}$$
These results can be used to derive backpropagation through time (BPTT) for RNNs.
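A small numerical check may make the chain rule concrete. The composition below ($\boldsymbol v = W_1\boldsymbol x$, $\boldsymbol u=\tanh(\boldsymbol v)$, $\boldsymbol f = W_2\boldsymbol u$) is an assumed toy example chosen because each Jacobian is easy to write down; the weights `W1` and `W2` are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # v = W1 x, so dv/dx = W1
W2 = rng.normal(size=(2, 4))   # f = W2 u, so df/du = W2

def forward(x):
    """x -> v -> u -> f."""
    v = W1 @ x
    u = np.tanh(v)
    return W2 @ u

def chain_rule_jacobian(x):
    """Multiply the three Jacobians in the order df/du, du/dv, dv/dx."""
    v = W1 @ x
    df_du = W2                               # shape (2, 4)
    du_dv = np.diag(1.0 - np.tanh(v) ** 2)   # shape (4, 4), tanh acts elementwise
    dv_dx = W1                               # shape (4, 3)
    return df_du @ du_dv @ dv_dx             # shape (2, 3)

def numeric_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian, one column per input coordinate."""
    columns = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        columns.append((func(x + step) - func(x - step)) / (2.0 * eps))
    return np.stack(columns, axis=1)

x0 = rng.normal(size=3)
J_chain = chain_rule_jacobian(x0)
J_numeric = numeric_jacobian(forward, x0)
```

The shapes $(2\times 4)(4\times 4)(4\times 3)$ only compose in the order given by the formula. If `W2` had a single row, `J_chain` would be a $1\times 3$ row vector, matching the scalar case.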
Matrix trace
Basic properties of the trace
- Invariance under transposition: $\operatorname{tr}(A)=\operatorname{tr}(A^\top)$
- Invariance under cyclic permutation: $\operatorname{tr}(ABC)=\operatorname{tr}(CAB)=\operatorname{tr}(BCA)$, provided the products are defined
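Both properties are easy to confirm on random matrices. In this sketch the shapes of `A`, `B`, `C` are arbitrary assumptions, chosen to be non-square so that the three cyclic products have different sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))
C = rng.normal(size=(5, 3))
S = rng.normal(size=(4, 4))   # square matrix for the transpose check

# Transposition leaves the diagonal, and hence the trace, unchanged.
transpose_ok = np.isclose(np.trace(S), np.trace(S.T))

# The three products have different shapes but the same trace.
t_abc = np.trace(A @ B @ C)   # ABC is 3x3
t_cab = np.trace(C @ A @ B)   # CAB is 5x5
t_bca = np.trace(B @ C @ A)   # BCA is 4x4
cyclic_ok = np.isclose(t_abc, t_cab) and np.isclose(t_abc, t_bca)
```

Only cyclic shifts are covered by the property; an arbitrary reordering such as $\operatorname{tr}(ACB)$ is generally not even defined for these shapes.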
Derivatives of the trace (a scalar)
All gradients below are taken with respect to the matrix $X$.
- $\nabla \operatorname{tr}(A^\top X)=\nabla \operatorname{tr}(X^\top A)=\nabla \operatorname{tr}(AX^\top)=A$
- $\nabla \operatorname{tr}(AX)=\nabla \operatorname{tr}(XA)=A^\top$
- $\nabla \operatorname{tr}(XAX^\top B)=B^\top XA^\top+BXA$
Proof: treat the two occurrences of $X$ as independent variables $X_1$ and $X_2$, differentiate with respect to each, and then set $X_1=X_2=X$.
$$\begin{aligned} \nabla \operatorname{tr}(XAX^\top B) &=\nabla_{X_1} \operatorname{tr}(X_1AX_2^\top B)+\nabla_{X_2} \operatorname{tr}(X_1AX_2^\top B)\\ &=\nabla_{X_1} \operatorname{tr}(AX_2^\top BX_1)+\nabla_{X_2} \operatorname{tr}(BX_1AX_2^\top )\\ &=B^\top X_2A^\top+BX_1A\\ &=B^\top XA^\top+BXA \end{aligned}$$
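The three trace-derivative rules can be sanity-checked with finite differences. The helper `numeric_gradient` and the random matrices `M`, `N`, `A`, `B` are assumptions made for this sketch; the shapes are chosen only so that every product is defined.

```python
import numpy as np

def numeric_gradient(func, X, eps=1e-6):
    """Central-difference gradient of a scalar function, shaped like X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (func(X + step) - func(X - step)) / (2.0 * eps)
    return grad

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 4))
M = rng.normal(size=(3, 4))   # plays the role of A in tr(A^T X)
N = rng.normal(size=(4, 3))   # plays the role of A in tr(A X)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(3, 3))

grad_mtx = numeric_gradient(lambda Z: np.trace(M.T @ Z), X)          # expect M
grad_nx = numeric_gradient(lambda Z: np.trace(N @ Z), X)             # expect N^T
grad_quad = numeric_gradient(lambda Z: np.trace(Z @ A @ Z.T @ B), X)
grad_quad_formula = B.T @ X @ A.T + B @ X @ A
```

Because `A` and `B` are not symmetric here, the check would fail if the two terms of the formula were swapped or a transpose were dropped.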
Derivatives related to the trace
Gradients are again taken with respect to $X$; $a$ and $b$ are constant vectors.
- $\nabla a^\top X^\top Xa=2Xa a^\top$
- $\nabla (Xa-b)^\top(Xa-b)=2(Xa-b)a^\top$
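Both identities follow from the trace rules above, because a scalar equals its own trace. One possible derivation, with all gradients taken with respect to $X$ and $I$ denoting the identity matrix:

```latex
\begin{aligned}
a^\top X^\top X a
  &= \operatorname{tr}(a^\top X^\top X a)
   = \operatorname{tr}\bigl(X\,(aa^\top)\,X^\top I\bigr)
  && \text{(cyclic invariance)}\\
\nabla\, a^\top X^\top X a
  &= I^\top X (aa^\top)^\top + I X (aa^\top)
   = 2Xaa^\top
  && \text{(rule for } \operatorname{tr}(XAX^\top B)\text{)}\\
\nabla\, b^\top X a
  &= \nabla \operatorname{tr}(a b^\top X)
   = (ab^\top)^\top = ba^\top
  && \text{(rule for } \operatorname{tr}(AX)\text{)}\\
\nabla\, (Xa-b)^\top(Xa-b)
  &= \nabla\bigl(a^\top X^\top X a - 2\,b^\top X a + b^\top b\bigr)\\
  &= 2Xaa^\top - 2ba^\top
   = 2(Xa-b)a^\top
\end{aligned}
```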
Derivative of a real-valued function with respect to a vector
Derivatives of a scalar with respect to a vector can be obtained from the trace properties above. In the last rule, $\nabla_{\boldsymbol x}\boldsymbol u$ and $\nabla_{\boldsymbol x}\boldsymbol v$ denote the Jacobian matrices of $\boldsymbol u$ and $\boldsymbol v$.
- Linear form: $\nabla\boldsymbol a^\top\boldsymbol x=\boldsymbol a$
- Inner product of a vector with itself: $\nabla\boldsymbol x^\top\boldsymbol x=2\boldsymbol x$
- Quadratic form: $\nabla\boldsymbol x^\top A \boldsymbol x=(A + A^\top)\boldsymbol x$
- Inner product of two vector functions: $\nabla_{\boldsymbol x}\boldsymbol u^\top\boldsymbol v=(\nabla_{\boldsymbol x} \boldsymbol u)^\top\boldsymbol v+(\nabla_{\boldsymbol x} \boldsymbol v)^\top\boldsymbol u$
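The four rules above can be checked side by side. In this sketch `numeric_gradient` is an illustrative helper, and the product rule is tested on the assumed linear maps $\boldsymbol u = P\boldsymbol x$ and $\boldsymbol v = Q\boldsymbol x$, whose Jacobians are simply $P$ and $Q$.

```python
import numpy as np

def numeric_gradient(func, x, eps=1e-6):
    """Central-difference gradient of a scalar function of a vector."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2.0 * eps)
    return grad

rng = np.random.default_rng(3)
n = 4
x = rng.normal(size=n)
a = rng.normal(size=n)
A = rng.normal(size=(n, n))   # deliberately not symmetric
P = rng.normal(size=(3, n))   # u = P x, so the Jacobian of u is P
Q = rng.normal(size=(3, n))   # v = Q x, so the Jacobian of v is Q

checks = {
    "linear form": np.allclose(
        numeric_gradient(lambda z: a @ z, x), a, atol=1e-5),
    "inner product": np.allclose(
        numeric_gradient(lambda z: z @ z, x), 2.0 * x, atol=1e-5),
    "quadratic form": np.allclose(
        numeric_gradient(lambda z: z @ A @ z, x), (A + A.T) @ x, atol=1e-5),
    "product rule": np.allclose(
        numeric_gradient(lambda z: (P @ z) @ (Q @ z), x),
        P.T @ (Q @ x) + Q.T @ (P @ x), atol=1e-5),
}
```

Using a non-symmetric `A` matters for the quadratic form: the shortcut $2A\boldsymbol x$ is only valid when $A = A^\top$.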