0. Notation
- Number field: let $\mathbb{F}$ denote a number field.
- Scalars: let $y$ and $x$ be scalars; the corresponding differentials $\mathrm{d}y$ and $\mathrm{d}x$ are also scalars, i.e. $x, \mathrm{d}x, y, \mathrm{d}y \in \mathbb{F}^{1}$.
- Vectors: let $\vec{y}$ and $\vec{x}$ be column vectors of dimension $m$ and $n$ respectively; the corresponding $\mathrm{d}\vec{y}$ and $\mathrm{d}\vec{x}$ are also column vectors of dimension $m$ and $n$, i.e. $\vec{x}, \mathrm{d}\vec{x} \in \mathbb{F}^{n}$ and $\vec{y}, \mathrm{d}\vec{y} \in \mathbb{F}^{m}$.
- Matrices: let $Y$ and $X$ be matrices; the corresponding $\mathrm{d}Y$ and $\mathrm{d}X$ are also matrices, i.e. $X, \mathrm{d}X \in \mathbb{F}^{r \times s}$ and $Y, \mathrm{d}Y \in \mathbb{F}^{p \times q}$.
Here $\mathrm{d}\vec{x} = ( \mathrm{d}x_1, \mathrm{d}x_2, \cdots, \mathrm{d}x_n )^T$, and $\mathrm{d}\vec{y}$ is defined analogously. For a matrix,

$$
\mathrm{d}X = \left( \begin{matrix} \mathrm{d}\vec{x}_1, & \mathrm{d}\vec{x}_2, & \cdots, & \mathrm{d}\vec{x}_s \end{matrix} \right) = \left( \begin{matrix} \mathrm{d}x_{11} & \mathrm{d}x_{12} & \mathrm{d}x_{13} & \cdots & \mathrm{d}x_{1s} \\ \mathrm{d}x_{21} & \mathrm{d}x_{22} & \mathrm{d}x_{23} & \cdots & \mathrm{d}x_{2s} \\ \mathrm{d}x_{31} & \mathrm{d}x_{32} & \mathrm{d}x_{33} & \cdots & \mathrm{d}x_{3s} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathrm{d}x_{r1} & \mathrm{d}x_{r2} & \mathrm{d}x_{r3} & \cdots & \mathrm{d}x_{rs} \end{matrix} \right)_{r \times s}
$$
Types of matrix derivatives [1]
| Differentiate with respect to | Scalar $y$ | Vector $\vec{y}$ | Matrix $Y$ |
|---|---|---|---|
| Scalar $x$ | $\frac{\partial y}{\partial x}$ | $\frac{\partial \vec{y}}{\partial x}$ | $\frac{\partial Y}{\partial x}$ |
| Vector $\vec{x}$ | $\frac{\partial y}{\partial \vec{x}}$ | $\frac{\partial \vec{y}}{\partial \vec{x}}$ | |
| Matrix $X$ | $\frac{\partial y}{\partial X}$ | $\frac{\partial \vec{y}}{\partial X}$ | |
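1. Differentiation with respect to a scalar
1.1 Scalar-by-scalar derivative
To keep the notation uniform throughout the article, the differential is written as

$$
\mathrm{d}y = \frac{\partial y}{\partial x} \mathrm{d}x.
$$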
Properties
- SS1 (linearity): $\frac{\partial (u + v)}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}$.
- SS2 (product rule): $\frac{\partial (uv)}{\partial x} = u \frac{\partial v}{\partial x} + v \frac{\partial u}{\partial x}$.
- SS3 (chain rule): $\frac{\partial g(u)}{\partial x} = \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial x}$.
1.2 Vector-by-scalar derivative

$$
\frac{\partial \vec{y}}{\partial x} = \left( \begin{matrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \frac{\partial y_3}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{matrix} \right)_{m \times 1}
$$

Hence $\mathrm{d}y_i = \frac{\partial y_i}{\partial x} \mathrm{d}x$ for $i = 1, 2, \cdots, m$, that is,

$$
\mathrm{d}\vec{y} = \left( \begin{matrix} \frac{\partial y_1}{\partial x} \mathrm{d}x \\ \frac{\partial y_2}{\partial x} \mathrm{d}x \\ \frac{\partial y_3}{\partial x} \mathrm{d}x \\ \vdots \\ \frac{\partial y_m}{\partial x} \mathrm{d}x \end{matrix} \right)_{m \times 1} = \frac{\partial \vec{y}}{\partial x} \otimes \mathrm{d}x
$$
Properties
- VS1 (constant vector): for any constant column vector $\vec{a} \in \mathbb{F}^{n \times 1}$, $\frac{\partial \vec{a}}{\partial x} = \vec{0}_{n \times 1}$.
- VS2 (scalar multiplication): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$ and constant $a \in \mathbb{F}^{1}$, $\frac{\partial (a\vec{u})}{\partial x} = a \frac{\partial \vec{u}}{\partial x}$.
- VS3 (matrix-vector product): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$ and constant $A \in \mathbb{F}^{m \times n}$, $\frac{\partial (A\vec{u})}{\partial x} = A \frac{\partial \vec{u}}{\partial x}$.
- VS4 (transpose): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$, $\frac{\partial ( \vec{u}^T ) }{\partial x} = \left( \frac{\partial \vec{u}}{\partial x} \right)^T$.
- VS5 (addition): for any $\vec{u}(x), \vec{v}(x) \in \mathbb{F}^{n \times 1}$, $\frac{\partial (\vec{u} + \vec{v})}{\partial x} = \frac{\partial \vec{u}}{\partial x} + \frac{\partial \vec{v}}{\partial x}$.
- VS6 (chain rule): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$ and $\vec{g}(\vec{u}) \in \mathbb{F}^{m \times 1}$, $\frac{\partial \vec{g} }{\partial x} = \left( \frac{\partial \vec{g}}{\partial \vec{u}} \right)^T \frac{\partial \vec{u}}{\partial x}$.
A short proof of VS3 (matrix-vector product): expand $A\vec{u}$ as a linear combination of the columns of $A$ and differentiate term by term.

$$
\begin{aligned}
\frac{\partial \left( A\vec{u} \right)}{\partial x} = & \frac{\partial}{\partial x} \left( \left( \begin{matrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{matrix} \right) \left( \begin{matrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{matrix} \right) \right) \\
= & \frac{\partial}{\partial x} \left( u_1 \left( \begin{matrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{matrix} \right) + u_2 \left( \begin{matrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{matrix} \right) + \cdots + u_n \left( \begin{matrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{matrix} \right) \right) \\
= & \frac{\partial u_1}{\partial x} \left( \begin{matrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{matrix} \right) + \frac{\partial u_2}{\partial x} \left( \begin{matrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{matrix} \right) + \cdots + \frac{\partial u_n}{\partial x} \left( \begin{matrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{matrix} \right) \\
= & \left( \begin{matrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{matrix} \right) \left( \begin{matrix} \frac{\partial u_1}{\partial x} \\ \frac{\partial u_2}{\partial x} \\ \vdots \\ \frac{\partial u_n}{\partial x} \end{matrix} \right) \\
= & A \frac{\partial \vec{u} }{\partial x}
\end{aligned}
$$
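VS3 can also be checked numerically. The following is only an illustrative sketch, not part of the original derivation: the function `u`, the matrix `A`, and the point `x0` are arbitrary assumptions, and the left-hand side is approximated with a central difference.

```python
import math

def u(x):
    # Hypothetical vector-valued function u(x) in F^3, chosen for illustration
    return [x, x * x, math.sin(x)]

def du_dx(x):
    # Analytic derivative of u with respect to the scalar x
    return [1.0, 2.0 * x, math.cos(x)]

def matvec(A, v):
    # Matrix-vector product for nested lists
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def central_diff(f, x, h=1e-6):
    # Element-wise central difference of a vector-valued function of a scalar
    fp, fm = f(x + h), f(x - h)
    return [(p - m) / (2.0 * h) for p, m in zip(fp, fm)]

A = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 4.0]]  # constant m x n = 2 x 3 matrix (assumed)

x0 = 0.7
lhs = central_diff(lambda x: matvec(A, u(x)), x0)  # d(Au)/dx, numerical
rhs = matvec(A, du_dx(x0))                          # A du/dx, analytic
vs3_err = max(abs(l - r) for l, r in zip(lhs, rhs))
print(vs3_err)  # expected to be tiny (finite-difference error only)
```

Both sides are $m \times 1$ vectors here, which matches the shape predicted by VS3.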
We now sketch a proof of VS6 (chain rule). First, $\frac{\partial \vec{g}}{\partial \vec{u}}$ is a vector-by-vector derivative, so

$$
\frac{\partial \vec{g}}{\partial \vec{u}} = \left( \begin{matrix} \frac{\partial g_1}{\partial \vec{u}}, & \frac{\partial g_2}{\partial \vec{u}}, & \frac{\partial g_3}{\partial \vec{u}}, & \cdots, & \frac{\partial g_m}{\partial \vec{u}} \end{matrix} \right) = \left( \begin{matrix} \frac{\partial g_1}{\partial u_1} & \frac{\partial g_2}{\partial u_1} & \frac{\partial g_3}{\partial u_1} & \cdots & \frac{\partial g_m}{\partial u_1} \\ \frac{\partial g_1}{\partial u_2} & \frac{\partial g_2}{\partial u_2} & \frac{\partial g_3}{\partial u_2} & \cdots & \frac{\partial g_m}{\partial u_2} \\ \frac{\partial g_1}{\partial u_3} & \frac{\partial g_2}{\partial u_3} & \frac{\partial g_3}{\partial u_3} & \cdots & \frac{\partial g_m}{\partial u_3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g_1}{\partial u_n} & \frac{\partial g_2}{\partial u_n} & \frac{\partial g_3}{\partial u_n} & \cdots & \frac{\partial g_m}{\partial u_n} \end{matrix} \right)_{n \times m}
$$

while $\frac{\partial \vec{u}}{\partial x}$ is a vector-by-scalar derivative, so

$$
\frac{\partial \vec{u}}{\partial x} = \left( \begin{matrix} \frac{\partial u_1}{\partial x} \\ \frac{\partial u_2}{\partial x} \\ \frac{\partial u_3}{\partial x} \\ \vdots \\ \frac{\partial u_n}{\partial x} \end{matrix} \right)_{n \times 1}
$$
Therefore

$$
\begin{aligned}
RHS = \left( \frac{\partial \vec{g}}{\partial \vec{u}} \right)^T \frac{\partial \vec{u}}{\partial x} = & \left( \begin{matrix} \frac{\partial g_1}{\partial u_1} & \frac{\partial g_1}{\partial u_2} & \frac{\partial g_1}{\partial u_3} & \cdots & \frac{\partial g_1}{\partial u_n} \\ \frac{\partial g_2}{\partial u_1} & \frac{\partial g_2}{\partial u_2} & \frac{\partial g_2}{\partial u_3} & \cdots & \frac{\partial g_2}{\partial u_n} \\ \frac{\partial g_3}{\partial u_1} & \frac{\partial g_3}{\partial u_2} & \frac{\partial g_3}{\partial u_3} & \cdots & \frac{\partial g_3}{\partial u_n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g_m}{\partial u_1} & \frac{\partial g_m}{\partial u_2} & \frac{\partial g_m}{\partial u_3} & \cdots & \frac{\partial g_m}{\partial u_n} \end{matrix} \right)_{m \times n} \left( \begin{matrix} \frac{\partial u_1}{\partial x} \\ \frac{\partial u_2}{\partial x} \\ \frac{\partial u_3}{\partial x} \\ \vdots \\ \frac{\partial u_n}{\partial x} \end{matrix} \right)_{n \times 1} \\
= & \left( \begin{matrix} \sum_{i=1}^{n} \frac{\partial g_1}{\partial u_i} \frac{\partial u_i}{\partial x} \\ \sum_{i=1}^{n} \frac{\partial g_2}{\partial u_i} \frac{\partial u_i}{\partial x} \\ \sum_{i=1}^{n} \frac{\partial g_3}{\partial u_i} \frac{\partial u_i}{\partial x} \\ \vdots \\ \sum_{i=1}^{n} \frac{\partial g_m}{\partial u_i} \frac{\partial u_i}{\partial x} \end{matrix} \right)_{m \times 1}
\end{aligned}
$$
If $\frac{\partial \vec{g} }{\partial x}$ is instead viewed as a vector-by-scalar derivative, then applying the scalar chain rule to each component $g_k(u_1(x), \cdots, u_n(x))$ gives

$$
LHS = \frac{\partial \vec{g} }{\partial x} = \left( \begin{matrix} \frac{\partial g_1}{\partial x} \\ \frac{\partial g_2}{\partial x} \\ \frac{\partial g_3}{\partial x} \\ \vdots \\ \frac{\partial g_m}{\partial x} \end{matrix} \right)_{m \times 1} = \left( \begin{matrix} \sum_{i=1}^{n} \frac{\partial g_1}{\partial u_i} \frac{\partial u_i}{\partial x} \\ \sum_{i=1}^{n} \frac{\partial g_2}{\partial u_i} \frac{\partial u_i}{\partial x} \\ \sum_{i=1}^{n} \frac{\partial g_3}{\partial u_i} \frac{\partial u_i}{\partial x} \\ \vdots \\ \sum_{i=1}^{n} \frac{\partial g_m}{\partial u_i} \frac{\partial u_i}{\partial x} \end{matrix} \right)_{m \times 1} = RHS.
$$
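As a quick sanity check of VS6, here is a small worked example that is not from the original text. Assume $\vec{u}(x) = (x, x^2)^T$ and $\vec{g}(\vec{u}) = (u_1 u_2, \; u_1 + u_2)^T$, so $n = m = 2$. In the layout used here, entry $(i, j)$ of $\frac{\partial \vec{g}}{\partial \vec{u}}$ is $\frac{\partial g_j}{\partial u_i}$:

$$
\frac{\partial \vec{g}}{\partial \vec{u}} = \left( \begin{matrix} u_2 & 1 \\ u_1 & 1 \end{matrix} \right), \qquad \frac{\partial \vec{u}}{\partial x} = \left( \begin{matrix} 1 \\ 2x \end{matrix} \right), \qquad \left( \frac{\partial \vec{g}}{\partial \vec{u}} \right)^T \frac{\partial \vec{u}}{\partial x} = \left( \begin{matrix} u_2 + 2x u_1 \\ 1 + 2x \end{matrix} \right) = \left( \begin{matrix} 3x^2 \\ 1 + 2x \end{matrix} \right)
$$

Differentiating directly, $g_1 = x^3$ and $g_2 = x + x^2$ give $\frac{\partial \vec{g}}{\partial x} = (3x^2, \; 1 + 2x)^T$, which agrees with the chain-rule result.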
1.3 Matrix-by-scalar derivative

$$
\frac{\partial Y}{\partial x} = \left( \begin{matrix} \frac{\partial \vec{y}_1}{\partial x}, & \frac{\partial \vec{y}_2}{\partial x}, & \cdots, & \frac{\partial \vec{y}_q}{\partial x} \end{matrix} \right) = \left( \begin{matrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \frac{\partial y_{13}}{\partial x} & \cdots & \frac{\partial y_{1q}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \frac{\partial y_{23}}{\partial x} & \cdots & \frac{\partial y_{2q}}{\partial x} \\ \frac{\partial y_{31}}{\partial x} & \frac{\partial y_{32}}{\partial x} & \frac{\partial y_{33}}{\partial x} & \cdots & \frac{\partial y_{3q}}{\partial x} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{p1}}{\partial x} & \frac{\partial y_{p2}}{\partial x} & \frac{\partial y_{p3}}{\partial x} & \cdots & \frac{\partial y_{pq}}{\partial x} \end{matrix} \right)_{p \times q}
$$
Hence $\mathrm{d}y_{ij} = \frac{\partial y_{ij}}{\partial x} \mathrm{d}x$ for $i = 1, 2, \cdots, p$ and $j = 1, 2, \cdots, q$, that is,

$$
\mathrm{d}Y = \left( \begin{matrix} \frac{\partial \vec{y}_1}{\partial x} \mathrm{d}x, & \frac{\partial \vec{y}_2}{\partial x} \mathrm{d}x, & \cdots, & \frac{\partial \vec{y}_q}{\partial x} \mathrm{d}x \end{matrix} \right) = \left( \begin{matrix} \frac{\partial y_{11}}{\partial x}\mathrm{d}x & \frac{\partial y_{12}}{\partial x}\mathrm{d}x & \frac{\partial y_{13}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{1q}}{\partial x}\mathrm{d}x \\ \frac{\partial y_{21}}{\partial x}\mathrm{d}x & \frac{\partial y_{22}}{\partial x}\mathrm{d}x & \frac{\partial y_{23}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{2q}}{\partial x}\mathrm{d}x \\ \frac{\partial y_{31}}{\partial x}\mathrm{d}x & \frac{\partial y_{32}}{\partial x}\mathrm{d}x & \frac{\partial y_{33}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{3q}}{\partial x}\mathrm{d}x \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{p1}}{\partial x}\mathrm{d}x & \frac{\partial y_{p2}}{\partial x}\mathrm{d}x & \frac{\partial y_{p3}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{pq}}{\partial x}\mathrm{d}x \end{matrix} \right)_{p \times q} = \frac{\partial Y}{\partial x} \otimes \mathrm{d}x
$$
Properties
- MS1 (scalar multiplication): for any $U(x) \in \mathbb{F}^{m \times n}$ and constant $a \in \mathbb{F}$, $\frac{\partial (aU)}{\partial x} = a \frac{\partial U}{\partial x}$.
- MS2 (matrix product): for any $U(x) \in \mathbb{F}^{m \times n}$ and constant matrices $A \in \mathbb{F}^{r \times m}$, $B \in \mathbb{F}^{n \times s}$, $\frac{\partial (AUB)}{\partial x} = A \frac{\partial U}{\partial x} B$.
- MS3 (linearity): for any $U(x), V(x) \in \mathbb{F}^{m \times n}$, $\frac{\partial (U + V)}{\partial x} = \frac{\partial U}{\partial x} + \frac{\partial V}{\partial x}$.
- MS4 (product rule): for any $U(x) \in \mathbb{F}^{m \times n}$ and $V(x) \in \mathbb{F}^{n \times l}$, $\frac{\partial (UV)}{\partial x} = \frac{\partial U}{\partial x} V + U \frac{\partial V}{\partial x}$.
We first prove MS4 (product rule). For brevity, write $\frac{\partial Y}{\partial x} = \left( \frac{\partial y_{ij}}{\partial x} \right)_{p \times q}$. Then

$$
\begin{aligned}
\frac{\partial (UV)}{\partial x} = & \left( \sum_{k=1}^{n} \frac{\partial \left( u_{ik} v_{kj} \right)}{\partial x} \right)_{m \times l} \\
= & \left( \sum_{k=1}^{n} \left( \frac{\partial u_{ik}}{\partial x}v_{kj} + u_{ik}\frac{\partial v_{kj}}{\partial x} \right) \right)_{m \times l} \\
= & \left( \sum_{k=1}^{n} \frac{\partial u_{ik}}{\partial x}v_{kj} \right)_{m \times l} + \left( \sum_{k=1}^{n} u_{ik}\frac{\partial v_{kj}}{\partial x} \right)_{m \times l} \\
= & \frac{\partial U}{\partial x} V + U \frac{\partial V}{\partial x}.
\end{aligned}
$$
MS2 (matrix product) then follows from MS4 (product rule). Since $A$ and $B$ are constant, their derivatives are zero matrices:

$$
\begin{aligned}
\frac{\partial (AUB)}{\partial x} = & \frac{\partial A}{\partial x}UB + A \left( \frac{\partial (UB)}{\partial x} \right) \\
= & \frac{\partial A}{\partial x}UB + A \left( \frac{\partial U}{\partial x}B + U\frac{\partial B}{\partial x} \right) \\
= & 0 \, UB + A\frac{\partial U}{\partial x}B + AU \, 0 \\
= & A\frac{\partial U}{\partial x}B.
\end{aligned}
$$
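MS4 and MS2 can be spot-checked numerically. The sketch below is an illustration under assumed inputs: the matrix functions `U` and `V`, the constant matrices `A` and `B`, and the point `x0` are all made up for the example, and derivatives of products are approximated by central differences.

```python
import math

def U(x):
    # Hypothetical 2 x 2 matrix function of the scalar x
    return [[x, x * x], [math.sin(x), 1.0]]

def dU(x):
    # Entry-wise analytic derivative of U
    return [[1.0, 2.0 * x], [math.cos(x), 0.0]]

def V(x):
    # Hypothetical 2 x 1 matrix function of the scalar x
    return [[math.exp(x)], [x ** 3]]

def dV(x):
    # Entry-wise analytic derivative of V
    return [[math.exp(x)], [3.0 * x * x]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matadd(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def central_diff(F, x, h=1e-6):
    # Entry-wise central difference of a matrix-valued function of a scalar
    Fp, Fm = F(x + h), F(x - h)
    return [[(p - m) / (2.0 * h) for p, m in zip(rp, rm)]
            for rp, rm in zip(Fp, Fm)]

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

x0 = 0.3

# MS4: d(UV)/dx = dU/dx V + U dV/dx
ms4_lhs = central_diff(lambda x: matmul(U(x), V(x)), x0)
ms4_rhs = matadd(matmul(dU(x0), V(x0)), matmul(U(x0), dV(x0)))
ms4_err = max_abs_diff(ms4_lhs, ms4_rhs)

# MS2: d(AUB)/dx = A dU/dx B for constant A (3 x 2) and B (2 x 3)
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
ms2_lhs = central_diff(lambda x: matmul(matmul(A, U(x)), B), x0)
ms2_rhs = matmul(matmul(A, dU(x0)), B)
ms2_err = max_abs_diff(ms2_lhs, ms2_rhs)

print(ms4_err, ms2_err)  # both expected to be tiny
```

Note that the order of the factors matters in MS4: $\frac{\partial U}{\partial x} V$ and $U \frac{\partial V}{\partial x}$ cannot be commuted, which is why the check uses explicit matrix products.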
2. Differentiation with respect to a vector
2.1 Scalar-by-vector derivative

$$
\frac{\partial y}{\partial \vec{x}} = \left( \begin{matrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \frac{\partial y}{\partial x_3} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{matrix} \right)_{n \times 1}
$$

This quantity is commonly called the gradient.
By the total differential formula:

$$
\mathrm{d}y = \sum_{i=1}^{n} \frac{\partial y}{\partial x_i} \mathrm{d}x_i = \left( \begin{matrix} \frac{\partial y}{\partial x_1}, & \frac{\partial y}{\partial x_2}, & \frac{\partial y}{\partial x_3}, & \cdots, & \frac{\partial y}{\partial x_n} \end{matrix} \right) \times \left( \begin{matrix} \mathrm{d}x_1 \\ \mathrm{d}x_2 \\ \mathrm{d}x_3 \\ \vdots \\ \mathrm{d}x_n \end{matrix} \right) = \left( \frac{\partial y}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}
$$
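The total differential formula says that, for a small step, the change in $y$ is approximately the gradient transposed times the step. A minimal numerical sketch, assuming a made-up function `y` and an arbitrary point `x0` and step `dx` (none of these appear in the original text):

```python
def y(x):
    # Hypothetical scalar function of x in F^3: y = x1^2 + x1*x2 + 3*x3
    return x[0] ** 2 + x[0] * x[1] + 3.0 * x[2]

def grad_y(x):
    # Analytic gradient dy/dx, an n x 1 column stored as a flat list
    return [2.0 * x[0] + x[1], x[0], 3.0]

x0 = [1.0, 2.0, -0.5]
dx = [1e-4, -2e-4, 3e-4]

# First-order prediction: dy = (dy/dx)^T dx
dy_linear = sum(g * d for g, d in zip(grad_y(x0), dx))

# Actual change of y over the same step
dy_actual = y([a + d for a, d in zip(x0, dx)]) - y(x0)

print(dy_linear, dy_actual)  # the two differ only by second-order terms in dx
```

The gap between the two numbers shrinks quadratically as the step `dx` is scaled down, which is what "first-order approximation" means here.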
Properties
- SV1 (scalar multiplication): for any $u(\vec{x}) \in \mathbb{F}$ and constant $a \in \mathbb{F}$, $\frac{\partial (au)}{\partial \vec{x}} = a \frac{\partial u}{\partial \vec{x}}$.
- SV2 (linearity): for any $u(\vec{x}), v(\vec{x}) \in \mathbb{F}$, $\frac{\partial (u + v)}{\partial \vec{x}} = \frac{\partial u}{\partial \vec{x}} + \frac{\partial v}{\partial \vec{x}}$.
- SV3 (product rule): for any $u(\vec{x}), v(\vec{x}) \in \mathbb{F}$, $\frac{\partial (uv)}{\partial \vec{x}} = \frac{\partial u}{\partial \vec{x}} v + u \frac{\partial v}{\partial \vec{x}}$.
- SV4 (chain rule): for any $u(\vec{x}), g(u) \in \mathbb{F}$, $\frac{\partial g(u)}{\partial \vec{x}} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \vec{x}}$.
- SV5: for any $\vec{u}(\vec{x}), \vec{v}(\vec{x}) \in \mathbb{F}^{m}$, $\frac{\partial (\vec{u}^T \vec{v})}{\partial \vec{x}} = \frac{\partial \vec{v}}{\partial \vec{x}} \vec{u} + \frac{\partial \vec{u}}{\partial \vec{x}} \vec{v}$.
- SV6: for any constant $A \in \mathbb{F}^{m \times n}$, $\vec{u}(\vec{x}) \in \mathbb{F}^{m}$ and $\vec{v}(\vec{x}) \in \mathbb{F}^{n}$, $\frac{\partial \left( \vec{u}^T A \vec{v} \right)}{\partial \vec{x}} = \frac{\partial \vec{v}}{\partial \vec{x}} A^T \vec{u} + \frac{\partial \vec{u} }{\partial \vec{x}} A \vec{v}$.
We first give a brief proof of SV5. Both sides are $n \times 1$ vectors, so it suffices to show that each row agrees. For row $i$,

$$
\begin{aligned}
LHS_i = & \frac{\partial \left( \vec{u}^T \vec{v} \right)}{\partial x_i} = \frac{\partial }{\partial x_i} \left( \sum_{j=1}^{m} u_j v_j \right) \\
= & \sum_{j=1}^{m} \left( \frac{\partial \left( u_j v_j \right)}{\partial x_i} \right) \\
= & \sum_{j=1}^{m} \left( \frac{\partial v_j}{\partial x_i}u_j + \frac{\partial u_j}{\partial x_i}v_j \right).
\end{aligned}
$$
By the definition of the vector-by-vector derivative, row $i$ of the right-hand side is

$$
\begin{aligned}
RHS_i = & \left( \begin{matrix} \frac{\partial v_1}{\partial x_i}, & \frac{\partial v_2}{\partial x_i}, & \frac{\partial v_3}{\partial x_i}, & \cdots, & \frac{\partial v_m}{\partial x_i} \end{matrix} \right) \left( \begin{matrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_m \end{matrix} \right) + \left( \begin{matrix} \frac{\partial u_1}{\partial x_i}, & \frac{\partial u_2}{\partial x_i}, & \frac{\partial u_3}{\partial x_i}, & \cdots, & \frac{\partial u_m}{\partial x_i} \end{matrix} \right) \left( \begin{matrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_m \end{matrix} \right) \\
= & \sum_{j=1}^{m} \left( \frac{\partial v_j}{\partial x_i}u_j \right) + \sum_{j=1}^{m} \left( \frac{\partial u_j}{\partial x_i}v_j \right) \\
= & LHS_i.
\end{aligned}
$$
SV6 is proved as follows, applying SV5 to $\vec{u}$ and $A\vec{v}$ and then VV3 to $\frac{\partial (A\vec{v})}{\partial \vec{x}}$:

$$
\frac{\partial \left( \vec{u}^T A \vec{v} \right)}{\partial \vec{x}} \overset{\text{SV5}}{=} \frac{\partial (A\vec{v})}{\partial \vec{x}} \vec{u} + \frac{\partial \vec{u}}{\partial \vec{x}} A\vec{v} \overset{\text{VV3}}{=} \frac{\partial \vec{v}}{\partial \vec{x}} A^T \vec{u} + \frac{\partial \vec{u} }{\partial \vec{x}} A \vec{v}.
$$
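A familiar special case, added here as an illustration rather than taken from the original text: assume $m = n$, so that $A$ is square, and take $\vec{u} = \vec{v} = \vec{x}$. In the layout used here, entry $(i, j)$ of $\frac{\partial \vec{x}}{\partial \vec{x}}$ is $\frac{\partial x_j}{\partial x_i}$, so $\frac{\partial \vec{x}}{\partial \vec{x}} = I_n$, and SV6 reduces to the gradient of a quadratic form:

$$
\frac{\partial \left( \vec{x}^T A \vec{x} \right)}{\partial \vec{x}} = I_n A^T \vec{x} + I_n A \vec{x} = \left( A + A^T \right) \vec{x}
$$

If $A$ is additionally symmetric, this becomes $2A\vec{x}$.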
2.2 Vector-by-vector derivative

$$
\frac{\partial y_i}{\partial \vec{x}} = \left( \begin{matrix} \frac{\partial y_i}{\partial x_1} \\ \frac{\partial y_i}{\partial x_2} \\ \frac{\partial y_i}{\partial x_3} \\ \vdots \\ \frac{\partial y_i}{\partial x_n} \end{matrix} \right)_{n \times 1}, \quad i = 1, 2, \cdots, m.
$$

Therefore

$$
\frac{\partial \vec{y}}{\partial \vec{x}} = \left( \begin{matrix} \frac{\partial y_1}{\partial \vec{x}}, & \frac{\partial y_2}{\partial \vec{x}}, & \frac{\partial y_3}{\partial \vec{x}}, & \cdots, & \frac{\partial y_m}{\partial \vec{x}} \end{matrix} \right) = \left( \begin{matrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \frac{\partial y_3}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_3}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \frac{\partial y_1}{\partial x_3} & \frac{\partial y_2}{\partial x_3} & \frac{\partial y_3}{\partial x_3} & \cdots & \frac{\partial y_m}{\partial x_3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \frac{\partial y_3}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{matrix} \right)_{n \times m}
$$
From the scalar-by-vector derivative above, $\mathrm{d}y_i = \left( \frac{\partial y_i}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}$. Therefore

$$
\mathrm{d} \vec{y} = \left( \begin{matrix} \mathrm{d}y_1 \\ \mathrm{d}y_2 \\ \mathrm{d}y_3 \\ \vdots \\ \mathrm{d}y_m \end{matrix} \right) = \left[ \begin{matrix} \left( \frac{\partial y_1}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \\ \left( \frac{\partial y_2}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \\ \left( \frac{\partial y_3}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \\ \vdots \\ \left( \frac{\partial y_m}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \end{matrix} \right] = \left[ \begin{matrix} \left( \frac{\partial y_1}{\partial \vec{x}} \right)^T \\ \left( \frac{\partial y_2}{\partial \vec{x}} \right)^T \\ \left( \frac{\partial y_3}{\partial \vec{x}} \right)^T \\ \vdots \\ \left( \frac{\partial y_m}{\partial \vec{x}} \right)^T \end{matrix} \right] \mathrm{d}\vec{x} = \left( \begin{matrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \frac{\partial y_3}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_3}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \frac{\partial y_1}{\partial x_3} & \frac{\partial y_2}{\partial x_3} & \frac{\partial y_3}{\partial x_3} & \cdots & \frac{\partial y_m}{\partial x_3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \frac{\partial y_3}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{matrix} \right)^{T} \mathrm{d}\vec{x} = \left( \frac{\partial \vec{y}}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}
$$
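The relation $\mathrm{d}\vec{y} = \left( \frac{\partial \vec{y}}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}$ can be illustrated numerically. In the sketch below, the map `y`, its derivative `dy_dx`, the point `x0` and the step `dx` are assumptions made up for the example; the derivative is stored in the $n \times m$ layout of this section, where entry $(i, j)$ is $\frac{\partial y_j}{\partial x_i}$.

```python
def y(x):
    # Hypothetical map from F^2 to F^3 (n = 2, m = 3)
    return [x[0] * x[1], x[0] + x[1] ** 2, 3.0 * x[0]]

def dy_dx(x):
    # n x m derivative in this article's layout: entry (i, j) = d y_j / d x_i
    return [[x[1], 1.0, 3.0],
            [x[0], 2.0 * x[1], 0.0]]

x0 = [1.0, 2.0]
dx = [1e-4, -1e-4]

D = dy_dx(x0)
n, m = len(D), len(D[0])

# (D^T dx)_j = sum_i D[i][j] * dx[i]
dy_vec_linear = [sum(D[i][j] * dx[i] for i in range(n)) for j in range(m)]

# Actual change of y over the same step
x1 = [a + d for a, d in zip(x0, dx)]
dy_vec_actual = [a - b for a, b in zip(y(x1), y(x0))]

dy_vec_gap = max(abs(a - b) for a, b in zip(dy_vec_actual, dy_vec_linear))
print(dy_vec_linear, dy_vec_actual, dy_vec_gap)
```

The transpose is needed because this layout is the transpose of the Jacobian matrix used in many textbooks; with the Jacobian $J$, the same statement reads $\mathrm{d}\vec{y} = J \, \mathrm{d}\vec{x}$.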
Properties
- VV1 (scalar multiplication): for any $\vec{u}(\vec{x}) \in \mathbb{F}^{m}$ and $a(\vec{x}) \in \mathbb{F}$, $\frac{\partial (a\vec{u})}{\partial \vec{x}} = a \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial a}{\partial \vec{x}} \vec{u}^T$.
- VV2 (linearity): for any $\vec{u}(\vec{x}), \vec{v}(\vec{x}) \in \mathbb{F}^{m}$, $\frac{\partial (\vec{u} + \vec{v})}{\partial \vec{x}} = \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial \vec{v}}{\partial \vec{x}}$.
- VV3 (multiplication by a matrix): for any $\vec{u}(\vec{x}) \in \mathbb{F}^{m}$ and constant $A \in \mathbb{F}^{p \times m}$, $\frac{\partial (A \vec{u})}{\partial \vec{x}} = \frac{\partial \vec{u}}{\partial \vec{x}} A^T$.
- VV4 (chain rule): for any $\vec{u}(\vec{x}) \in \mathbb{F}^{p}$ and $\vec{g}(\vec{u}) \in \mathbb{F}^{q}$, $\frac{\partial \vec{g}(\vec{u})}{\partial \vec{x}} = \frac{\partial \vec{u}}{\partial \vec{x}} \frac{\partial \vec{g}}{\partial \vec{u}}$.
To prove VV1 (scalar multiplication), again for brevity write $\frac{\partial \vec{u}}{\partial \vec{x}} = \left( \frac{\partial u_j}{\partial x_i} \right)_{n \times m}$. Then

$$
\begin{aligned}
\frac{\partial (a\vec{u})}{\partial \vec{x}} = & \left( \frac{\partial \left( a u_j \right)}{\partial x_i} \right)_{n \times m} \\
= & \left( a \frac{\partial u_j}{\partial x_i} + \frac{\partial a}{\partial x_i} u_j \right)_{n \times m} \\
= & \left( a \frac{\partial u_j}{\partial x_i} \right)_{n \times m} + \left( \frac{\partial a}{\partial x_i} u_j \right)_{n \times m} \\
= & a \left( \frac{\partial u_j}{\partial x_i} \right)_{n \times m} + \left( \begin{matrix} \frac{\partial a}{\partial x_1} \\ \frac{\partial a}{\partial x_2} \\ \frac{\partial a}{\partial x_3} \\ \vdots \\ \frac{\partial a}{\partial x_n} \end{matrix} \right) \left( \begin{matrix} u_1, & u_2, & u_3, & \cdots, & u_m \end{matrix} \right) \\
= & a \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial a}{\partial \vec{x}} \vec{u}^T.
\end{aligned}
$$
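A small worked example of VV1, offered as an illustration that is not part of the original text. Assume $n = m = 2$, $\vec{u} = \vec{x}$ and $a = x_1 x_2$, so that $a\vec{u} = (x_1^2 x_2, \; x_1 x_2^2)^T$. Differentiating entry by entry, with entry $(i, j)$ equal to $\frac{\partial (a u_j)}{\partial x_i}$:

$$
\frac{\partial (a\vec{u})}{\partial \vec{x}} = \left( \begin{matrix} 2 x_1 x_2 & x_2^2 \\ x_1^2 & 2 x_1 x_2 \end{matrix} \right)
$$

VV1 gives the same matrix, since $\frac{\partial \vec{u}}{\partial \vec{x}} = I_2$ and $\frac{\partial a}{\partial \vec{x}} = (x_2, \; x_1)^T$:

$$
a \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial a}{\partial \vec{x}} \vec{u}^T = x_1 x_2 \left( \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right) + \left( \begin{matrix} x_2 \\ x_1 \end{matrix} \right) \left( \begin{matrix} x_1, & x_2 \end{matrix} \right) = \left( \begin{matrix} 2 x_1 x_2 & x_2^2 \\ x_1^2 & 2 x_1 x_2 \end{matrix} \right)
$$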
To prove VV3 (multiplication by a matrix), write the vector $A\vec{u}$ as $\left( \sum_{k=1}^{m} a_{jk} u_k \right)_p$, i.e. its $j$-th entry is $\sum_{k=1}^{m} a_{jk} u_k$.
By the definition of the vector-by-vector derivative,

$$
\frac{\partial}{\partial \vec{x}} \left( \sum_{k=1}^{m} a_{jk} u_k \right) = \left( \begin{matrix} \frac{\partial}{\partial x_1} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \\ \frac{\partial}{\partial x_2} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \\ \frac{\partial}{\partial x_3} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \\ \vdots \\ \frac{\partial}{\partial x_n} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \end{matrix} \right)
$$
因此LHS可以写为
∂
(
A
u
⃗
)
∂
x
⃗
=
(
∂
∂
x
i
(
∑
k
=
1
m
a
j
k
u
k
)
)
n
×
p
=
(
∑
k
=
1
m
(
∂
u
k
∂
x
i
a
j
k
)
)
n
×
p
.
\frac{\partial (A \vec{u})}{\partial \vec{x}} = \left( \frac{\partial}{\partial x_i} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \right)_{n \times p} = \left( \sum_{k=1}^{m} \left( \frac{\partial u_k}{\partial x_i} a_{jk} \right) \right)_{n \times p}.
∂x∂(Au)=(∂xi∂(k=1∑majkuk))n×p=(k=1∑m(∂xi∂ukajk))n×p.
即
L
H
S
i
,
j
=
∑
k
=
1
m
(
∂
u
k
∂
x
i
a
j
k
)
LHS_{i,j} = \sum_{k=1}^{m} \left( \frac{\partial u_k}{\partial x_i} a_{jk} \right)
LHSi,j=∑k=1m(∂xi∂ukajk)。
现在考虑
R
H
S
i
,
j
RHS_{i,j}
RHSi,j,它是由
∂
u
⃗
∂
x
⃗
\frac{\partial \vec{u}}{\partial \vec{x}}
∂x∂u的第
i
i
i行乘
A
T
A^T
AT的第
j
j
j列得到的。
R
H
S
i
,
j
=
(
∂
u
1
∂
x
i
,
∂
u
2
∂
x
i
,
⋯
,
∂
u
m
∂
x
i
)
(
a
j
1
a
j
2
⋮
a
j
m
)
=
∑
k
=
1
m
(
∂
u
k
∂
x
i
a
j
k
)
=
L
H
S
i
,
j
.
RHS_{i,j} = \left( \frac{\partial u_1}{\partial x_i}, \frac{\partial u_2}{\partial x_i}, \cdots ,\frac{\partial u_m}{\partial x_i} \right) \left( \begin{matrix} a_{j1} \\ a_{j2} \\ \vdots \\ a_{jm} \\ \end{matrix} \right) = \sum_{k=1}^{m} \left( \frac{\partial u_k}{\partial x_i} a_{jk} \right) = LHS_{i,j}.
RHSi,j=(∂xi∂u1,∂xi∂u2,⋯,∂xi∂um)⎝⎜⎜⎜⎛aj1aj2⋮ajm⎠⎟⎟⎟⎞=k=1∑m(∂xi∂ukajk)=LHSi,j.
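As a quick numerical illustration of VV3, the sketch below compares both sides for one hypothetical constant matrix `A` and one hypothetical function `u`; neither comes from the original text, and `jacobian` is an assumed finite-difference helper using the same $n \times m$ layout as the proof.

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Central-difference estimate of the matrix with entries d f_j / d x_i."""
    rows = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        diff = np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))
        rows.append(diff / (2 * eps))
    return np.array(rows)

# Hypothetical constant matrix A (p x m = 4 x 2) and vector function u (m = 2, n = 3).
A = np.array([[2.0, -1.0],
              [0.5, 3.0],
              [1.0, 1.0],
              [-2.0, 0.0]])

def u(x):
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])

x0 = np.array([0.7, -1.2, 0.3])

lhs = jacobian(lambda x: A @ u(x), x0)   # d(Au)/dx, shape n x p
rhs = jacobian(u, x0) @ A.T              # (du/dx) A^T, shape (n x m)(m x p)

vv3_holds = np.allclose(lhs, rhs, atol=1e-6)
print(vv3_holds)
```

Note that $A$ lands on the right as $A^T$ because each row of the derivative matrix corresponds to one input component $x_i$.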
Finally, proof of VV4 (chain rule).

$$
\frac{\partial \vec{u}}{\partial \vec{x}} \frac{\partial \vec{g}}{\partial \vec{u}} = \left( \frac{\partial u_j}{ \partial x_i} \right)_{n \times p} \left( \frac{\partial g_k}{ \partial u_j} \right)_{p \times q} = \left( \sum_{j=1}^{p} \left( \frac{\partial u_j}{ \partial x_i} \frac{\partial g_k}{ \partial u_j} \right) \right)_{n \times q}
$$

That is, $RHS_{i,k} = \sum_{j=1}^{p} \left( \frac{\partial u_j}{ \partial x_i} \frac{\partial g_k}{ \partial u_j} \right)$.
On the other hand, $\frac{\partial \vec{g}(\vec{u})}{\partial \vec{x}} = \left( \frac{\partial g_k}{ \partial x_i} \right)_{n \times q}$, so

$$
\begin{aligned}
LHS_{i,k} = & \frac{\partial g_k}{ \partial x_i} \\
\overset{SS3}{=} & \frac{\partial g_k}{ \partial u_1} \frac{\partial u_1}{ \partial x_i} + \frac{\partial g_k}{ \partial u_2} \frac{\partial u_2}{ \partial x_i} + \cdots + \frac{\partial g_k}{ \partial u_p} \frac{\partial u_p}{ \partial x_i} \\
= & \sum_{j=1}^{p} \left( \frac{\partial u_j}{ \partial x_i} \frac{\partial g_k}{ \partial u_j} \right) \\
= & RHS_{i,k}.
\end{aligned}
$$
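The chain rule VV4 can be checked the same way. In this sketch the functions `u` and `g` are hypothetical examples (with $n = 3$, $p = 2$, $q = 4$) and `jacobian` is an assumed finite-difference helper; the point is only that the factors multiply in the order $\frac{\partial \vec{u}}{\partial \vec{x}} \frac{\partial \vec{g}}{\partial \vec{u}}$ in this layout.

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Central-difference estimate of the matrix with entries d f_j / d x_i."""
    rows = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        diff = np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))
        rows.append(diff / (2 * eps))
    return np.array(rows)

def u(x):
    # Hypothetical inner function: R^3 -> R^2.
    return np.array([x[0] * x[1], x[1] + x[2] ** 2])

def g(v):
    # Hypothetical outer function: R^2 -> R^4.
    return np.array([np.exp(v[0]), v[0] * v[1], np.cos(v[1]), v[0] + v[1]])

x0 = np.array([0.5, -0.4, 0.9])

lhs = jacobian(lambda x: g(u(x)), x0)        # dg(u(x))/dx, shape n x q
rhs = jacobian(u, x0) @ jacobian(g, u(x0))   # (n x p)(p x q)

vv4_holds = np.allclose(lhs, rhs, atol=1e-6)
print(vv4_holds)
```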
2.3 Derivative of a matrix with respect to a vector
First vectorize the matrix $Y$ in column-major order, stacking its columns $\vec{y}_1, \cdots, \vec{y}_q$ on top of each other:

$$
vec(Y_{p \times q}) = vec \left( \left( \begin{matrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1q} \\ y_{21} & y_{22} & y_{23} & \cdots & y_{2q} \\ y_{31} & y_{32} & y_{33} & \cdots & y_{3q} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_{p1} & y_{p2} & y_{p3} & \cdots & y_{pq} \end{matrix} \right)_{p \times q} \right) = \left( \vec{y}_1^T, \vec{y}_2^T, \vec{y}_3^T, \cdots, \vec{y}_q^T \right)^T = \left( \begin{matrix} y_{11} \\ y_{21} \\ \vdots \\ y_{p1} \\ y_{12} \\ y_{22} \\ \vdots \\ y_{p2} \\ \vdots \\ y_{1q} \\ y_{2q} \\ \vdots \\ y_{pq} \end{matrix} \right)_{pq \times 1}.
$$

By the derivative of a vector with respect to a vector, the $i$-th column $\vec{y}_i$ satisfies

$$
\frac{\partial \vec{y}_i}{\partial \vec{x}} = \left( \begin{matrix} \frac{\partial y_{1i}}{\partial \vec{x}}, & \frac{\partial y_{2i}}{\partial \vec{x}}, & \frac{\partial y_{3i}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{pi}}{\partial \vec{x}} \end{matrix} \right) = \left( \begin{matrix} \frac{\partial y_{1i}}{\partial x_1} & \frac{\partial y_{2i}}{\partial x_1} & \frac{\partial y_{3i}}{\partial x_1} & \cdots & \frac{\partial y_{pi}}{\partial x_1} \\ \frac{\partial y_{1i}}{\partial x_2} & \frac{\partial y_{2i}}{\partial x_2} & \frac{\partial y_{3i}}{\partial x_2} & \cdots & \frac{\partial y_{pi}}{\partial x_2} \\ \frac{\partial y_{1i}}{\partial x_3} & \frac{\partial y_{2i}}{\partial x_3} & \frac{\partial y_{3i}}{\partial x_3} & \cdots & \frac{\partial y_{pi}}{\partial x_3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{1i}}{\partial x_n} & \frac{\partial y_{2i}}{\partial x_n} & \frac{\partial y_{3i}}{\partial x_n} & \cdots & \frac{\partial y_{pi}}{\partial x_n} \end{matrix} \right)_{n \times p}
$$

Therefore

$$
\begin{aligned}
\frac{\partial vec(Y)}{\partial \vec{x}}
= & \left( \begin{matrix} \frac{\partial y_{11}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{p1}}{\partial \vec{x}}, & \frac{\partial y_{12}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{p2}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{1q}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{pq}}{\partial \vec{x}} \end{matrix} \right) \\
= & \left( \begin{matrix}
\frac{\partial y_{11}}{\partial x_1} & \cdots & \frac{\partial y_{p1}}{\partial x_1} & \frac{\partial y_{12}}{\partial x_1} & \cdots & \frac{\partial y_{p2}}{\partial x_1} & \cdots & \frac{\partial y_{1q}}{\partial x_1} & \cdots & \frac{\partial y_{pq}}{\partial x_1} \\
\frac{\partial y_{11}}{\partial x_2} & \cdots & \frac{\partial y_{p1}}{\partial x_2} & \frac{\partial y_{12}}{\partial x_2} & \cdots & \frac{\partial y_{p2}}{\partial x_2} & \cdots & \frac{\partial y_{1q}}{\partial x_2} & \cdots & \frac{\partial y_{pq}}{\partial x_2} \\
\vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\
\frac{\partial y_{11}}{\partial x_n} & \cdots & \frac{\partial y_{p1}}{\partial x_n} & \frac{\partial y_{12}}{\partial x_n} & \cdots & \frac{\partial y_{p2}}{\partial x_n} & \cdots & \frac{\partial y_{1q}}{\partial x_n} & \cdots & \frac{\partial y_{pq}}{\partial x_n}
\end{matrix} \right)_{n \times pq}
\end{aligned}
$$

which gives

$$
vec(\mathrm{d}Y) = \left( \frac{\partial vec(Y)}{\partial \vec{x}} \right)^T \mathrm{d} \vec{x}
$$
3. Derivatives with respect to a matrix
3.1 Derivative of a scalar with respect to a matrix

$$
\frac{\partial y}{\partial X} = \left( \begin{matrix} \frac{\partial y}{\partial \vec{x}_1}, & \frac{\partial y}{\partial \vec{x}_2}, & \cdots, & \frac{\partial y}{\partial \vec{x}_s} \end{matrix} \right) = \left( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{13}} & \cdots & \frac{\partial y}{\partial x_{1s}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \frac{\partial y}{\partial x_{23}} & \cdots & \frac{\partial y}{\partial x_{2s}} \\ \frac{\partial y}{\partial x_{31}} & \frac{\partial y}{\partial x_{32}} & \frac{\partial y}{\partial x_{33}} & \cdots & \frac{\partial y}{\partial x_{3s}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{r1}} & \frac{\partial y}{\partial x_{r2}} & \frac{\partial y}{\partial x_{r3}} & \cdots & \frac{\partial y}{\partial x_{rs}} \end{matrix} \right)_{r \times s}
$$

Likewise, the total differential formula gives $\mathrm{d}y = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\partial y}{\partial x_{ij}} \mathrm{d}x_{ij}$.
Now form the product $\left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X$; only its diagonal entries are needed, so the off-diagonal entries are left as dots:

$$
\begin{aligned}
\left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X
= & \left( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{31}} & \cdots & \frac{\partial y}{\partial x_{r1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \frac{\partial y}{\partial x_{32}} & \cdots & \frac{\partial y}{\partial x_{r2}} \\ \frac{\partial y}{\partial x_{13}} & \frac{\partial y}{\partial x_{23}} & \frac{\partial y}{\partial x_{33}} & \cdots & \frac{\partial y}{\partial x_{r3}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1s}} & \frac{\partial y}{\partial x_{2s}} & \frac{\partial y}{\partial x_{3s}} & \cdots & \frac{\partial y}{\partial x_{rs}} \end{matrix} \right)_{s \times r} \times \left( \begin{matrix} \mathrm{d}x_{11} & \mathrm{d}x_{12} & \mathrm{d}x_{13} & \cdots & \mathrm{d}x_{1s} \\ \mathrm{d}x_{21} & \mathrm{d}x_{22} & \mathrm{d}x_{23} & \cdots & \mathrm{d}x_{2s} \\ \mathrm{d}x_{31} & \mathrm{d}x_{32} & \mathrm{d}x_{33} & \cdots & \mathrm{d}x_{3s} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathrm{d}x_{r1} & \mathrm{d}x_{r2} & \mathrm{d}x_{r3} & \cdots & \mathrm{d}x_{rs} \end{matrix} \right)_{r \times s} \\
= & \left( \begin{matrix} \sum_{i=1}^{r} \frac{\partial y}{\partial x_{i1}} \mathrm{d}x_{i1} & \cdots & \cdots & \cdots & \cdots \\ \cdots & \sum_{i=1}^{r} \frac{\partial y}{\partial x_{i2}} \mathrm{d}x_{i2} & \cdots & \cdots & \cdots \\ \cdots & \cdots & \sum_{i=1}^{r} \frac{\partial y}{\partial x_{i3}} \mathrm{d}x_{i3} & \cdots & \cdots \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \cdots & \cdots & \cdots & \cdots & \sum_{i=1}^{r} \frac{\partial y}{\partial x_{is}} \mathrm{d}x_{is} \end{matrix} \right)_{s \times s}
\end{aligned}
$$

Therefore

$$
tr\left( \left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X\right) = \sum_{j=1}^{s} \sum_{i=1}^{r} \frac{\partial y}{\partial x_{ij}} \mathrm{d}x_{ij} = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\partial y}{\partial x_{ij}} \mathrm{d}x_{ij} = \mathrm{d}y
$$
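The identity $\mathrm{d}y = tr\left( \left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X \right)$ can be illustrated numerically. This is a minimal sketch under assumptions: the scalar function `y`, the constant matrix `C`, and the helper `grad_wrt_matrix` are hypothetical and chosen only so that the gradient is easy to verify.

```python
import numpy as np

def grad_wrt_matrix(f, X, eps=1e-6):
    """Central-difference estimate of the r x s matrix with entries d f / d x_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))      # r = 3, s = 2
C = rng.normal(size=(3, 2))      # hypothetical constant weights

def y(X):
    return np.sum(np.sin(X) * C)  # hypothetical scalar function of X

G = grad_wrt_matrix(y, X)                  # dy/dX, same shape as X
dX = 1e-5 * rng.normal(size=X.shape)       # a small perturbation of X

dy_trace = np.trace(G.T @ dX)              # tr((dy/dX)^T dX)
dy_sum = np.sum(G * dX)                    # the double sum over i and j
dy_actual = y(X + dX) - y(X)               # actual change, equal up to second order

print(dy_trace, dy_sum, dy_actual)
```

The trace form and the double sum agree to rounding error, while the actual change differs from them only by terms of second order in `dX`.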
3.2 Derivative of a vector with respect to a matrix

$$
\mathrm{d}\vec{y} = \left( \begin{matrix} \mathrm{d}y_1 \\ \mathrm{d}y_2 \\ \vdots \\ \mathrm{d}y_m \end{matrix} \right) = \left( \begin{matrix} tr\left( \left( \frac{\partial y_1}{\partial X} \right)^T \mathrm{d}X\right) \\ tr\left( \left( \frac{\partial y_2}{\partial X} \right)^T \mathrm{d}X\right) \\ \vdots \\ tr\left( \left( \frac{\partial y_m}{\partial X} \right)^T \mathrm{d}X\right) \end{matrix} \right)
$$
3.3 Derivative of a matrix with respect to a matrix
Applying the vector-by-matrix result to each column of $Y$, we have

$$
\begin{aligned}
\mathrm{d}Y = & \left( \begin{matrix} \mathrm{d}\vec{y}_1, & \mathrm{d}\vec{y}_2, & \cdots, & \mathrm{d}\vec{y}_q \end{matrix} \right) \\
= & \left( \begin{matrix}
tr\left( \left( \frac{\partial y_{11}}{\partial X} \right)^T \mathrm{d}X\right) & tr\left( \left( \frac{\partial y_{12}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & tr\left( \left( \frac{\partial y_{1q}}{\partial X} \right)^T \mathrm{d}X\right) \\
tr\left( \left( \frac{\partial y_{21}}{\partial X} \right)^T \mathrm{d}X\right) & tr\left( \left( \frac{\partial y_{22}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & tr\left( \left( \frac{\partial y_{2q}}{\partial X} \right)^T \mathrm{d}X\right) \\
\vdots & \vdots & \ddots & \vdots \\
tr\left( \left( \frac{\partial y_{p1}}{\partial X} \right)^T \mathrm{d}X\right) & tr\left( \left( \frac{\partial y_{p2}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & tr\left( \left( \frac{\partial y_{pq}}{\partial X} \right)^T \mathrm{d}X\right)
\end{matrix} \right)_{p \times q}
\end{aligned}
$$

Alternatively, if both matrices are vectorized, then

$$
vec(\mathrm{d}Y) = \left( \frac{\partial vec(Y)}{\partial vec(X)} \right)^T vec(\mathrm{d}X)
$$
Matrix differential operators [2]
- D1 (linearity): $\mathrm{d} \left( X \pm Y \right) = \mathrm{d}X \pm \mathrm{d}Y$.
- D2 (matrix multiplication): $\mathrm{d} \left( X Y \right) = (\mathrm{d}X)Y + X(\mathrm{d}Y)$.
- D3 (transpose): $\mathrm{d}(X^T) = (\mathrm{d}X)^T$.
- D4 (trace): $\mathrm{d}\left( tr(X) \right) = tr(\mathrm{d}X)$.
- D5 (inverse): if $X$ is invertible, $\mathrm{d}(X^{-1}) = -X^{-1} (\mathrm{d}X) X^{-1}$. This follows from applying D2 to $X X^{-1} = I$.
- D6 (determinant): $\mathrm{d} |X| = tr\left( \mathrm{adj}(X) \, \mathrm{d}X \right)$, where $\mathrm{adj}(X)$ is the adjugate of $X$; if $X$ is invertible, then $\mathrm{d} |X| = |X| \, tr\left( X^{-1} \mathrm{d}X \right)$.
- D7 (element-wise function): $\mathrm{d} f(X) = f^{'}(X) \odot ( \mathrm{d} X )$, where $f$ is applied to each entry of $X$.
- D8 (element-wise multiplication): $\mathrm{d}\left( X \odot Y \right) = (\mathrm{d}X) \odot Y + X \odot (\mathrm{d}Y)$.
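The rules for the inverse and the determinant are the easiest ones to get wrong, so a numerical check may help. The sketch below uses a hypothetical invertible matrix `X` and a small perturbation `dX` (both illustrative, not from the original text) and compares the actual first-order change with $-X^{-1} (\mathrm{d}X) X^{-1}$ and $|X| \, tr\left( X^{-1} \mathrm{d}X \right)$.

```python
import numpy as np

# Hypothetical invertible matrix and a small perturbation.
X = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
dX = 1e-6 * np.array([[1.0, -2.0, 0.5],
                      [0.3, 0.7, -1.0],
                      [2.0, 0.0, 1.0]])

X_inv = np.linalg.inv(X)

# Differential of the inverse: d(X^{-1}) = -X^{-1} (dX) X^{-1}.
inv_actual = np.linalg.inv(X + dX) - X_inv
inv_predicted = -X_inv @ dX @ X_inv

# Differential of the determinant: d|X| = |X| tr(X^{-1} dX).
det_actual = np.linalg.det(X + dX) - np.linalg.det(X)
det_predicted = np.linalg.det(X) * np.trace(X_inv @ dX)

inverse_rule_holds = np.allclose(inv_actual, inv_predicted, atol=1e-9)
determinant_rule_holds = np.isclose(det_actual, det_predicted, atol=1e-9)
print(inverse_rule_holds, determinant_rule_holds)
```

The agreement is only up to terms of second order in `dX`, which is why the perturbation is kept small.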
4. Derivative of a determinant with respect to a matrix
The derivative of a determinant with respect to a matrix is also a scalar-by-matrix derivative.
- DM1: for any invertible $X \in \mathbb{F}^{n \times n}$ and $A, B \in \mathbb{F}^{n \times n}$ (square, so that $|AXB| = |A||X||B|$), we have $\frac{\partial |AXB|}{\partial X} = |AXB|(X^{-1})^T$.
- DM2: for any invertible $X \in \mathbb{F}^{n \times n}$ with $|X| > 0$, we have $\frac{\partial ln(|X|)}{\partial X} = (X^{-1})^T$.
- DM3: for any invertible $X(z) \in \mathbb{F}^{n \times n}$ with $|X(z)| > 0$, $z \in \mathbb{F}$, we have $\frac{\partial ln(|X(z)|)}{\partial z} = tr\left( X^{-1} \frac{\partial X}{\partial z} \right)$.
- DM4: for any $X \in \mathbb{F}^{n \times m}, A \in \mathbb{F}^{n \times n}$ such that $X^T A X$ is invertible, we have $\frac{\partial |X^T A X|}{\partial X} = |X^T A X| \left( AX \left(X^TAX\right)^{-1} + A^TX \left(X^TA^TX\right)^{-1} \right)$.
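DM1 and DM2 can be compared against finite differences. This is a sketch under assumptions: the matrices `X`, `A`, `B` are hypothetical, `A` and `B` are taken square so that $|AXB|$ is defined, and `grad_wrt_matrix` is an assumed helper that differentiates entry by entry.

```python
import numpy as np

def grad_wrt_matrix(f, X, eps=1e-6):
    """Central-difference estimate of the matrix with entries d f / d x_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# Hypothetical matrices; X has a positive determinant so ln|X| is defined.
X = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
B = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

X_inv_T = np.linalg.inv(X).T

# DM2: d ln|X| / dX = (X^{-1})^T
dm2_numeric = grad_wrt_matrix(lambda M: np.log(np.linalg.det(M)), X)
dm2_holds = np.allclose(dm2_numeric, X_inv_T, atol=1e-6)

# DM1: d|AXB| / dX = |AXB| (X^{-1})^T
dm1_numeric = grad_wrt_matrix(lambda M: np.linalg.det(A @ M @ B), X)
dm1_holds = np.allclose(dm1_numeric, np.linalg.det(A @ X @ B) * X_inv_T, atol=1e-6)

print(dm1_holds, dm2_holds)
```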
5. Derivative of a trace with respect to a matrix
The derivative of a trace with respect to a matrix is, in essence, a scalar-by-matrix derivative.
Properties of the trace
- TR1 (scalar): for any $a \in \mathbb{F}^1$, $a = tr(a)$.
- TR2 (transpose): for any $A \in \mathbb{F}^{n \times n}$, $tr(A) = tr(A^T)$.
- TR3 (linearity): for any $A,B \in \mathbb{F}^{n \times n}$, $tr(A \pm B) = tr(A) \pm tr(B)$.
- TR4 (commutativity under matrix multiplication): for any $A \in \mathbb{F}^{m \times n}, B \in \mathbb{F}^{n \times m}$, $tr(AB) = tr(BA)$.
- TR5 (exchange of matrix multiplication and element-wise multiplication): for any $A,B,C \in \mathbb{F}^{m \times n}$, $tr\left( A^T(B \odot C) \right) = tr \left( (A \odot B)^T C \right)$.
Common functions [2]
$f$ | $\mathrm{d} f$ | $\frac{\partial f}{\partial X}$ |
---|---|---|
$tr(X)$ | $tr(I \mathrm{d}X)$ | $I$ |
$tr(X^T X)$ | $2tr(X^T \mathrm{d}X)$ | $2X$ |
$tr(X^2)$ | $2tr(X \mathrm{d}X)$ | $2 X^T$ |
$tr(A X)$ | $tr(A \mathrm{d}X)$ | $A^T$ |
$tr(X^T A X)$ | $tr(X^T(A + A^T) \mathrm{d}X)$ | $(A + A^T)X$ |
$tr(X A X^T)$ | $tr((A + A^T)X^T \mathrm{d}X)$ | $X(A + A^T)$ |
$tr(X A X)$ | $tr((AX + XA) \mathrm{d}X)$ | $X^T A^T + A^T X^T$ |
$tr(A X^{-1})$ | $-tr(X^{-1}AX^{-1} \mathrm{d}X)$ | $-\left( X^{-1} A X^{-1} \right)^T$ |
$tr(X A X B)$ | $tr((AXB + BXA) \mathrm{d}X)$ | $\left( A X B + B X A \right)^T$ |
$tr(X A X^T B)$ | $tr((A X^T B + A^T X^T B^T) \mathrm{d}X)$ | $B^T X A^T + B X A$ |
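Several rows of the table can be checked against finite differences. The sketch below is illustrative only: the square matrices `X0`, `A`, `B` are hypothetical, and `grad_wrt_matrix` is an assumed helper. Each entry of `checks` pairs a function of $X$ with the closed-form gradient listed in the table.

```python
import numpy as np

def grad_wrt_matrix(f, X, eps=1e-6):
    """Central-difference estimate of the matrix with entries d f / d x_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# Hypothetical 3 x 3 test matrices; X0 is invertible.
X0 = np.array([[4.0, 1.0, 0.0],
               [2.0, 3.0, 1.0],
               [0.0, 1.0, 2.0]])
A = np.array([[1.0, -1.0, 2.0],
              [0.5, 0.0, 1.0],
              [3.0, 1.0, -2.0]])
B = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [-1.0, 0.5, 2.0]])
X0_inv = np.linalg.inv(X0)

# name -> (function of X, closed-form gradient from the table evaluated at X0)
checks = {
    "tr(X^2)": (lambda X: np.trace(X @ X), 2 * X0.T),
    "tr(X^T A X)": (lambda X: np.trace(X.T @ A @ X), (A + A.T) @ X0),
    "tr(A X^-1)": (lambda X: np.trace(A @ np.linalg.inv(X)),
                   -(X0_inv @ A @ X0_inv).T),
    "tr(X A X^T B)": (lambda X: np.trace(X @ A @ X.T @ B),
                      B.T @ X0 @ A.T + B @ X0 @ A),
}

results = {name: bool(np.allclose(grad_wrt_matrix(f, X0), expected, atol=1e-5))
           for name, (f, expected) in checks.items()}
print(results)
```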
- TM1: for any $X \in \mathbb{F}^{n \times n}$, we have $\frac{\partial tr(X)}{\partial X} = I$.
- TM2: for any $X \in \mathbb{F}^{n \times m}, A \in \mathbb{F}^{m \times n}$, we have $\frac{\partial tr(XA)}{\partial X} = \frac{\partial tr(AX)}{\partial X} = A^T$.
- TM3: for any $X \in \mathbb{F}^{n \times m}, A \in \mathbb{F}^{n \times n}$, we have $\frac{\partial tr(X^T A X)}{\partial X} = (A+A^T)X$.
- TM4: for any invertible $X \in \mathbb{F}^{n \times n}$ and $A \in \mathbb{F}^{n \times n}$, we have $\frac{\partial tr(X^{-1} A)}{\partial X} = - \left( X^{-1} A X^{-1} \right)^T$.
For convenience, $x_{i_X,j_X}$ denotes the entry of the matrix $X$ in row $i_X$ and column $j_X$.
Proof of TM1:

$$
\begin{aligned}
\frac{\partial tr(X)}{\partial X}
= & \left( \frac{\partial tr(X)}{\partial x_{i_X,j_X}} \right)_{n \times n} \\
= & \left( \frac{\partial }{\partial x_{i_X,j_X}} \left( \sum_{i=1}^{n}x_{i,i} \right) \right)_{n \times n} \\
= & \left( \frac{\partial x_{i_X,i_X}}{\partial x_{i_X,j_X}}\right)_{n \times n} \\
= & I.
\end{aligned}
$$
Proof of TM2:

$$
\begin{aligned}
\frac{\partial tr(XA)}{\partial X}
= & \left( \frac{\partial tr(XA)}{\partial x_{i_X,j_X}} \right)_{n \times m} \\
= & \left( \frac{\partial }{\partial x_{i_X,j_X}} \left( \sum_{i=1}^{n} \sum_{k=1}^{m} x_{i,k}a_{k,i} \right) \right)_{n \times m} \\
= & \left( \frac{\partial \left( x_{i_X,j_X}a_{j_X,i_X} \right)}{\partial x_{i_X,j_X}}\right)_{n \times m} \\
= & \left( a_{j_X,i_X} \right)_{n \times m} \\
= & A^T.
\end{aligned}
$$
Proof of TM3. Write $X=\left( x_{i,j} \right)_{n \times m}$ and $A=\left( a_{i,j} \right)_{n \times n}$, so that the entry of $X^T$ in row $k$ and column $i$ is $x_{i,k}$. Then

$$
X^T A = \left( \sum_{i=1}^{n} x_{i,k} a_{i,j} \right)_{m \times n}, \quad X^T A X = \left( \sum_{j=1}^{n} \sum_{i=1}^{n} x_{i,k} a_{i,j}x_{j,l} \right)_{m \times m},
$$

where $k$ is the row index and $j$ (respectively $l$) is the column index. Similarly, the entries in row $i_X$ and column $j_X$ of $AX$ and $A^T X$ are

$$
(A X)_{i_X,j_X} = \sum_{j=1}^{n} a_{i_X,j}x_{j,j_X}, \quad (A^T X)_{i_X,j_X} = \sum_{i=1}^{n} a_{i,i_X}x_{i,j_X}.
$$

Therefore

$$
tr(X^T A X) = \sum_{k=1}^{m} \sum_{j=1}^{n} \sum_{i=1}^{n} x_{i,k} a_{i,j}x_{j,k} .
$$

The entry $x_{i_X,j_X}$ appears as the first factor when $(i,k) = (i_X,j_X)$ and as the last factor when $(j,k) = (i_X,j_X)$, so the product rule gives

$$
\begin{aligned}
\frac{\partial tr(X^T A X)}{\partial X}
= & \left( \frac{\partial tr(X^T A X)}{\partial x_{i_X, j_X}} \right)_{n \times m} \\
= & \left( \frac{\partial }{\partial x_{i_X, j_X}} \left( \sum_{k=1}^{m} \sum_{j=1}^{n} \sum_{i=1}^{n} x_{i,k} a_{i,j}x_{j,k} \right) \right)_{n \times m} \\
= & \left( \sum_{j=1}^{n} a_{i_X,j}x_{j,j_X} + \sum_{i=1}^{n} x_{i,j_X} a_{i,i_X} \right)_{n \times m} \\
= & A X + A^T X \\
= & (A + A^T) X.
\end{aligned}
$$
6. Examples
The examples here are all taken from [3].
【Example 1】 $f = \vec{a}^T X \vec{b}, \vec{a} \in \mathbb{F}^{m \times 1}, X \in \mathbb{F}^{m \times n}, \vec{b} \in \mathbb{F}^{n \times 1}$. Find $\frac{\partial f}{\partial X}$.
【Solution】 $\frac{\partial f}{\partial X} = \frac{\partial tr\left( \vec{a}^T X \vec{b} \right)}{\partial X} \overset{TR4}{=} \frac{\partial tr\left( \vec{b} \vec{a}^T X \right)}{\partial X} \overset{TM2}{=} \left( \vec{b} \vec{a}^T \right)^T = \vec{a} \vec{b}^T$.
【Example 2】 $f = \vec{a}^T exp(X \vec{b}), \vec{a} \in \mathbb{F}^{m \times 1}, X \in \mathbb{F}^{m \times n}, \vec{b} \in \mathbb{F}^{n \times 1}$. Find $\frac{\partial f}{\partial X}$.
【Solution】 First apply the differential operators: $\mathrm{d}f = \vec{a}^T \left( exp(X \vec{b}) \odot (\mathrm{d}X \vec{b}) \right)$.
Take the trace of both sides, then bring the expression into the form of TM2.

$$
\begin{aligned}
\mathrm{d}f = & tr \left( \vec{a}^T \left( exp(X \vec{b}) \odot (\mathrm{d}X \vec{b}) \right) \right) \\
\overset{TR5}{=} & tr \left( \left( \vec{a} \odot exp(X \vec{b}) \right)^T (\mathrm{d}X \vec{b}) \right) \\
\overset{TR4}{=} & tr \left( \vec{b} \left( \vec{a} \odot exp(X \vec{b}) \right)^T \mathrm{d}X \right)
\end{aligned}
$$

This gives $\frac{\partial f}{\partial X} = \left( \vec{b} \left( \vec{a} \odot exp(X \vec{b}) \right)^T \right)^T = \left( \vec{a} \odot exp(X \vec{b}) \right) \vec{b}^T$.
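The result of Example 2 can be compared with a finite-difference gradient. In this sketch the concrete vectors `a`, `b` and the matrix `X` are hypothetical values with $m = 3$, $n = 2$, and `grad_wrt_matrix` is an assumed helper, not part of the original derivation.

```python
import numpy as np

def grad_wrt_matrix(f, X, eps=1e-6):
    """Central-difference estimate of the matrix with entries d f / d x_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# Hypothetical data: a in R^3, b in R^2, X in R^{3 x 2}.
a = np.array([0.5, -1.0, 2.0])
b = np.array([1.5, -0.5])
X = np.array([[0.2, -0.1],
              [0.4, 0.3],
              [-0.6, 0.1]])

def f(X):
    return a @ np.exp(X @ b)          # f = a^T exp(X b), exp applied element-wise

numeric = grad_wrt_matrix(f, X)
closed_form = np.outer(a * np.exp(X @ b), b)   # (a ⊙ exp(X b)) b^T

example2_holds = np.allclose(numeric, closed_form, atol=1e-6)
print(example2_holds)
```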
【Example 3】 $f = tr\left( Y^T M Y \right), Y = \sigma \left( WX \right)$. Find $\frac{\partial f}{\partial X}$, where $W \in \mathbb{F}^{l \times m}, X \in \mathbb{F}^{m \times n}, Y \in \mathbb{F}^{l \times n}, M \in \mathbb{F}^{l \times l}$, $\sigma$ is an element-wise function, and $f$ is a scalar.
【Solution】 First compute $\frac{\partial f}{\partial Y}$, which by TM3 is

$$
\frac{\partial f}{\partial Y} = \left( M + M^T \right)Y.
$$

This gives the relation between $\mathrm{d}f$ and $\mathrm{d}Y$: $\mathrm{d}f = tr\left( \left( \frac{\partial f}{\partial Y} \right)^T \mathrm{d}Y \right) = tr\left( Y^T \left( M + M^T \right) \mathrm{d}Y \right)$.
Next compute $\mathrm{d}Y$:

$$
\begin{aligned}
\mathrm{d}Y \overset{D7}{=} & \sigma^{'}(W X) \odot \mathrm{d}(WX) \\
= & \sigma^{'}(W X) \odot \left( W \mathrm{d}X \right).
\end{aligned}
$$

Combining the two gives

$$
\begin{aligned}
\mathrm{d}f = & tr \left( Y^T \left( M + M^T \right) \left( \sigma^{'}(W X) \odot \left( W \mathrm{d}X \right) \right) \right) \\
\overset{TR5}{=} & tr \left( \left( (M + M^T)Y \odot \sigma^{'}(W X) \right)^T W \mathrm{d}X \right).
\end{aligned}
$$

Hence $\frac{\partial f}{\partial X}= W^T \left( (M + M^T)Y \odot \sigma^{'}(W X) \right)$.
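A finite-difference check of Example 3 is sketched below. The choice $\sigma = \tanh$ is an assumption made only for this illustration (the example leaves $\sigma$ generic), the random matrices are hypothetical test data, and `grad_wrt_matrix` is an assumed helper.

```python
import numpy as np

def grad_wrt_matrix(f, X, eps=1e-6):
    """Central-difference estimate of the matrix with entries d f / d x_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
l, m, n = 3, 2, 4
W = rng.normal(size=(l, m))
X = rng.normal(size=(m, n))
M = rng.normal(size=(l, l))

def sigma(Z):
    return np.tanh(Z)                 # assumed element-wise function

def sigma_prime(Z):
    return 1.0 - np.tanh(Z) ** 2      # its element-wise derivative

def f(X):
    Y = sigma(W @ X)
    return np.trace(Y.T @ M @ Y)

Y = sigma(W @ X)
# Closed form from the example: W^T (((M + M^T) Y) ⊙ sigma'(W X))
closed_form = W.T @ (((M + M.T) @ Y) * sigma_prime(W @ X))
numeric = grad_wrt_matrix(f, X)

example3_holds = np.allclose(numeric, closed_form, atol=1e-6)
print(example3_holds)
```

The same pattern is what backpropagation through one dense layer computes: the upstream gradient is multiplied element-wise by $\sigma'$ and then mapped back through $W^T$.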
【Example 4】 $l = \| X \vec{w} - \vec{y} \|^2, \vec{y} \in \mathbb{F}^{m \times 1}, X \in \mathbb{F}^{m \times n}, \vec{w} \in \mathbb{F}^{n \times 1}$. Find the least-squares estimate of $\vec{w}$.
【Solution】

$$
\begin{aligned}
l = & \| X \vec{w} - \vec{y} \|^2 \\
= & \left( X \vec{w} - \vec{y} \right)^T \left( X \vec{w} - \vec{y} \right) \\
= & \left( \vec{w}^T X^T - \vec{y}^T \right) \left( X \vec{w} - \vec{y} \right) \\
= & \vec{w}^T X^T X \vec{w} - \vec{w}^T X^T \vec{y} - \vec{y}^T X \vec{w} + \vec{y}^T \vec{y}.
\end{aligned}
$$

Setting the derivative with respect to $\vec{w}$ to zero (the two cross terms are equal scalars, and TM3 and TM2 handle the remaining traces):

$$
\begin{aligned}
\frac{\partial l}{\partial \vec{w}} = & \frac{\partial tr(l)}{\partial \vec{w}} \\
= & \frac{\partial }{\partial \vec{w}} tr \left( \vec{w}^T X^T X \vec{w} - \vec{w}^T X^T \vec{y} - \vec{y}^T X \vec{w} + \vec{y}^T \vec{y} \right) \\
= & \frac{\partial }{\partial \vec{w}} tr \left( \vec{w}^T X^T X \vec{w} \right) - \frac{\partial }{\partial \vec{w}} tr \left( 2\vec{y}^T X \vec{w} \right) \\
= & 2(X^T X) \vec{w} - 2X^T \vec{y} \\
= & 0.
\end{aligned}
$$

Hence, provided $X^T X$ is invertible, $\vec{w} = (X^T X)^{-1} X^T \vec{y}$.
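The closed-form solution can be compared with NumPy's least-squares solver. This is a minimal sketch on hypothetical data (a quadratic trend plus a deterministic wiggle); solving the normal equations with `np.linalg.solve` stands in for forming $(X^T X)^{-1}$ explicitly, which is usually avoided in practice for numerical reasons.

```python
import numpy as np

# Hypothetical design matrix with columns 1, t, t^2 (m = 20 samples, n = 3 parameters).
t = np.linspace(-1.0, 1.0, 20)
X = np.column_stack([np.ones_like(t), t, t ** 2])
y = 1.0 + 2.0 * t - 3.0 * t ** 2 + 0.1 * np.sin(7.0 * t)

# Normal equations from the example: (X^T X) w = X^T y.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# At the minimizer the gradient 2 X^T X w - 2 X^T y should vanish.
gradient = 2 * X.T @ X @ w_normal - 2 * X.T @ y

print(w_normal, w_lstsq)
```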
【Example 5】 Given samples $\vec{x}_1,\cdots,\vec{x}_N \thicksim \mathcal{N}\left( \vec{\mu}, \Sigma \right)$, find the maximum likelihood estimate of the covariance matrix $\Sigma$.
【Solution】 With $\vec{\mu}$ replaced by the sample mean $\bar{\vec{x}}$, maximizing the log-likelihood is equivalent, up to a positive factor and an additive constant, to minimizing
$$
l = \ln|\Sigma| + \frac{1}{N}\sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right)^T\Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right).
$$
Therefore
$$
\begin{aligned}
\frac{\partial l}{\partial \Sigma} = & \frac{\partial}{\partial \Sigma} \left( \ln|\Sigma| + \frac{1}{N}\sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right)^T \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right) \right) \\
\overset{DM2}{=} & \left( \Sigma^{-1} \right)^T + \frac{\partial}{\partial \Sigma} tr \left( \frac{1}{N}\sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right)^T \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right) \right) \\
= & \left( \Sigma^{-1} \right)^T + \frac{1}{N}\sum_{i=1}^{N} \frac{\partial}{\partial \Sigma} tr \left( \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T \right) \\
= & \left( \Sigma^{-1} \right)^T - \frac{1}{N}\sum_{i=1}^{N} \left( \left( \Sigma^{-1} \right)^T \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T \left( \Sigma^{-1} \right)^T \right) \\
= & \left( \Sigma^{-1} \right)^T - \left( \Sigma^{-1} \right)^T \left( \frac{1}{N}\sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T \right) \left( \Sigma^{-1} \right)^T \\
= & \left( \Sigma^{-1} \right)^T - \left( \Sigma^{-1} \right)^T S^2 \left( \Sigma^{-1} \right)^T \\
= & \left( \Sigma^{-1} - \Sigma^{-1} S^2 \Sigma^{-1} \right)^T \\
= & 0,
\end{aligned}
$$
where $S^2 = \frac{1}{N}\sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T$ is the sample covariance matrix, and the third line uses the cyclic property of the trace.
This yields the covariance estimate
$$
\Sigma = S^2.
$$
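As a numerical illustration, the sketch below evaluates the gradient expression $\left( \Sigma^{-1} - \Sigma^{-1} S^2 \Sigma^{-1} \right)^T$ at $\Sigma = S^2$ and compares it with finite differences at another point. It treats the entries of $\Sigma$ as independent variables, as the derivation does. The mixing matrix `A`, the sample size, and the function names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 200, 3
# Fixed, well-conditioned mixing matrix (illustrative choice).
A = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
samples = rng.normal(size=(N, n)) @ A.T   # each row is one sample x_i^T

x_bar = samples.mean(axis=0)
centered = samples - x_bar
S2 = centered.T @ centered / N            # sample covariance with 1/N

def objective(Sigma):
    # l = ln|Sigma| + (1/N) sum_i (x_i - x_bar)^T Sigma^{-1} (x_i - x_bar)
    #   = ln|Sigma| + tr(Sigma^{-1} S2)
    _, logdet = np.linalg.slogdet(Sigma)
    return logdet + np.trace(np.linalg.inv(Sigma) @ S2)

def gradient(Sigma):
    # (Sigma^{-1} - Sigma^{-1} S2 Sigma^{-1})^T from the derivation.
    inv = np.linalg.inv(Sigma)
    return (inv - inv @ S2 @ inv).T

# The gradient should vanish at Sigma = S2.
grad_at_S2 = np.max(np.abs(gradient(S2)))

# Finite-difference check at a different positive-definite point.
Sigma0 = S2 + 0.5 * np.eye(n)
eps = 1e-6
numeric = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        numeric[i, j] = (objective(Sigma0 + E) - objective(Sigma0 - E)) / (2 * eps)
fd_err = np.max(np.abs(numeric - gradient(Sigma0)))
print(grad_at_S2, fd_err)
```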
【Example 6】 Let $l = - \vec{y}^T \log \mathrm{softmax}(W \vec{x})$, with $\vec{y} \in \mathbb{F}^{m \times 1}$, $W \in \mathbb{F}^{m \times n}$, $\vec{x} \in \mathbb{F}^{n \times 1}$, where exactly one element of $\vec{y}$ is $1$ and all others are $0$. Find $\frac{\partial l}{\partial W}$.
【Solution】 First, for $\vec{u} \in \mathbb{F}^{n \times 1}$ and $c \in \mathbb{F}^{1}$, we have $\log(\frac{\vec{u}}{c}) = \log(\vec{u}) - \vec{1}\log(c)$, where $\log$ acts element-wise.
Therefore, using $\vec{y}^T \vec{1} = 1$ in the last step,
$$
\begin{aligned}
l = & - \vec{y}^T \log \mathrm{softmax}(W \vec{x}) \\
= & - \vec{y}^T \log \left( \frac{\exp(W \vec{x})}{\vec{1}^T \exp(W \vec{x})} \right) \\
= & - \vec{y}^T \left( W \vec{x} - \vec{1} \log \left( \vec{1}^T \exp(W \vec{x} ) \right)\right) \\
= & - \vec{y}^T W \vec{x} + \log \left( \vec{1}^T \exp(W \vec{x} ) \right).
\end{aligned}
$$
For the first term,
$$
\frac{\partial }{\partial W} \left( - \vec{y}^T W \vec{x} \right) = \frac{\partial }{\partial W} tr \left( - \vec{x} \vec{y}^T W \right) = - \vec{y} \vec{x}^T.
$$
For the second term,
$$
\begin{aligned}
\mathrm{d} \left( \log \left( \vec{1}^T \exp(W \vec{x} ) \right) \right) = & \mathrm{d}\, tr \left( \log \left( \vec{1}^T \exp(W \vec{x} ) \right) \right) \\
= & tr \left( \frac{\vec{1}^T \left( \exp(W\vec{x}) \odot \left(\mathrm{d}W \vec{x}\right) \right) }{ \vec{1}^T \exp(W\vec{x}) } \right) \\
= & tr \left( \frac{ \left( \vec{1} \odot \exp(W\vec{x}) \right)^T \left( \mathrm{d}W \vec{x} \right) }{ \vec{1}^T \exp(W\vec{x}) } \right) \\
= & tr \left( \frac{ \vec{x} \exp(W\vec{x})^T \left( \mathrm{d}W \right) }{ \vec{1}^T \exp(W\vec{x}) } \right).
\end{aligned}
$$
Combining the two terms gives
$$
\frac{\partial l}{\partial W} = - \vec{y} \vec{x}^T + \mathrm{softmax}(W \vec{x})\vec{x}^T = \left( \mathrm{softmax}(W \vec{x}) - \vec{y} \right) \vec{x}^T.
$$
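The result $\left( \mathrm{softmax}(W \vec{x}) - \vec{y} \right) \vec{x}^T$ can be compared with a finite-difference gradient. The following is a minimal sketch with NumPy; the dimensions, the position of the $1$ in `y`, and the helper names `softmax`, `loss`, and `grad_W` are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Subtracting the maximum does not change the result but avoids overflow.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(W, x, y):
    # l = -y^T log softmax(W x)
    return -y @ np.log(softmax(W @ x))

def grad_W(W, x, y):
    # Closed form from the derivation: (softmax(W x) - y) x^T.
    return np.outer(softmax(W @ x) - y, x)

rng = np.random.default_rng(0)
m, n = 4, 3
W = rng.normal(size=(m, n))
x = rng.normal(size=n)
y = np.zeros(m)
y[2] = 1.0  # one-hot label

# Central finite differences over every entry of W.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(W)
        E[i, j] = eps
        numeric[i, j] = (loss(W + E, x, y) - loss(W - E, x, y)) / (2 * eps)

max_err = np.max(np.abs(numeric - grad_W(W, x, y)))
print("max abs difference:", max_err)
```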
【Example 7】 Given samples $(\vec{x}_1, \vec{y}_1), (\vec{x}_2, \vec{y}_2), \cdots, (\vec{x}_N, \vec{y}_N)$, where $\vec{y}_i \in \mathbb{F}^{m \times 1}$ has exactly one element equal to $1$ and all others equal to $0$, and $\vec{x}_i \in \mathbb{F}^{n \times 1}$. Let $W_1 \in \mathbb{F}^{p \times n}$, $W_2 \in \mathbb{F}^{m \times p}$, $\vec{b}_1 \in \mathbb{F}^{p \times 1}$, $\vec{b}_2 \in \mathbb{F}^{m \times 1}$, and define $\vec{a}_{1,i} = W_1 \vec{x}_i + \vec{b}_1$, $\vec{h}_{1,i} = \sigma (\vec{a}_{1,i})$, $\vec{a}_{2,i} = W_2 \vec{h}_{1,i} + \vec{b}_2$. The loss function is
$$
l = - \sum_{i=1}^{N} \vec{y}_i^T \log \mathrm{softmax}(\vec{a}_{2,i}).
$$
Find the gradients of $l$ with respect to $W_1$, $\vec{b}_1$, $W_2$, and $\vec{b}_2$.
【Solution】 First compute the gradient of the loss with respect to the layer-2 output, following the same computation as in Example 6:
$$
\frac{ \partial l }{ \partial \vec{a}_{2,i} } = \mathrm{softmax}(\vec{a}_{2,i}) - \vec{y}_i.
$$
Next compute the gradients of the loss with respect to the layer-1 output and the parameters connecting layers 1 and 2. Since no chain rule for matrix derivatives has been defined here, we use the relation between derivatives and differentials.
$$
\begin{aligned}
\mathrm{d} l = & tr\left( \sum_{i=1}^{N} \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \vec{a}_{2,i} \right) \\
= & \sum_{i=1}^{N} tr\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \left( W_2 \vec{h}_{1,i} + \vec{b}_2 \right) \right) \\
= & \sum_{i=1}^{N} tr\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \left( W_2 \right) \vec{h}_{1,i} \right) + \sum_{i=1}^{N} tr\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T W_2 \mathrm{d} \left( \vec{h}_{1,i} \right) \right) + \sum_{i=1}^{N} tr\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \left( \vec{b}_2 \right) \right) \\
= & \sum_{i=1}^{N} tr\left( \vec{h}_{1,i} \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \left( W_2 \right) \right) + \sum_{i=1}^{N} tr\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T W_2 \mathrm{d} \left( \vec{h}_{1,i} \right) \right) + \sum_{i=1}^{N} tr\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \left( \vec{b}_2 \right) \right).
\end{aligned}
$$
This gives
$$
\frac{\partial l}{\partial W_2} = \sum_{i=1}^{N} \frac{ \partial l }{ \partial \vec{a}_{2,i} } \vec{h}_{1,i}^T,
$$
$$
\frac{\partial l}{\partial \vec{b}_2} = \sum_{i=1}^{N} \frac{ \partial l }{ \partial \vec{a}_{2,i} },
$$
$$
\frac{\partial l}{\partial \vec{h}_{1,i}} = W_2^T \frac{ \partial l }{ \partial \vec{a}_{2,i} }.
$$
Then compute the gradient of the loss with respect to the layer-1 input:
$$
\frac{\partial l}{\partial \vec{a}_{1,i}} = \frac{\partial l}{\partial \vec{h}_{1,i}} \odot \sigma^{'}(\vec{a}_{1,i}).
$$
Finally, compute the gradients of the loss with respect to the parameters connecting the input layer to layer 1.
$$
\begin{aligned}
\mathrm{d} l = & tr \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} \vec{a}_{1,i} \right) \\
= & tr \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} \left( W_1 \vec{x}_i + \vec{b}_1 \right) \right) \\
= & tr \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} W_1 \vec{x}_i \right) + tr \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} \vec{b}_1 \right) \\
= & tr \left( \sum_{i=1}^{N} \vec{x}_i \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} W_1 \right) + tr \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} \vec{b}_1 \right).
\end{aligned}
$$
Hence
$$
\frac{\partial l}{\partial W_1} = \sum_{i=1}^{N} \frac{\partial l}{ \partial \vec{a}_{1,i}} \vec{x}_i^T,
$$
$$
\frac{\partial l}{\partial \vec{b}_1} = \sum_{i=1}^{N} \frac{\partial l}{ \partial \vec{a}_{1,i}}.
$$
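The per-sample formulas above translate directly into a loop that accumulates the four gradients. The sketch below is one possible NumPy rendering, assuming $\sigma$ is the logistic sigmoid (the text leaves $\sigma$ generic). The sizes, the seed, and the function names `total_loss` and `backprop` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def total_loss(W1, b1, W2, b2, xs, ys):
    # l = -sum_i y_i^T log softmax(a_{2,i})
    l = 0.0
    for x, y in zip(xs, ys):
        a2 = W2 @ sigmoid(W1 @ x + b1) + b2
        l -= y @ np.log(softmax(a2))
    return l

def backprop(W1, b1, W2, b2, xs, ys):
    gW1, gb1 = np.zeros_like(W1), np.zeros_like(b1)
    gW2, gb2 = np.zeros_like(W2), np.zeros_like(b2)
    for x, y in zip(xs, ys):
        a1 = W1 @ x + b1
        h1 = sigmoid(a1)
        a2 = W2 @ h1 + b2
        d_a2 = softmax(a2) - y          # dl/da_{2,i}
        gW2 += np.outer(d_a2, h1)       # dl/da_{2,i} h_{1,i}^T
        gb2 += d_a2
        d_h1 = W2.T @ d_a2              # dl/dh_{1,i}
        d_a1 = d_h1 * h1 * (1.0 - h1)   # elementwise product with sigma'(a_{1,i})
        gW1 += np.outer(d_a1, x)        # dl/da_{1,i} x_i^T
        gb1 += d_a1
    return gW1, gb1, gW2, gb2

rng = np.random.default_rng(0)
N, n, p, m = 5, 3, 4, 2
xs = rng.normal(size=(N, n))                 # row i is x_i
ys = np.eye(m)[rng.integers(0, m, size=N)]   # row i is the one-hot y_i
W1, b1 = rng.normal(size=(p, n)), rng.normal(size=p)
W2, b2 = rng.normal(size=(m, p)), rng.normal(size=m)

gW1, gb1, gW2, gb2 = backprop(W1, b1, W2, b2, xs, ys)

# Finite-difference check on a single entry of W1.
eps = 1e-6
E = np.zeros_like(W1)
E[1, 2] = eps
fd = (total_loss(W1 + E, b1, W2, b2, xs, ys)
      - total_loss(W1 - E, b1, W2, b2, xs, ys)) / (2 * eps)
fd_err = abs(fd - gW1[1, 2])
print("finite-difference error:", fd_err)
```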
【Example 8】 Rewrite the previous example in matrix form, with $X = [\vec{x}_1,\cdots,\vec{x}_N]$, $A_1 = [\vec{a}_{1,1},\cdots, \vec{a}_{1,N}] = W_1X+\vec{b}_1 \vec{1}^T$, $H_1 = [\vec{h}_{1,1},\cdots, \vec{h}_{1,N}] = \sigma(A_1)$, and $A_2 = [\vec{a}_{2,1},\cdots, \vec{a}_{2,N}] = W_2H_1+\vec{b}_2 \vec{1}^T$.
【Solution】 First compute the gradient of the loss with respect to the layer-2 output:
$$
\frac{\partial l}{\partial A_2} = [\mathrm{softmax}(\vec{a}_{2,1}) - \vec{y}_1, \cdots, \mathrm{softmax}(\vec{a}_{2,N}) - \vec{y}_N].
$$
Next compute the gradients of the loss with respect to the layer-1 output and the parameters connecting layers 1 and 2. Since no chain rule for matrix derivatives has been defined here, we use the relation between derivatives and differentials.
$$
\begin{aligned}
\mathrm{d} l = & tr \left( \left( \frac{\partial l}{\partial A_2} \right)^T \mathrm{d} A_2 \right) \\
= & tr \left( \left( \frac{\partial l}{\partial A_2} \right)^T \mathrm{d} \left( W_2H_1+\vec{b}_2 \vec{1}^T \right) \right) \\
= & tr \left( H_1 \left( \frac{\partial l}{\partial A_2} \right)^T \mathrm{d} (W_2) \right) + tr \left( \left( \frac{\partial l}{\partial A_2} \right)^T W_2 \mathrm{d} (H_1) \right) + tr \left( \left( \frac{\partial l}{\partial A_2} \vec{1} \right)^T \mathrm{d} \vec{b}_2 \right).
\end{aligned}
$$
Hence
$$
\frac{\partial l}{\partial W_2} = \frac{\partial l}{\partial A_2} H_1^T,
$$
$$
\frac{\partial l}{\partial H_1} = W_2^T \frac{\partial l}{\partial A_2},
$$
$$
\frac{\partial l}{\partial \vec{b}_2} = \frac{\partial l}{\partial A_2} \vec{1}.
$$
Then compute the gradient of the loss with respect to the layer-1 input:
$$
\frac{\partial l}{\partial A_1} = \frac{\partial l}{\partial H_1} \odot \sigma^{'}(A_1).
$$
Finally, compute the gradients of the loss with respect to the parameters connecting the input layer to layer 1.
$$
\begin{aligned}
\mathrm{d} l = & tr\left( \left( \frac{\partial l}{\partial A_1} \right)^T \mathrm{d} A_1 \right) \\
= & tr\left( \left( \frac{\partial l}{\partial A_1} \right)^T \mathrm{d} \left( W_1X+\vec{b}_1 \vec{1}^T \right) \right) \\
= & tr\left( \left( \frac{\partial l}{\partial A_1} \right)^T (\mathrm{d} W_1) X \right) + tr \left( \left( \frac{\partial l}{\partial A_1} \right)^T W_1 (\mathrm{d} X) \right) + tr\left( \left( \frac{\partial l}{\partial A_1} \right)^T ( \mathrm{d} \vec{b}_1 ) \vec{1}^T \right) \\
= & tr\left( X \left( \frac{\partial l}{\partial A_1} \right)^T (\mathrm{d} W_1) \right) + tr \left( \left( \frac{\partial l}{\partial A_1} \right)^T W_1 (\mathrm{d} X) \right) + tr\left( \vec{1}^T \left( \frac{\partial l}{\partial A_1} \right)^T ( \mathrm{d} \vec{b}_1 ) \right).
\end{aligned}
$$
Hence
$$
\frac{\partial l}{\partial W_1} = \frac{\partial l}{\partial A_1} X^T,
$$
$$
\frac{\partial l}{\partial \vec{b}_1} = \frac{\partial l}{\partial A_1} \vec{1}.
$$
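In matrix form the loop over samples disappears and each gradient becomes a single matrix product. The following NumPy sketch mirrors the formulas above, again assuming a logistic sigmoid for $\sigma$. The label matrix `Y` (one-hot columns $\vec{y}_i$), the sizes, and the function names are assumptions introduced for the example; the terms $\vec{b}\vec{1}^T$ are realized through broadcasting.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def softmax_cols(A):
    # Column-wise softmax; subtracting the column maximum only improves stability.
    E = np.exp(A - A.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def forward(W1, b1, W2, b2, X):
    A1 = W1 @ X + b1[:, None]   # W1 X + b1 1^T
    H1 = sigmoid(A1)
    A2 = W2 @ H1 + b2[:, None]  # W2 H1 + b2 1^T
    return A1, H1, A2

def loss(W1, b1, W2, b2, X, Y):
    A2 = forward(W1, b1, W2, b2, X)[2]
    return -np.sum(Y * np.log(softmax_cols(A2)))

def gradients(W1, b1, W2, b2, X, Y):
    A1, H1, A2 = forward(W1, b1, W2, b2, X)
    dA2 = softmax_cols(A2) - Y        # dl/dA2, one column per sample
    gW2 = dA2 @ H1.T                  # dl/dA2 H1^T
    gb2 = dA2.sum(axis=1)             # dl/dA2 1
    dH1 = W2.T @ dA2                  # W2^T dl/dA2
    dA1 = dH1 * H1 * (1.0 - H1)       # elementwise product with sigma'(A1)
    gW1 = dA1 @ X.T                   # dl/dA1 X^T
    gb1 = dA1.sum(axis=1)             # dl/dA1 1
    return gW1, gb1, gW2, gb2

rng = np.random.default_rng(0)
N, n, p, m = 6, 3, 4, 2
X = rng.normal(size=(n, N))
Y = np.eye(m)[:, rng.integers(0, m, size=N)]  # one-hot columns
W1, b1 = rng.normal(size=(p, n)), rng.normal(size=p)
W2, b2 = rng.normal(size=(m, p)), rng.normal(size=m)

gW1, gb1, gW2, gb2 = gradients(W1, b1, W2, b2, X, Y)

# Finite-difference check of the bias gradient of layer 1.
eps = 1e-6
fd_gb1 = np.zeros_like(b1)
for k in range(p):
    e = np.zeros_like(b1)
    e[k] = eps
    fd_gb1[k] = (loss(W1, b1 + e, W2, b2, X, Y)
                 - loss(W1, b1 - e, W2, b2, X, Y)) / (2 * eps)
b1_err = np.max(np.abs(fd_gb1 - gb1))
print("finite-difference error on b1:", b1_err)
```

Summing over `axis=1` is the same operation as right-multiplying by $\vec{1}$, which is why the bias gradients need no explicit vector of ones.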
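References
[1] KHENG L W. Matrix differentiation, CS5240 Theoretical Foundations in Multimedia.
[2] Zhang Xianda. Matrix Analysis and Applications [M]. Beijing: Tsinghua University Press, 2004: 255-285.
[3] 长躯鬼侠. Matrix Derivative Techniques, Part 1 (矩阵求导术(上)).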