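This post derives a unified form of the chain rule for gradients of composite functions.
It introduces the author's own chain notation, which is easy to remember: "move the numerator to the right" + "cancel the fraction". The notation also makes the order of the chain explicit. Matrix multiplication is not commutative, so the factors in the chain cannot be swapped.
Note: this corrects the order given in most textbooks, which is right only in the scalar case.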
Chain rule
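Building on the previous post, we now discuss the chain rule for composite functions. We consider only the case where the composite is a scalar function, i.e., $z$ is a scalar.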
1. When the independent variable is a scalar $x$, the gradient is a scalar: $\frac{\partial z}{\partial x}$.
(1) When the intermediate variable is a scalar $y$, i.e., $z=g(y),\quad y=f(x)$:
$$
\begin{align}
{\nabla}_{x}z=\frac{\partial z}{\partial x} & =\frac{\partial y}{\partial x}\frac{\partial z}{\partial y} \tag{A52} \\
& ={\nabla}_{x}y{\nabla}_{y}z \tag{A53}
\end{align}
$$
This is the chain rule for composite functions given in [Watermelon Book (Zhou Zhihua, Machine Learning), Appendix Eq. (A.31)].
Note 1: Here both the variables and the functions are scalars, so $\partial$ is usually written as $\mathrm{d}$.
Note 2: For consistency with what follows, we have reordered the factors. The usual order $\frac{\partial z}{\partial y}\frac{\partial y}{\partial x}$ becomes $\frac{\partial y}{\partial x}\frac{\partial z}{\partial y}$. This corrects the order found in most textbooks, which is right only in the scalar case, where the factors commute.
A simple mnemonic:
1. Keep $x$ and $z$ in the same left-to-right order on both sides, i.e., write ${\nabla}_{x}z =\frac{\quad }{\partial x}\frac{\partial z}{\quad }$.
2. Fill both blank slots with the differential $\partial y$ of the intermediate variable.
3. Convert back to the $\nabla$ form (A53).
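As a quick sanity check of (A52), here is a minimal plain-Python sketch. It compares the chain-rule product $\frac{\partial y}{\partial x}\frac{\partial z}{\partial y}$ against a central finite difference. The functions $f(x)=x^2$ and $g(y)=\sin y$, the point $x_0=0.7$, and the helper names are illustrative assumptions, not part of the original derivation.

```python
import math


def numerical_derivative(func, x, h=1e-6):
    """Central finite-difference approximation of d func / dx at scalar x."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Illustrative choice (assumption): y = f(x) = x^2, z = g(y) = sin(y)
def f(x):
    return x ** 2


def g(y):
    return math.sin(y)


def dz_dx_chain(x):
    """(A52): dz/dx = (dy/dx) * (dz/dy), with both factors written out analytically."""
    y = f(x)
    dy_dx = 2 * x        # derivative of f at x
    dz_dy = math.cos(y)  # derivative of g at y = f(x)
    return dy_dx * dz_dy


x0 = 0.7
analytic = dz_dx_chain(x0)
numeric = numerical_derivative(lambda x: g(f(x)), x0)
print(f"chain rule: {analytic:.8f}, finite difference: {numeric:.8f}")
```

In this scalar case the two factors commute, so either order gives the same number. The order only starts to matter in the vector cases.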
(2) When the intermediate variable is a vector $\boldsymbol{y}$, i.e., $z=g(\boldsymbol{y}),\quad \boldsymbol{y}=f(x)$, we have
$$
\begin{align}
{\nabla}_{x}z & =\frac{\partial z}{\partial {x}}\notag \\
& =\sum_i \frac{\partial y_i}{\partial x} \frac{\partial z}{\partial y_i}\qquad \text{(multivariate chain rule)}\notag \\
& = \left( \left[ \frac{\partial y_i}{\partial x} \right] \right)^{\mathrm{T}} \left( \left[ \frac{\partial z}{\partial y_i} \right] \right)\notag \\
& =\frac{\partial {\boldsymbol{y}}}{\partial {x}}\frac{\partial z}{\partial {{\boldsymbol{y}}}}\qquad \text{(by Eqs. (A46) and (A47))} \tag{A54} \\
& ={\nabla}_{x}\boldsymbol{y}{\nabla}_{\boldsymbol{y}}z \tag{A55}
\end{align}
$$
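The following minimal sketch checks (A54) numerically. It treats $\frac{\partial \boldsymbol{y}}{\partial x}$ as a $1\times m$ row and $\frac{\partial z}{\partial \boldsymbol{y}}$ as an $m\times 1$ column, so their product is exactly the sum $\sum_i \frac{\partial y_i}{\partial x}\frac{\partial z}{\partial y_i}$. The functions $\boldsymbol{y}=(\cos x, x^3)$ and $z=y_1y_2+y_1^2$, the point $x_0=1.3$, and the helper names are illustrative assumptions.

```python
import math


def numerical_derivative(func, x, h=1e-6):
    """Central finite-difference approximation of d func / dx at scalar x."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Illustrative choice (assumption): y = f(x) = (cos x, x^3), z = g(y) = y1*y2 + y1^2
def f(x):
    return [math.cos(x), x ** 3]


def g(y):
    return y[0] * y[1] + y[0] ** 2


def dz_dx_chain(x):
    """(A54): dz/dx = (dy/dx)(dz/dy), a 1 x m row times an m x 1 column."""
    y = f(x)
    dy_dx = [-math.sin(x), 3 * x ** 2]   # row vector ([dy_i/dx])^T
    dz_dy = [y[1] + 2 * y[0], y[0]]      # column vector ([dz/dy_i])
    # Row times column = sum_i (dy_i/dx)(dz/dy_i), the multivariate chain rule
    return sum(a * b for a, b in zip(dy_dx, dz_dy))


x0 = 1.3
analytic = dz_dx_chain(x0)
numeric = numerical_derivative(lambda x: g(f(x)), x0)
print(f"chain rule: {analytic:.8f}, finite difference: {numeric:.8f}")
```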
2. When the independent variable is a vector $\boldsymbol{x}$, the gradient is a vector by Eq. (A50): $\frac{\partial z}{\partial \boldsymbol{x}}$.
(1) When the intermediate variable is a scalar $y$, i.e., $z=g(y),\quad y=f(\boldsymbol{x})$:
$$
\begin{align}
{\nabla}_{\boldsymbol{x}}z & =\frac{\partial z}{\partial \boldsymbol{x}}\notag \\
& =\left(\left[\frac{\partial z}{\partial x_i}\right]\right)\quad \text{(by Eq. (A50))}\notag \\
& =\left(\left[\frac{\partial y}{\partial x_i}\frac{\partial z}{\partial y}\right]\right)\quad \text{(by Eq. (A52))}\notag \\
& =\left(\left[\frac{\partial y}{\partial x_i}\right]\right)\frac{\partial z}{\partial y}\quad \text{(factor out the common scalar)}\notag \\
& ={\nabla}_{\boldsymbol{x}}y{\nabla}_{y}z\qquad \text{(by Eq. (A46))} \tag{A56}
\end{align}
$$
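Here is a minimal sketch of (A56). The gradient $\nabla_{\boldsymbol{x}}y$ is a column vector, and $\frac{\partial z}{\partial y}$ is the common scalar factored out of every component. The example $y=x_1^2+3x_1x_2$, $z=\sin y$, the point $\boldsymbol{x}_0=(0.5,-0.2)$, and the helper names are illustrative assumptions.

```python
import math


def numerical_gradient(func, x, h=1e-6):
    """Column vector [d func / dx_i] by central differences (one entry per x_i)."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


# Illustrative choice (assumption): y = f(x) = x1^2 + 3*x1*x2, z = g(y) = sin(y)
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]


def g(y):
    return math.sin(y)


def grad_z_chain(x):
    """(A56): grad_x z = (grad_x y) * dz/dy, a column vector times a scalar."""
    grad_y = [2 * x[0] + 3 * x[1], 3 * x[0]]   # grad_x y = ([dy/dx_i])
    dz_dy = math.cos(f(x))                    # the common scalar factor in (A56)
    return [gi * dz_dy for gi in grad_y]


x0 = [0.5, -0.2]
analytic = grad_z_chain(x0)
numeric = numerical_gradient(lambda x: g(f(x)), x0)
print(analytic, numeric)
```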
(2) When the intermediate variable is a vector $\boldsymbol{y}$, i.e., $z=g(\boldsymbol{y}),\quad \boldsymbol{y}=f(\boldsymbol{x})$:
$$
\begin{align}
{\nabla}_{\boldsymbol{x}}z & =\frac{\partial z}{\partial \boldsymbol{x}}\notag \\
& =\left( \left[ \frac{\partial z}{\partial {x}_i} \right] \right) \qquad \text{(by Eq. (A46))}\notag \\
& =\left( \left[ \frac{\partial {\boldsymbol{y}}}{\partial {x}_i}\frac{\partial z}{\partial {{\boldsymbol{y}}}} \right] \right) \qquad \text{(by Eq. (A54))}\notag \\
& =\left( \left[ \frac{\partial {\boldsymbol{y}}}{\partial {x}_i} \right] \right)\frac{\partial z}{\partial {{\boldsymbol{y}}}}\quad \text{(factor out the common right factor)}\notag \\
& =\frac{\partial {{\boldsymbol{y}}}}{\partial {{\boldsymbol{x}}}}\frac{\partial z}{\partial {{\boldsymbol{y}}}}\qquad \text{(by Eq. (A49))}\notag \\
& ={\nabla}_{\boldsymbol{x}}\boldsymbol{y}{\nabla}_{\boldsymbol{y}}z \tag{A57}
\end{align}
$$
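The sketch below checks (A57) in the general vector-vector case. It stores $\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}$ as an $n\times m$ matrix whose $(i,j)$ entry is $\frac{\partial y_j}{\partial x_i}$, matching how (A57) stacks the rows $\frac{\partial \boldsymbol{y}}{\partial x_i}$. It also shows that the reversed product is not even shape-compatible. The functions, the point $\boldsymbol{x}_0$, and the helpers `matmul`, `dy_dx`, `dz_dy` are illustrative assumptions.

```python
import math


def matmul(P, Q):
    """Plain-Python matrix product; raises ValueError if inner dimensions differ."""
    if len(P[0]) != len(Q):
        raise ValueError(f"shape mismatch: {len(P)}x{len(P[0])} times {len(Q)}x{len(Q[0])}")
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


# Illustrative choice (assumption): x in R^2, y = f(x) in R^3, z = g(y) scalar
def f(x):
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2, math.sin(x1)]


def g(y):
    return y[0] ** 2 + y[1] * y[2]


def dy_dx(x):
    """n x m matrix (here 2 x 3) with entry (i, j) = dy_j / dx_i."""
    x1, x2 = x
    return [[x2, 1.0, math.cos(x1)],
            [x1, 2 * x2, 0.0]]


def dz_dy(y):
    """Gradient of g as an m x 1 column (here 3 x 1)."""
    return [[2 * y[0]], [y[2]], [y[1]]]


def numerical_gradient(func, x, h=1e-6):
    """Central-difference gradient, one entry per x_i."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


x0 = [0.4, -1.1]
# (A57): dz/dx = (dy/dx)(dz/dy): (2 x 3)(3 x 1) -> 2 x 1
analytic = [row[0] for row in matmul(dy_dx(x0), dz_dy(f(x0)))]
numeric = numerical_gradient(lambda x: g(f(x)), x0)
print(analytic, numeric)

# Reversing the chain, (dz/dy)(dy/dx), is a (3 x 1)(2 x 3) product: undefined.
try:
    matmul(dz_dy(f(x0)), dy_dx(x0))
    reversed_ok = True
except ValueError:
    reversed_ok = False
print("reversed order defined:", reversed_ok)
```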
In summary, the chain rule for gradients of composite functions can be unified as
$$
\begin{align}
{\nabla}_{\boldsymbol{x}}z ={\nabla}_{\boldsymbol{x}}\boldsymbol{y}{\nabla}_{\boldsymbol{y}}z \tag{A58}
\end{align}
$$
where ${\nabla}_{\boldsymbol{x}}\boldsymbol{y}$ is defined by Eq. (A51).
In partial-derivative form:
$$
\begin{align}
\frac{\partial z}{\partial \boldsymbol{x}} =\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \frac{\partial z}{\partial \boldsymbol{y}} \tag{A59}
\end{align}
$$
In mnemonic form:
$$
\begin{align}
\frac{\partial z}{\partial \boldsymbol{x}} & =\frac{\qquad z\qquad}{\boldsymbol{x}\qquad\qquad}\notag \\
& =\frac{\qquad\boldsymbol{y}\qquad}{\boldsymbol{x}\qquad\qquad}\cdot \frac{\qquad z\qquad}{\boldsymbol{y}\qquad\qquad} \tag{A60}
\end{align}
$$
In short: "move the numerator to the right" + "cancel the fraction".
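Note 1 (notation for the chain rule of composite-function gradients): Eqs. (A58), (A59), and (A60) are the author's own chain notation. The notation does two things:
- It is easy to remember: move the numerator to the right, then cancel the fraction.
- It fixes the order of the chain. Matrix multiplication is not commutative, so the factors cannot be swapped.

Mnemonic: $x\leftarrow z=x\leftarrow y\leftarrow z$. When the vector $\boldsymbol{x}$ or $\boldsymbol{y}$ reduces to a scalar, the formula reduces to the corresponding earlier case.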
Note 2: "Move the numerator to the right" means
$$
\frac{\partial z}{\partial \boldsymbol{x}} =\frac{\ z\ }{\boldsymbol{x}\qquad} =\frac{\qquad}{\boldsymbol{x}\qquad}\cdot \frac{\ z\ }{\qquad}
$$
or
$$
\frac{\partial z}{\partial \boldsymbol{x}}=\frac{\qquad}{\partial \boldsymbol{x}}\cdot\frac{\partial z}{\qquad}.
$$
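Note 3: "Cancel the fraction" means filling the blanks with $\boldsymbol{y}$, so that it appears once as a numerator and once as a denominator: $\frac{\partial z}{\partial \boldsymbol{x}} =\frac{\boldsymbol{y}}{\qquad}\cdot\frac{\qquad}{\boldsymbol{y}\qquad}$, or $\frac{\partial z}{\partial \boldsymbol{x}} =\frac{\partial \boldsymbol{y}}{\qquad}\cdot \frac{\qquad}{\partial \boldsymbol{y}}$. Cancelling the two $\boldsymbol{y}$'s recovers $\frac{\partial z}{\partial \boldsymbol{x}}$, and the filled-in form is (A59).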
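Here is the mnemonic applied to a concrete case. As an illustrative assumption, take a linear intermediate map $\boldsymbol{y}=\mathbf{A}\boldsymbol{x}$ with $\mathbf{A}\in\mathbb{R}^{m\times n}$ and $\boldsymbol{x}\in\mathbb{R}^{n}$, and any scalar $z=g(\boldsymbol{y})$. We use $\frac{\partial (\mathbf{A}\boldsymbol{x})}{\partial \boldsymbol{x}}=\mathbf{A}^\mathrm{T}$, the same identity the last step of (A61) relies on. Filling in $\partial\boldsymbol{y}$ then gives

$$
\frac{\partial z}{\partial \boldsymbol{x}}
=\underbrace{\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}}_{n\times m}\;
\underbrace{\frac{\partial z}{\partial \boldsymbol{y}}}_{m\times 1}
=\mathbf{A}^\mathrm{T}\frac{\partial z}{\partial \boldsymbol{y}}
\qquad (n\times 1).
$$

In the reverse order, $\frac{\partial z}{\partial \boldsymbol{y}}\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}$ would be an $(m\times 1)(n\times m)$ product. That product is undefined unless $n=1$, which is the sense in which the order of the chain cannot be swapped.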
As an application, let $z=\boldsymbol{y}^\mathrm{T}\mathbf{W}\boldsymbol{y},\quad \boldsymbol{y}=\mathbf{Ax-b}$. Then
$$
\begin{align}
\frac{\partial z}{\partial \boldsymbol{x}} & = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \frac{\partial \boldsymbol{y}^\mathrm{T}\mathbf{W}\boldsymbol{y}}{\partial \boldsymbol{y}} \qquad \text{(by the chain rule, Eq. (A59))}\notag \\
& = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \frac{\partial \mathrm{tr}\,(\boldsymbol{y}^\mathrm{T}\mathbf{W}\boldsymbol{y})}{\partial \boldsymbol{y}} \qquad \text{(write the scalar as a trace, a standard trick)}\notag \\
& = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \left(\frac{\partial \mathrm{tr}\,(\boldsymbol{y}^\mathrm{T}\mathbf{W}\boldsymbol{y})}{\partial \boldsymbol{y}^\mathrm{T}} \right)^\mathrm{T} \qquad \text{(match the trace to Watermelon Book Eq. (A.29))}\notag \\
& = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \left( \boldsymbol{y}^\mathrm{T}(\mathbf{W}+\mathbf{W}^\mathrm{T}) \right)^\mathrm{T} \qquad \text{(by Watermelon Book Eq. (A.29))}\notag \\
& = \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} (\mathbf{W}+\mathbf{W}^\mathrm{T})\boldsymbol{y} \notag \\
& = \frac{\partial (\mathbf{Ax-b})}{\partial \boldsymbol{x}} (\mathbf{W}+\mathbf{W}^\mathrm{T})\boldsymbol{y} \notag \\
& =\mathbf{A}^\mathrm{T}(\mathbf{W}+\mathbf{W}^\mathrm{T}) (\mathbf{Ax-b}) \qquad \text{(by Watermelon Book Eq. (A.22))} \tag{A61}
\end{align}
$$
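When $\mathbf{W}$ is symmetric, $\mathbf{W}+\mathbf{W}^\mathrm{T}=2\mathbf{W}$ and the result becomes $2\mathbf{A}^\mathrm{T}\mathbf{W}(\mathbf{Ax-b})$, which is [Watermelon Book, Appendix Eq. (A.32)].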
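To check (A61) without assuming symmetry, here is a minimal plain-Python sketch. It compares $\mathbf{A}^\mathrm{T}(\mathbf{W}+\mathbf{W}^\mathrm{T})(\mathbf{Ax-b})$ with a finite-difference gradient of $z$. The matrices $\mathbf{A}$, $\mathbf{W}$, the vectors $\mathbf{b}$, $\boldsymbol{x}_0$, and the helper names are illustrative assumptions. $\mathbf{W}$ is deliberately chosen non-symmetric.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix M and a list v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def z_of_x(A, W, b, x):
    """z = y^T W y with y = A x - b."""
    y = [ai - bi for ai, bi in zip(matvec(A, x), b)]
    return sum(yi * wi for yi, wi in zip(y, matvec(W, y)))


def grad_A61(A, W, b, x):
    """(A61): dz/dx = A^T (W + W^T)(A x - b)."""
    y = [ai - bi for ai, bi in zip(matvec(A, x), b)]
    n = len(W)
    W_plus_WT = [[W[i][j] + W[j][i] for j in range(n)] for i in range(n)]
    return matvec(transpose(A), matvec(W_plus_WT, y))


def numerical_gradient(func, x, h=1e-6):
    """Central-difference gradient, one entry per x_i."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


# Illustrative data (assumption); W is NOT symmetric
A = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
W = [[2.0, 1.0, 0.0], [-1.0, 1.0, 0.5], [0.0, 2.0, 3.0]]
b = [1.0, 0.0, -2.0]
x0 = [0.3, -0.7]

analytic = grad_A61(A, W, b, x0)
numeric = numerical_gradient(lambda x: z_of_x(A, W, b, x), x0)
print(analytic, numeric)
```

For this non-symmetric $\mathbf{W}$, replacing $\mathbf{W}+\mathbf{W}^\mathrm{T}$ by $2\mathbf{W}$ in `grad_A61` would make the two results disagree. That is why (A61) reduces to (A.32) only when $\mathbf{W}$ is symmetric.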
When $\mathbf{W}=\mathbf{I}$, we obtain the most commonly used derivative of a quadratic form:
$$
\begin{align}
\frac{\partial }{\partial \boldsymbol{x}}(\mathbf{Ax-b})^\mathrm{T}(\mathbf{Ax-b}) =2\mathbf{A}^\mathrm{T} (\mathbf{Ax-b}) \tag{A62}
\end{align}
$$
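One common use of (A62) is as the gradient in least squares. The sketch below plugs it into plain fixed-step gradient descent on $(\mathbf{Ax-b})^\mathrm{T}(\mathbf{Ax-b})$. It is an illustration under stated assumptions, not a recommended solver:
- The data $\mathbf{A}$ and $\mathbf{b}$ are chosen so that $\mathbf{b}=\mathbf{A}(1,2)^\mathrm{T}$ exactly.
- The step size 0.1 and the 200 iterations are illustrative choices.
- The helper names are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix M and a list v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def lsq_gradient(A, b, x):
    """(A62): gradient of (Ax - b)^T (Ax - b) is 2 A^T (Ax - b)."""
    r = [ai - bi for ai, bi in zip(matvec(A, x), b)]
    return [2 * gi for gi in matvec(transpose(A), r)]


def gradient_descent(A, b, x0, step=0.1, iters=200):
    """Fixed-step gradient descent; step and iters are illustrative assumptions."""
    x = list(x0)
    for _ in range(iters):
        grad = lsq_gradient(A, b, x)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


# Illustrative data (assumption): b = A @ [1, 2], so the minimizer is exactly [1, 2]
A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [2.0, 2.0, 3.0]
x_hat = gradient_descent(A, b, [0.0, 0.0])
print(x_hat)
```

The step 0.1 converges here because it is below $2/\lambda_{\max}$ of the Hessian $2\mathbf{A}^\mathrm{T}\mathbf{A}$. That bound is about 0.19 for this $\mathbf{A}$. Other data would need a different step.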
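Previous post: 2. Partial derivatives and gradients (expressed in whole-matrix form)
Next post: 4. Gradients in neural networks (a graphical mnemonic for the chain rule)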