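Origin
This article comes from a Baidu AIStudio course and works through examples of neural networks, forward inference, backpropagation, and backward gradient computation.
The course uses an example of buying apples and then, by analogy, introduces the related neural network concepts.
Background
First, a quick review of the sum rule and the chain rule for derivatives: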
Suppose $z = f(v) + g(v)$. Then differentiating $z$ with respect to $v$ gives $\frac{dz}{dv} = \frac{df(v)}{dv} + \frac{dg(v)}{dv}$.
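The sum rule can be checked numerically. The sketch below is only an illustration: the functions `f` and `g` and the helper `numerical_derivative` are hypothetical choices made here, not part of the course material.

```python
import math

def numerical_derivative(func, v, h=1e-6):
    """Central finite-difference approximation of d func / d v."""
    return (func(v + h) - func(v - h)) / (2 * h)

# Hypothetical example functions: f(v) = v^2 and g(v) = sin(v).
def f(v):
    return v ** 2

def g(v):
    return math.sin(v)

def z(v):
    return f(v) + g(v)

v0 = 1.5
lhs = numerical_derivative(z, v0)                                  # dz/dv
rhs = numerical_derivative(f, v0) + numerical_derivative(g, v0)    # df/dv + dg/dv
analytic = 2 * v0 + math.cos(v0)                                   # known closed form

print(lhs, rhs, analytic)  # the three values should agree closely
```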
Suppose $z = f(u)$, where $u = g(v)$, that is, $z = f(g(v))$. Then differentiating $z$ with respect to $v$ gives $\frac{\partial z}{\partial v} = \frac{\partial z}{\partial u} \frac{\partial u}{\partial v}$.
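The chain rule can be checked in the same numerical way. In this minimal sketch the functions $g(v) = 3v + 1$ and $f(u) = u^2$ are assumptions chosen for illustration, as is the helper `numerical_derivative`.

```python
def numerical_derivative(func, x, h=1e-6):
    """Central finite-difference approximation of d func / d x."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Hypothetical example functions: u = g(v) = 3v + 1 and z = f(u) = u^2.
def g(v):
    return 3 * v + 1

def f(u):
    return u ** 2

def z(v):
    return f(g(v))

v0 = 2.0
u0 = g(v0)                          # u = 7

dz_du = 2 * u0                      # analytic: d(u^2)/du = 2u
du_dv = 3.0                         # analytic: d(3v + 1)/dv = 3
chain = dz_du * du_dv               # chain rule: dz/dv = dz/du * du/dv

numeric = numerical_derivative(z, v0)
print(chain, numeric)               # both should be close to 42
```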
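Diagram legend
- The items bought (such as apples and oranges), the quantity of each, and the discount are the known conditions, i.e. the input units.
- The final amount that leaves your wallet, the money you have to pay, is the output unit.
- The additions, subtractions, multiplications, and divisions you perform in between correspond to the operations inside a neural network, i.e. the hidden units.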
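Single-layer neural network
Example:
Suppose you want to buy two apples at 100 yuan each, so you would pay 200 yuan. If the seller gives you a 10% discount (you pay 90% of the price), you end up paying 180 yuan.
As shown in the figure above, we first compute the forward pass:
- Compute the price of 2 apples: $r = a \cdot b = 100 \cdot 2 = 200$.
- Compute the price after the 10% discount: $t = r \cdot c = 200 \cdot 0.9 = 180$.
- The amount you take out of your wallet: $z = t = 180$.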
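The forward pass above can be sketched in a few lines of plain Python. The function name `forward_apples` is a hypothetical choice made here; the variable names follow the symbols used in the text.

```python
def forward_apples(a, b, c):
    """Forward pass for the apple example.

    a: price per apple, b: number of apples, c: discount factor.
    """
    r = a * b   # price of the apples before the discount
    t = r * c   # price after applying the discount
    z = t       # amount actually paid
    return r, t, z

r, t, z = forward_apples(a=100, b=2, c=0.9)
print(r, t, z)  # r is 200; t and z are approximately 180
```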
Next comes backpropagation, i.e. computing the gradients by taking derivatives:
- Gradient at the last layer: since $z = t$, we have $\frac{dz}{dt} = 1$.
- Gradient at the second-to-last layer: since $t = r \cdot c$, the gradient in the direction of $r$ is $\frac{\partial z}{\partial r} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} = \frac{\partial z}{\partial t} \frac{\partial (r \cdot c)}{\partial r} = 1 \cdot c = 0.9$, and the gradient in the direction of $c$ is $\frac{\partial z}{\partial c} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial c} = \frac{\partial z}{\partial t} \frac{\partial (r \cdot c)}{\partial c} = 1 \cdot r = 200$.
- Gradient at the third-to-last layer: since $r = a \cdot b$, the gradient in the direction of $a$ is $\frac{\partial z}{\partial a} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial r}{\partial a} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial (a \cdot b)}{\partial a} = 1 \cdot 0.9 \cdot b = 1 \cdot 0.9 \cdot 2 = 1.8$, and the gradient in the direction of $b$ is $\frac{\partial z}{\partial b} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial r}{\partial b} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial (a \cdot b)}{\partial b} = 1 \cdot 0.9 \cdot a = 1 \cdot 0.9 \cdot 100 = 90$.
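The backward pass above can be reproduced by hand in Python and compared against a finite-difference estimate. This is a minimal sketch: the function names `forward`, `backward`, and `finite_difference` are hypothetical, and the numerical check is an addition for illustration rather than part of the course example.

```python
def forward(a, b, c):
    """z = (a * b) * c, the amount paid in the apple example."""
    return (a * b) * c

def backward(a, b, c):
    """Hand-written chain rule, following the steps in the text."""
    r = a * b             # intermediate value needed for dz/dc
    dz_dt = 1.0           # z = t
    dz_dr = dz_dt * c     # t = r * c  ->  dt/dr = c
    dz_dc = dz_dt * r     # t = r * c  ->  dt/dc = r
    dz_da = dz_dr * b     # r = a * b  ->  dr/da = b
    dz_db = dz_dr * a     # r = a * b  ->  dr/db = a
    return {"t": dz_dt, "r": dz_dr, "c": dz_dc, "a": dz_da, "b": dz_db}

inputs = {"a": 100.0, "b": 2.0, "c": 0.9}
grads = backward(**inputs)

def finite_difference(name, h=1e-6):
    """Central-difference estimate of dz/d(name) at the example inputs."""
    plus = dict(inputs)
    minus = dict(inputs)
    plus[name] += h
    minus[name] -= h
    return (forward(**plus) - forward(**minus)) / (2 * h)

for name in ("a", "b", "c"):
    # The analytic and numerical gradients should agree closely.
    print(name, grads[name], finite_difference(name))
```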
Multi-layer neural network
Here we use a network with two hidden layers as the example.
As shown in the figure above, we first compute the forward pass:
- Compute the price of 2 apples: $f = a \cdot b = 100 \cdot 2 = 200$.
- Compute the price of 3 oranges: $g = c \cdot d = 50 \cdot 3 = 150$.
- Compute the total price before the discount: $r = f + g = 200 + 150 = 350$.
- Compute the price after the 20% discount: $t = r \cdot e = 350 \cdot 0.8 = 280$.
- The amount you take out of your wallet: $z = t = 280$.
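The forward pass for this larger example can be sketched the same way. The function name `forward_fruit` is a hypothetical choice; the variables mirror the symbols in the text.

```python
def forward_fruit(a, b, c, d, e):
    """Forward pass for the apples-and-oranges example.

    a: price per apple, b: number of apples,
    c: price per orange, d: number of oranges, e: discount factor.
    """
    f = a * b   # price of the apples
    g = c * d   # price of the oranges
    r = f + g   # total before the discount
    t = r * e   # total after the discount
    z = t       # amount actually paid
    return f, g, r, t, z

f, g, r, t, z = forward_fruit(a=100, b=2, c=50, d=3, e=0.8)
print(f, g, r, t, z)  # f = 200, g = 150, r = 350; t and z are approximately 280
```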
Next comes backpropagation, i.e. computing the gradients by taking derivatives:
- Gradient at the last layer: since $z = t$, we have $\frac{dz}{dt} = 1$.
- Gradient at the second-to-last layer: since $t = r \cdot e$, the gradient in the direction of $r$ is $\frac{\partial z}{\partial r} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} = \frac{\partial z}{\partial t} \frac{\partial (r \cdot e)}{\partial r} = 1 \cdot e = 0.8$, and the gradient in the direction of $e$ is $\frac{\partial z}{\partial e} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial e} = \frac{\partial z}{\partial t} \frac{\partial (r \cdot e)}{\partial e} = 1 \cdot r = 350$.
- Gradient at the third-to-last layer: since $r = f + g$, the gradient in the direction of $f$ is $\frac{\partial z}{\partial f} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial r}{\partial f} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial (f + g)}{\partial f} = 1 \cdot 0.8 \cdot (1 + 0) = 1 \cdot 0.8 \cdot 1 = 0.8$, and the gradient in the direction of $g$ is $\frac{\partial z}{\partial g} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial r}{\partial g} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial (f + g)}{\partial g} = 1 \cdot 0.8 \cdot (0 + 1) = 1 \cdot 0.8 \cdot 1 = 0.8$.
- Gradient at the fourth-to-last layer: since $f = a \cdot b$ and $g = c \cdot d$, the gradient in the direction of $a$ is $\frac{\partial z}{\partial a} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial r}{\partial f} \frac{\partial f}{\partial a} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial (f + g)}{\partial f} \frac{\partial (a \cdot b)}{\partial a} = 1 \cdot 0.8 \cdot (1 + 0) \cdot b = 1 \cdot 0.8 \cdot 1 \cdot 2 = 1.6$. The gradient in the direction of $b$ is $\frac{\partial z}{\partial b} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial r}{\partial f} \frac{\partial f}{\partial b} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial (f + g)}{\partial f} \frac{\partial (a \cdot b)}{\partial b} = 1 \cdot 0.8 \cdot (1 + 0) \cdot a = 1 \cdot 0.8 \cdot 1 \cdot 100 = 80$. The gradient in the direction of $c$ is $\frac{\partial z}{\partial c} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial r}{\partial g} \frac{\partial g}{\partial c} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial (f + g)}{\partial g} \frac{\partial (c \cdot d)}{\partial c} = 1 \cdot 0.8 \cdot (0 + 1) \cdot d = 1 \cdot 0.8 \cdot 1 \cdot 3 = 2.4$. The gradient in the direction of $d$ is $\frac{\partial z}{\partial d} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial r}{\partial g} \frac{\partial g}{\partial d} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial r} \frac{\partial (f + g)}{\partial g} \frac{\partial (c \cdot d)}{\partial d} = 1 \cdot 0.8 \cdot (0 + 1) \cdot c = 1 \cdot 0.8 \cdot 1 \cdot 50 = 40$.
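As a cross-check on the gradients above, the sketch below applies the chain rule by hand in Python and compares the input gradients with finite-difference estimates. The function names `forward`, `backward`, and `finite_difference` are hypothetical, and the numerical check is an addition for illustration, not something taken from the course.

```python
def forward(a, b, c, d, e):
    """z = (a * b + c * d) * e, the amount paid in the two-fruit example."""
    return (a * b + c * d) * e

def backward(a, b, c, d, e):
    """Hand-written chain rule, following the steps in the text."""
    r = a * b + c * d       # total before the discount, needed for dz/de
    dz_dt = 1.0             # z = t
    dz_dr = dz_dt * e       # t = r * e  ->  dt/dr = e
    dz_de = dz_dt * r       # t = r * e  ->  dt/de = r
    dz_df = dz_dr * 1.0     # r = f + g  ->  dr/df = 1
    dz_dg = dz_dr * 1.0     # r = f + g  ->  dr/dg = 1
    dz_da = dz_df * b       # f = a * b  ->  df/da = b
    dz_db = dz_df * a       # f = a * b  ->  df/db = a
    dz_dc = dz_dg * d       # g = c * d  ->  dg/dc = d
    dz_dd = dz_dg * c       # g = c * d  ->  dg/dd = c
    return {"t": dz_dt, "r": dz_dr, "e": dz_de, "f": dz_df, "g": dz_dg,
            "a": dz_da, "b": dz_db, "c": dz_dc, "d": dz_dd}

inputs = {"a": 100.0, "b": 2.0, "c": 50.0, "d": 3.0, "e": 0.8}
grads = backward(**inputs)

def finite_difference(name, h=1e-6):
    """Central-difference estimate of dz/d(name) at the example inputs."""
    plus = dict(inputs)
    minus = dict(inputs)
    plus[name] += h
    minus[name] -= h
    return (forward(**plus) - forward(**minus)) / (2 * h)

for name in ("a", "b", "c", "d", "e"):
    # The analytic and numerical gradients should agree closely.
    print(name, grads[name], finite_difference(name))
```

If the two columns agree for every input, the hand-derived values (1.6, 80, 2.4, 40, and 350) are consistent with the forward computation.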
The above is my own understanding. I am not sure whether the calculations are correct, so corrections are welcome.