When I was studying this part, I could not understand how $dz^{[1]} = W^{[2]T}dz^{[2]} * g^{[1]'}(z^{[1]})$ is derived.
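It is in fact just a straightforward application of the chain rule.
The other thing is that, at the time, I did not pay attention to the relationship between $x$ and $a$.
The derivation is as follows: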
$$dz^{[1]}=\frac{dL}{dz^{[1]}}=\frac{dL}{dz^{[2]}}\cdot\frac{dz^{[2]}}{dz^{[1]}}=\frac{dL}{dz^{[2]}}\cdot\frac{dz^{[2]}}{da^{[1]}}\cdot\frac{da^{[1]}}{dz^{[1]}} \quad \text{(chain rule)}$$
$\frac{dL}{dz^{[2]}}=dz^{[2]}$
$\frac{dz^{[2]}}{da^{[1]}}=W^{[2]T}$. This is the step I mainly failed to understand at the time.
The figure writes $Z^{[2]}=W^{[2]}x+b^{[2]}$, but the input $x$ to the second layer here is actually $a^{[1]}$.
That is, $Z^{[2]}=W^{[2]}a^{[1]}+b^{[2]}$. Seen this way, the derivative of $Z^{[2]}$ with respect to $a^{[1]}$ is clearly $W^{[2]}$.
The final result carries an extra transpose, which is there so that the matrix dimensions match.
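The transpose can also be seen by writing the chain rule one component at a time. This is only a sketch: the layer sizes $n^{[1]}$, $n^{[2]}$ and the indices $i$, $j$ are notation introduced here for illustration and do not appear in the original figure.

```latex
z^{[2]}_i = \sum_{j=1}^{n^{[1]}} W^{[2]}_{ij}\, a^{[1]}_j + b^{[2]}_i
\quad\Longrightarrow\quad
\frac{\partial L}{\partial a^{[1]}_j}
  = \sum_{i=1}^{n^{[2]}} \frac{\partial L}{\partial z^{[2]}_i}\, W^{[2]}_{ij}
  = \left(W^{[2]T}\, dz^{[2]}\right)_j
```

The sum runs over the row index $i$ of $W^{[2]}$, which is exactly a multiplication by $W^{[2]T}$. In terms of shapes, $W^{[2]}$ is $(n^{[2]}, n^{[1]})$ and $dz^{[2]}$ is $(n^{[2]}, 1)$, so $W^{[2]T}dz^{[2]}$ is $(n^{[1]}, 1)$, the same shape as $a^{[1]}$.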
$\frac{da^{[1]}}{dz^{[1]}}=g^{[1]'}(z^{[1]})$. This is simply the notation for the derivative of the activation function used in the hidden layer.
In summary:
$$dz^{[1]}=\frac{dL}{dz^{[1]}}=\frac{dL}{dz^{[2]}}\cdot\frac{dz^{[2]}}{dz^{[1]}}=\frac{dL}{dz^{[2]}}\cdot\frac{dz^{[2]}}{da^{[1]}}\cdot\frac{da^{[1]}}{dz^{[1]}}=W^{[2]T}dz^{[2]}*g^{[1]'}(z^{[1]})$$
where $*$ denotes the elementwise product.
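One way to convince yourself of the final formula is to compare it against a finite-difference gradient. The sketch below is not from the original notes and rests on several assumptions: a toy network with 3 hidden units and 2 output units, $g^{[1]} = \tanh$, a linear output, and the loss $L = \frac{1}{2}\lVert z^{[2]} - y\rVert^2$ (so that $dz^{[2]} = z^{[2]} - y$). All numbers and function names are hypothetical.

```python
import math

# Hypothetical toy parameters: 3 hidden units, 2 output units.
W2 = [[0.5, -0.3, 0.8],
      [0.1, 0.7, -0.6]]   # shape (2, 3)
b2 = [0.05, -0.02]        # shape (2,)
y = [1.0, 0.0]            # target, shape (2,)


def forward_from_z1(z1):
    """Return (a1, z2, loss) for a given hidden pre-activation z1."""
    a1 = [math.tanh(v) for v in z1]                        # a1 = g1(z1), assumed tanh
    z2 = [sum(W2[i][j] * a1[j] for j in range(len(a1))) + b2[i]
          for i in range(len(W2))]                         # z2 = W2 a1 + b2
    loss = 0.5 * sum((z2[i] - y[i]) ** 2 for i in range(len(y)))  # assumed loss
    return a1, z2, loss


def analytic_dz1(z1):
    """dz1 = (W2^T dz2) * g1'(z1), with * taken elementwise."""
    a1, z2, _ = forward_from_z1(z1)
    dz2 = [z2[i] - y[i] for i in range(len(y))]            # dL/dz2 for the assumed loss
    da1 = [sum(W2[i][j] * dz2[i] for i in range(len(W2)))
           for j in range(len(a1))]                        # W2^T dz2, shape (3,)
    return [da1[j] * (1.0 - a1[j] ** 2) for j in range(len(a1))]  # tanh'(z) = 1 - tanh(z)^2


def numeric_dz1(z1, eps=1e-6):
    """Central finite differences of the loss with respect to each z1[j]."""
    grads = []
    for j in range(len(z1)):
        plus = list(z1)
        minus = list(z1)
        plus[j] += eps
        minus[j] -= eps
        grads.append((forward_from_z1(plus)[2] - forward_from_z1(minus)[2]) / (2 * eps))
    return grads


z1 = [0.2, -0.4, 0.9]
max_gap = max(abs(a - n) for a, n in zip(analytic_dz1(z1), numeric_dz1(z1)))
print("largest gap between analytic and numeric dz1:", max_gap)
```

If the formula is right, the printed gap should be tiny compared with the gradient values themselves. Replacing `W2[i][j] * dz2[i]` with an untransposed product would not even have the right shape here, since $W^{[2]}$ is $2 \times 3$ and $dz^{[2]}$ has 2 entries.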