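- Some notes (to be continued)
The English descriptions, formulas, and figures in this article all come from the programming assignments of Andrew Ng's Deep Learning course.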
$$\frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} = \frac{1}{m} (a^{[2](i)} - y^{(i)})$$

$$\frac{\partial \mathcal{J}}{\partial W_2} = \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} a^{[1](i)T}$$

$$\frac{\partial \mathcal{J}}{\partial b_2} = \sum_i \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}}$$

$$\frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} = W_2^T \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} * (1 - a^{[1](i)2})$$

$$\frac{\partial \mathcal{J}}{\partial W_1} = \frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} X^T$$

$$\frac{\partial \mathcal{J}}{\partial b_1} = \sum_i \frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}}$$
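The figure below shows the gradient computation during backpropagation. The output layer uses the sigmoid activation function and the hidden layer uses tanh; the right-hand side of the figure gives the corresponding vectorized implementation.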
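One way to write the vectorized form is sketched below. It assumes the $m$ examples are stacked as columns of $X$, $A^{[1]}$, $A^{[2]}$ and $Y$, and that the $\frac{1}{m}$ factor is applied when forming the weight and bias gradients rather than inside the output error. The names $dZ_1$ and $dZ_2$ are shorthand introduced here for the matrices of per-example errors, and $dZ^{(i)}$ denotes the $i$-th column:

$$
\begin{aligned}
dZ_2 &= A^{[2]} - Y \\
dW_2 &= \frac{1}{m}\, dZ_2\, A^{[1]T} \\
db_2 &= \frac{1}{m} \sum_i dZ_2^{(i)} \\
dZ_1 &= W_2^T\, dZ_2 * (1 - A^{[1]} * A^{[1]}) \\
dW_1 &= \frac{1}{m}\, dZ_1\, X^T \\
db_1 &= \frac{1}{m} \sum_i dZ_1^{(i)}
\end{aligned}
$$

The matrix products $dZ_2\, A^{[1]T}$ and $dZ_1\, X^T$ carry out the sum over examples implicitly, which is why no explicit $\sum_i$ appears in the weight gradients.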
- Note that $*$ denotes elementwise multiplication.
- The notation you will use is common in deep learning coding:
    - dW1 = $\frac{\partial \mathcal{J}}{\partial W_1}$
    - db1 = $\frac{\partial \mathcal{J}}{\partial b_1}$
    - dW2 = $\frac{\partial \mathcal{J}}{\partial W_2}$
    - db2 = $\frac{\partial \mathcal{J}}{\partial b_2}$
- Tips:
    - To compute dZ1 you'll need to compute $g^{[1]'}(Z^{[1]})$. Since $g^{[1]}(.)$ is the tanh activation function, if $a = g^{[1]}(z)$ then $g^{[1]'}(z) = 1-a^2$. So you can compute $g^{[1]'}(Z^{[1]})$ using `(1 - np.power(A1, 2))`.
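The gradients and the tanh tip can be put together in a short NumPy sketch. This is an illustration, not the assignment's reference solution. It assumes a binary cross-entropy cost, which is what makes the output error reduce to `A2 - Y` under a sigmoid output. The function names `forward`, `cost` and `backward_propagation`, and the toy shapes, are hypothetical.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer, sigmoid output layer."""
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
    return A1, A2

def cost(A2, Y):
    """Binary cross-entropy averaged over the m examples (assumed cost)."""
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

def backward_propagation(W2, A1, A2, X, Y):
    """Vectorized gradients; examples are stored as columns."""
    m = X.shape[1]
    dZ2 = A2 - Y                                  # output-layer error
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # tanh'(z) = 1 - a^2, applied elementwise
    dZ1 = (W2.T @ dZ2) * (1 - np.power(A1, 2))
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

# Toy problem: 2 input features, 4 hidden units, 5 examples.
rng = np.random.default_rng(0)
n_x, n_h, m = 2, 4, 5
X = rng.standard_normal((n_x, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
W1 = rng.standard_normal((n_h, n_x)) * 0.1
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.1
b2 = np.zeros((1, 1))

A1, A2 = forward(X, W1, b1, W2, b2)
grads = backward_propagation(W2, A1, A2, X, Y)
```

Each gradient has the same shape as the parameter it belongs to, and `keepdims=True` keeps `db1` and `db2` as column vectors so they broadcast correctly in an update step.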