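Forward Propagation Derivation
The i-th sample
The neural network model we are going to build is shown in the figure below:
Dimensions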
This is a 2-layer neural network. Layer 0 is the input layer with n_x features; layer 1 is the hidden layer with n_h = 4 hidden units; layer 2 is the output layer with n_y = 1 output unit. The superscript $[0](i)$ denotes layer 0 and the i-th sample. Let $a^{[0](i)}=x^{[0](i)}$. Dimensions: $x^{[0](i)}$ is 2x1 (so n_x = 2 here); $w^{[1](i)}$ is n_h x n_x = 4x2; $b^{[1](i)}$ is n_h x 1 = 4x1; $w^{[2](i)}$ is n_y x n_h = 1x4; $b^{[2](i)}$ is n_y x 1 = 1x1. The weights and biases are shared by all samples; the superscript $(i)$ on them only mirrors the per-sample notation.
Compute $z^{[1](i)}$, $a^{[1](i)}$, $z^{[2](i)}$, $a^{[2](i)}$
$$
w_1^{[1](i)}=\begin{pmatrix} w_{11}^{[1](i)} \\ w_{12}^{[1](i)} \end{pmatrix};\quad
w_2^{[1](i)}=\begin{pmatrix} w_{21}^{[1](i)} \\ w_{22}^{[1](i)} \end{pmatrix};\quad
w_3^{[1](i)}=\begin{pmatrix} w_{31}^{[1](i)} \\ w_{32}^{[1](i)} \end{pmatrix};\quad
w_4^{[1](i)}=\begin{pmatrix} w_{41}^{[1](i)} \\ w_{42}^{[1](i)} \end{pmatrix}
$$
$$
w^{[1](i)}=\begin{pmatrix} w_{11}^{[1](i)} & w_{12}^{[1](i)} \\ w_{21}^{[1](i)} & w_{22}^{[1](i)} \\ w_{31}^{[1](i)} & w_{32}^{[1](i)} \\ w_{41}^{[1](i)} & w_{42}^{[1](i)} \end{pmatrix}
=\begin{pmatrix} w_1^{[1](i)T} \\ w_2^{[1](i)T} \\ w_3^{[1](i)T} \\ w_4^{[1](i)T} \end{pmatrix}
$$
$$
z^{[1](i)}=\begin{pmatrix} z_1^{[1](i)} \\ z_2^{[1](i)} \\ z_3^{[1](i)} \\ z_4^{[1](i)} \end{pmatrix}=w^{[1](i)}x^{[0](i)}+b^{[1](i)}
$$
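To make the row stacking concrete, here is a minimal NumPy sketch. The numeric values are made up for illustration and do not come from the original post. It only checks that row j of $w^{[1](i)}x^{[0](i)}+b^{[1](i)}$ equals $w_j^{[1](i)T}x^{[0](i)}+b_j^{[1](i)}$.

```python
import numpy as np

# Illustrative values only (n_x = 2, n_h = 4); not taken from the post.
w_cols = [
    np.array([[0.1], [0.2]]),    # w_1, shape (2, 1)
    np.array([[0.3], [0.4]]),    # w_2
    np.array([[-0.5], [0.6]]),   # w_3
    np.array([[0.7], [-0.8]]),   # w_4
]
x = np.array([[1.0], [2.0]])     # one sample, shape (2, 1)
b = np.zeros((4, 1))             # bias, shape (4, 1)

# Stack the transposed column vectors as rows: w has shape (4, 2).
w = np.vstack([w_j.T for w_j in w_cols])

# Matrix form: all four pre-activations at once.
z = w @ x + b

# Unit-by-unit form: z_j = w_j^T x + b_j.
z_by_unit = np.array(
    [[(w_j.T @ x).item() + b[j, 0]] for j, w_j in enumerate(w_cols)]
)

print(w.shape, z.shape)          # (4, 2) (4, 1)
print(np.allclose(z, z_by_unit))
```

Both forms give the same 4x1 vector, which is why the four hidden units can be computed with a single matrix product.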
$$a^{[1](i)}=g^{[1]}(z^{[1](i)})=\tanh(z^{[1](i)})$$
where $z^{[1](i)}$ has dimension 4x1 and $a^{[1](i)}$ has dimension 4x1.
$$z^{[2](i)}=w^{[2](i)}a^{[1](i)}+b^{[2](i)}$$
$$\hat{y}^{(i)}=a^{[2](i)}=g^{[2]}(z^{[2](i)})=\mathrm{sigmoid}(z^{[2](i)})$$
where $z^{[2](i)}$ has dimension 1x1 and $a^{[2](i)}$ has dimension 1x1.
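The per-sample forward pass can be sketched in NumPy as follows. This is a minimal illustration, not the post's own code: the function names `sigmoid` and `forward_single` and the random values are assumptions; only the shapes (n_x = 2, n_h = 4, n_y = 1) and the tanh/sigmoid activations come from the text.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_single(x, w1, b1, w2, b2):
    """Forward pass for one sample x of shape (n_x, 1)."""
    z1 = w1 @ x + b1      # (n_h, 1)
    a1 = np.tanh(z1)      # (n_h, 1)
    z2 = w2 @ a1 + b2     # (n_y, 1)
    a2 = sigmoid(z2)      # (n_y, 1), the prediction y_hat
    return z1, a1, z2, a2

# Illustrative random values: n_x = 2, n_h = 4, n_y = 1.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1))
w1 = rng.standard_normal((4, 2)) * 0.01
b1 = np.zeros((4, 1))
w2 = rng.standard_normal((1, 4)) * 0.01
b2 = np.zeros((1, 1))

z1, a1, z2, a2 = forward_single(x, w1, b1, w2, b2)
print(z1.shape, a1.shape, z2.shape, a2.shape)   # (4, 1) (4, 1) (1, 1) (1, 1)
```

The printed shapes match the dimensions stated above.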
Computing the loss
$$J=-\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)}\log a^{[2](i)}+(1-y^{(i)})\log\left(1-a^{[2](i)}\right) \right)$$
Here $\log$ is the natural logarithm, which the backpropagation result $dz^{[2](i)}=a^{[2](i)}-y^{(i)}$ relies on, and the sum runs over the m samples, $i=1,\dots,m$.
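A minimal sketch of this cost in NumPy, assuming the natural logarithm (the choice under which the gradient at the output simplifies to $a^{[2](i)}-y^{(i)}$) and a sum over all m samples. The function name `cross_entropy_cost` and the sample values are hypothetical.

```python
import numpy as np

def cross_entropy_cost(A2, Y):
    """Mean binary cross-entropy; A2 and Y both have shape (1, m)."""
    m = Y.shape[1]
    per_sample = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)
    return float(-np.sum(per_sample) / m)

# Made-up predictions and labels for m = 3 samples.
A2 = np.array([[0.9, 0.2, 0.5]])
Y = np.array([[1, 0, 1]])

cost = cross_entropy_cost(A2, Y)
print(cost)
```

For these values the cost is $-(\ln 0.9+\ln 0.8+\ln 0.5)/3$: each sample contributes the log of the probability assigned to its true label.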
Vectorization
Dimensions
Let $A^{[0]}=X^{[0]}$. The input $X^{[0]}$ has dimension n_x x m, with n_x features and m samples; $W^{[1]}$ is n_h x n_x = 4 x n_x; $b^{[1]}$ is n_h x 1 = 4x1; $W^{[2]}$ is n_y x n_h = 1x4; $b^{[2]}$ is n_y x 1 = 1x1.
Compute $Z^{[1]}$, $A^{[1]}$, $Z^{[2]}$, $A^{[2]}$
$$Z^{[1]}=W^{[1]}X^{[0]}+b^{[1]}=W^{[1]}A^{[0]}+b^{[1]}$$
$$A^{[1]}=g^{[1]}(Z^{[1]})$$
where $Z^{[1]}$ has dimension 4xm and $A^{[1]}$ has dimension 4xm.
$$Z^{[2]}=W^{[2]}A^{[1]}+b^{[2]}$$
$$A^{[2]}=g^{[2]}(Z^{[2]})$$
where $Z^{[2]}$ has dimension 1xm and $A^{[2]}$ has dimension 1xm.
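The vectorized forward pass might look like the sketch below. It assumes tanh for $g^{[1]}$ and sigmoid for $g^{[2]}$, as in the per-sample derivation; the names `forward` and `sigmoid` and the random data are illustrative only. Note that $b^{[1]}$ (4x1) is added to a 4xm matrix, which works in NumPy because the bias is broadcast across the m columns.

```python
import numpy as np

def sigmoid(Z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, W1, b1, W2, b2):
    """Vectorized forward pass; X has shape (n_x, m)."""
    Z1 = W1 @ X + b1      # (n_h, m); b1 of shape (n_h, 1) is broadcast over columns
    A1 = np.tanh(Z1)      # (n_h, m)
    Z2 = W2 @ A1 + b2     # (n_y, m)
    A2 = sigmoid(Z2)      # (n_y, m)
    return Z1, A1, Z2, A2

# Illustrative random data: n_x = 2, n_h = 4, n_y = 1, m = 5.
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 5))
W1 = rng.standard_normal((4, 2))
b1 = rng.standard_normal((4, 1))
W2 = rng.standard_normal((1, 4))
b2 = rng.standard_normal((1, 1))

Z1, A1, Z2, A2 = forward(X, W1, b1, W2, b2)
print(Z1.shape, A1.shape, Z2.shape, A2.shape)   # (4, 5) (4, 5) (1, 5) (1, 5)

# Column i of the vectorized result is the per-sample result for sample i.
i = 3
_, _, _, a2_i = forward(X[:, i:i + 1], W1, b1, W2, b2)
print(np.allclose(A2[:, i:i + 1], a2_i))
```

Each column of $A^{[2]}$ is the prediction for one sample, so the matrix form replaces an explicit loop over the m samples.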
Backpropagation Derivation
Gradient descent is used to solve for the parameters; the resulting formulas are as follows:
The i-th sample
Dimensions
From forward propagation, the dimensions are: $x^{[0](i)}$ is 2x1; $w^{[1](i)}$ is n_h x n_x = 4x2; $b^{[1](i)}$ is n_h x 1 = 4x1; $w^{[2](i)}$ is n_y x n_h = 1x4; $b^{[2](i)}$ is n_y x 1 = 1x1. $z^{[1](i)}$ is 4x1 and $a^{[1](i)}$ is 4x1. $z^{[2](i)}$ is 1x1 and $a^{[2](i)}$ is 1x1.
Compute $dz^{[1](i)}$, $dw^{[1](i)}$, $db^{[1](i)}$, $dz^{[2](i)}$, $dw^{[2](i)}$, $db^{[2](i)}$
$$dz^{[2](i)}=\frac{\partial L(a^{[2](i)},y^{(i)})}{\partial a^{[2](i)}}\frac{\partial a^{[2](i)}}{\partial z^{[2](i)}}=a^{[2](i)} - y^{(i)}$$
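For the derivation of the formula above, see the note 《吴恩达深度学习第一课–第二周神经网络基础作业上正反向传播推导》 (Andrew Ng's Deep Learning, Course 1, Week 2, Neural Network Basics assignment, part 1: forward and backward propagation derivation).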
$$dw^{[2](i)}=dz^{[2](i)} \frac{\partial z^{[2](i)}}{\partial w^{[2](i)}} =(a^{[2](i)} - y^{(i)})\, a^{[1](i)}$$
Since $w^{[2](i)}$ has dimension 1x4, $(a^{[2](i)} - y^{(i)})$ has dimension 1x1, and $a^{[1](i)}$ has dimension 4x1, the gradient $dw^{[2](i)}$ must also be 1x4, so $a^{[1](i)}$ has to be transposed:
$$dw^{[2](i)}=dz^{[2](i)} \frac{\partial z^{[2](i)}}{\partial w^{[2](i)}} =dz^{[2](i)}a^{[1](i)T}= (a^{[2](i)} - y^{(i)})\,a^{[1](i)T}$$
$$db^{[2](i)}=dz^{[2](i)}=a^{[2](i)} - y^{(i)}$$
$$dz^{[1](i)}=\frac{\partial L(a^{[2](i)},y^{(i)})}{\partial a^{[2](i)}} \frac{\partial a^{[2](i)}}{\partial z^{[2](i)}} \frac{\partial z^{[2](i)}}{\partial a^{[1](i)}} \frac{\partial a^{[1](i)}}{\partial z^{[1](i)}}=dz^{[2](i)} w^{[2](i)} * g^{[1]'}(z^{[1](i)})=w^{[2](i)T} dz^{[2](i)} * g^{[1]'}(z^{[1](i)})$$
Here $*$ denotes the elementwise product. The middle expression follows the chain rule literally; for the shapes to match ($dz^{[1](i)}$ is 4x1, $w^{[2](i)}$ is 1x4, $dz^{[2](i)}$ is 1x1), $w^{[2](i)}$ is transposed in the final form.
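Because the hidden activation in this network is $g^{[1]}=\tanh$, the factor $g^{[1]'}(z^{[1](i)})$ can be expressed through the activation that the forward pass already computed. A short derivation, using only the definition of tanh:

$$
\frac{d}{dz}\tanh(z)=\frac{d}{dz}\,\frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}
=\frac{(e^{z}+e^{-z})^{2}-(e^{z}-e^{-z})^{2}}{(e^{z}+e^{-z})^{2}}
=1-\tanh^{2}(z)
$$

Applied elementwise, this gives $g^{[1]'}(z^{[1](i)})=1-\left(a^{[1](i)}\right)^{2}$, so $dz^{[1](i)}=w^{[2](i)T}dz^{[2](i)} * \left(1-\left(a^{[1](i)}\right)^{2}\right)$.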
$$dw^{[1](i)}=dz^{[1](i)} \frac{\partial z^{[1](i)}}{\partial w^{[1](i)}} = dz^{[1](i)} x^{[0](i)T}$$
$$db^{[1](i)}=dz^{[1](i)}$$
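One way to gain confidence in the per-sample formulas is to compare them with a finite-difference estimate. The sketch below assumes the natural-log cross-entropy loss and the tanh derivative $1-(a^{[1](i)})^2$. The function names, the random values, and the choice of which weight to perturb are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def loss_single(x, y, w1, b1, w2, b2):
    """Cross-entropy loss (natural log) for one sample."""
    a1 = np.tanh(w1 @ x + b1)
    a2 = sigmoid(w2 @ a1 + b2)
    return (-(y * np.log(a2) + (1 - y) * np.log(1 - a2))).item()

def backward_single(x, y, w1, b1, w2, b2):
    """Per-sample gradients following the formulas in the text."""
    a1 = np.tanh(w1 @ x + b1)
    a2 = sigmoid(w2 @ a1 + b2)
    dz2 = a2 - y                          # (1, 1)
    dw2 = dz2 @ a1.T                      # (1, 4)
    db2 = dz2                             # (1, 1)
    dz1 = (w2.T @ dz2) * (1 - a1 ** 2)    # (4, 1); tanh'(z) = 1 - a^2
    dw1 = dz1 @ x.T                       # (4, 2)
    db1 = dz1                             # (4, 1)
    return dw1, db1, dw2, db2

# Illustrative random sample and parameters (n_x = 2, n_h = 4, n_y = 1).
rng = np.random.default_rng(2)
x = rng.standard_normal((2, 1))
y = 1.0
w1 = rng.standard_normal((4, 2))
b1 = rng.standard_normal((4, 1))
w2 = rng.standard_normal((1, 4))
b2 = rng.standard_normal((1, 1))

dw1, db1, dw2, db2 = backward_single(x, y, w1, b1, w2, b2)

# Central finite difference on a single entry of w1.
eps = 1e-6
w1_plus, w1_minus = w1.copy(), w1.copy()
w1_plus[2, 1] += eps
w1_minus[2, 1] -= eps
numeric = (loss_single(x, y, w1_plus, b1, w2, b2)
           - loss_single(x, y, w1_minus, b1, w2, b2)) / (2 * eps)

print(dw1.shape, db1.shape, dw2.shape, db2.shape)   # (4, 2) (4, 1) (1, 4) (1, 1)
print(abs(numeric - dw1[2, 1]))                     # should be tiny
```

The gradient shapes equal the shapes of the corresponding parameters, which is the dimension argument used above to decide where the transposes go.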
Vectorization
Dimensions
Let $A^{[0]}=X^{[0]}$. The input $X^{[0]}$ has dimension n_x x m, with n_x features and m samples; $W^{[1]}$ is n_h x n_x = 4 x n_x; $b^{[1]}$ is n_h x 1 = 4x1; $W^{[2]}$ is n_y x n_h = 1x4; $b^{[2]}$ is n_y x 1 = 1x1. $Z^{[1]}$ is 4xm and $A^{[1]}$ is 4xm. $Z^{[2]}$ is 1xm and $A^{[2]}$ is 1xm.
Compute $dZ^{[1]}$, $dW^{[1]}$, $db^{[1]}$, $dZ^{[2]}$, $dW^{[2]}$, $db^{[2]}$
The derivation is as follows:
$$dZ^{[2]}=A^{[2]} - Y$$
$$dW^{[2]}=\frac{1}{m} dZ^{[2]} A^{[1]}$$
Since $Z^{[2]}$ (and therefore $dZ^{[2]}$) has dimension 1xm, $A^{[1]}$ has dimension 4xm, and $W^{[2]}$ has dimension 1x4, $A^{[1]}$ needs to be transposed, which gives:
$$dW^{[2]}=\frac{1}{m} dZ^{[2]} A^{[1]T}=\frac{1}{m} (A^{[2]} - Y)A^{[1]T}$$
$$db^{[2]}=\frac{1}{m}\sum_{i=1}^{m}dz^{[2](i)}=\frac{1}{m}\,\text{np.sum}(A^{[2]} - Y)$$
Since $A^{[2]} - Y$ has dimension 1xm while $db^{[2]}$ has dimension 1x1, the entries of $A^{[2]} - Y$ are summed.
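A small sketch of what this sum returns in NumPy; the values are made up. A bare `np.sum` yields a scalar, while `axis=1, keepdims=True` yields a 1x1 array with the same shape as $b^{[2]}$. Both hold the same number here.

```python
import numpy as np

# Made-up values for dZ2 = A2 - Y, shape (1, m) with m = 4.
dZ2 = np.array([[0.2, -0.4, 0.1, 0.5]])
m = dZ2.shape[1]

db2_scalar = np.sum(dZ2) / m                          # NumPy scalar, shape ()
db2_matrix = np.sum(dZ2, axis=1, keepdims=True) / m   # shape (1, 1), like b2

print(np.shape(db2_scalar), db2_matrix.shape)   # () (1, 1)
```

Keeping the 1x1 shape is a design choice that makes $db^{[2]}$ match $b^{[2]}$ exactly, so a later update of the form `b2 - learning_rate * db2` (with a hypothetical `learning_rate`) does not change the shape of the bias.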
$$dZ^{[1]}=dZ^{[2]}W^{[2]} * g^{[1]'}(Z^{[1]})$$
Since $W^{[2]}$ has dimension 1x4, $Z^{[2]}$ (and therefore $dZ^{[2]}$) has dimension 1xm, and $dZ^{[1]}$ has dimension 4xm, $W^{[2]}$ needs to be transposed and placed on the left, which gives:
$$dZ^{[1]}=W^{[2]T}dZ^{[2]} * g^{[1]'}(Z^{[1]})=\text{np.dot}(W^{[2]}.T,\ dZ^{[2]}) * g^{[1]'}(Z^{[1]})$$
$$dW^{[1]}=\frac{1}{m} dZ^{[1]} X$$
Since $dW^{[1]}$ has dimension 4 x n_x, $dZ^{[1]}$ has dimension 4xm, and X has dimension n_x x m, X is transposed, which gives:
$$dW^{[1]}=\frac{1}{m} dZ^{[1]} X^{T}$$
$$db^{[1]}=\frac{1}{m} dZ^{[1]}$$
Since $dZ^{[1]}$ has dimension 4xm while $b^{[1]}$ has dimension 4x1, each row is summed, which gives:
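$$db^{[1]}=\frac{1}{m}\,\text{np.sum}(dZ^{[1]},\ \text{axis}=1,\ \text{keepdims}=\text{True})$$
Here axis=1 sums along each row and keepdims=True keeps the 4x1 shape; a bare np.sum($dZ^{[1]}$) would add up all 4m entries into a single scalar instead of giving one value per hidden unit.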
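Putting the vectorized formulas together, a minimal NumPy sketch could look like this. It assumes tanh in the hidden layer (so $g^{[1]'}(Z^{[1]})=1-(A^{[1]})^2$), sigmoid at the output, and the natural-log cross-entropy loss. The function names and the random data are illustrative and not taken from the post. The loop at the end checks that the vectorized $dW^{[1]}$ and $db^{[1]}$ equal the average of the per-sample gradients.

```python
import numpy as np

def sigmoid(Z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-Z))

def backward(X, Y, W1, b1, W2, b2):
    """Vectorized gradients of the mean cross-entropy cost."""
    m = X.shape[1]
    A1 = np.tanh(W1 @ X + b1)                       # (n_h, m)
    A2 = sigmoid(W2 @ A1 + b2)                      # (n_y, m)
    dZ2 = A2 - Y                                    # (n_y, m)
    dW2 = dZ2 @ A1.T / m                            # (n_y, n_h)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m    # (n_y, 1)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)              # (n_h, m)
    dW1 = dZ1 @ X.T / m                             # (n_h, n_x)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m    # (n_h, 1)
    return dW1, db1, dW2, db2

# Illustrative random data: n_x = 2, n_h = 4, n_y = 1, m = 6.
rng = np.random.default_rng(3)
m = 6
X = rng.standard_normal((2, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W1 = rng.standard_normal((4, 2))
b1 = rng.standard_normal((4, 1))
W2 = rng.standard_normal((1, 4))
b2 = rng.standard_normal((1, 1))

dW1, db1, dW2, db2 = backward(X, Y, W1, b1, W2, b2)
print(dW1.shape, db1.shape, dW2.shape, db2.shape)   # (4, 2) (4, 1) (1, 4) (1, 1)

# Average the per-sample gradients with an explicit loop for comparison.
dW1_avg = np.zeros_like(W1)
db1_avg = np.zeros_like(b1)
for i in range(m):
    x, y = X[:, i:i + 1], Y[:, i:i + 1]
    a1 = np.tanh(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    dz1 = (W2.T @ (a2 - y)) * (1 - a1 ** 2)
    dW1_avg += dz1 @ x.T / m
    db1_avg += dz1 / m

print(np.allclose(dW1, dW1_avg), np.allclose(db1, db1_avg))
```

The matrix product `dZ1 @ X.T` performs the sum over samples implicitly, which is where the $\frac{1}{m}$ factor and the transposes in the formulas come from.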