Andrew Ng Deep Learning Course 1, Week 3 Assignment (Part 1): Forward and Backward Propagation Derivations for the Neural Network Basics Assignment

Forward Propagation Derivation

The i-th Sample

The neural network model we will build is shown in the figure below:
(figure: diagram of the 2-layer neural network)

Dimensions

This is a 2-layer neural network: layer 0 is the input layer with $n_x$ features; layer 1 is the hidden layer with $n_h=4$ hidden units; layer 2 is the output layer with $n_y=1$ output unit. The superscript $^{[0](i)}$ denotes layer 0 and the $i$-th sample. Let $a^{[0](i)}=x^{[0](i)}$. The dimensions are: $x^{[0](i)}$: $2\times 1$; $w^{[1](i)}$: $n_h\times n_x=4\times 2$; $b^{[1](i)}$: $n_h\times 1=4\times 1$; $w^{[2](i)}$: $n_y\times n_h=1\times 4$; $b^{[2](i)}$: $n_y\times 1=1\times 1$.

Computing $z^{[1](i)}$, $a^{[1](i)}$, $z^{[2](i)}$, $a^{[2](i)}$

$$w_1^{[1](i)}=\begin{pmatrix} w_{11}^{[1](i)} \\ w_{12}^{[1](i)} \end{pmatrix};\quad w_2^{[1](i)}=\begin{pmatrix} w_{21}^{[1](i)} \\ w_{22}^{[1](i)} \end{pmatrix};\quad w_3^{[1](i)}=\begin{pmatrix} w_{31}^{[1](i)} \\ w_{32}^{[1](i)} \end{pmatrix};\quad w_4^{[1](i)}=\begin{pmatrix} w_{41}^{[1](i)} \\ w_{42}^{[1](i)} \end{pmatrix}$$
$$w^{[1](i)}=\begin{pmatrix} w_{11}^{[1](i)} & w_{12}^{[1](i)}\\ w_{21}^{[1](i)} & w_{22}^{[1](i)}\\ w_{31}^{[1](i)} & w_{32}^{[1](i)}\\ w_{41}^{[1](i)} & w_{42}^{[1](i)} \end{pmatrix}=\begin{pmatrix} w_1^{[1](i)T}\\ w_2^{[1](i)T}\\ w_3^{[1](i)T}\\ w_4^{[1](i)T} \end{pmatrix}$$
$$z^{[1](i)}=\begin{pmatrix} z_1^{[1](i)}\\ z_2^{[1](i)}\\ z_3^{[1](i)}\\ z_4^{[1](i)} \end{pmatrix}=w^{[1](i)}x^{[0](i)}+b^{[1](i)}$$
$$a^{[1](i)}=g^{[1]}(z^{[1](i)})=\tanh(z^{[1](i)})$$
Here $z^{[1](i)}$ has dimension $4\times 1$ and $a^{[1](i)}$ has dimension $4\times 1$.
$$z^{[2](i)}=w^{[2](i)}a^{[1](i)}+b^{[2](i)}$$
$$\hat y = a^{[2](i)}=g^{[2]}(z^{[2](i)})=\mathrm{sigmoid}(z^{[2](i)})$$
Here $z^{[2](i)}$ has dimension $1\times 1$ and $a^{[2](i)}$ has dimension $1\times 1$.
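The per-sample forward pass above can be sketched in NumPy as follows; the parameter values are hypothetical random initializations, only the shapes matter here.

```python
import numpy as np

# Minimal sketch of the per-sample forward pass with n_x=2, n_h=4, n_y=1.
rng = np.random.default_rng(0)
n_x, n_h, n_y = 2, 4, 1

w1 = rng.standard_normal((n_h, n_x)) * 0.01  # (4, 2)
b1 = np.zeros((n_h, 1))                      # (4, 1)
w2 = rng.standard_normal((n_y, n_h)) * 0.01  # (1, 4)
b2 = np.zeros((n_y, 1))                      # (1, 1)

x = rng.standard_normal((n_x, 1))            # one sample, (2, 1)

z1 = w1 @ x + b1                             # (4, 1)
a1 = np.tanh(z1)                             # (4, 1) -- g^{[1]} = tanh
z2 = w2 @ a1 + b2                            # (1, 1)
a2 = 1 / (1 + np.exp(-z2))                   # (1, 1) -- g^{[2]} = sigmoid
```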

Computing the Loss

$$J=-\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log a^{[2](i)}+(1-y^{(i)})\log(1-a^{[2](i)})\right)$$
Here $\log$ is the natural logarithm and the sum runs over the $m$ training samples.
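As a quick numerical check of the cost formula, with a hypothetical batch of three predictions and labels:

```python
import numpy as np

# Cross-entropy cost over m samples; A2 holds the outputs a^{[2](i)} and
# Y the labels y^{(i)}. The values below are hypothetical.
A2 = np.array([[0.8, 0.2, 0.6]])  # (1, 3)
Y = np.array([[1.0, 0.0, 1.0]])   # (1, 3)
m = Y.shape[1]

J = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
print(round(J, 4))  # ≈ 0.319
```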

Vectorization

Dimensions

$A^{[0]}=X^{[0]}$; the input $X^{[0]}$ has dimension $n_x\times m$, with $n_x$ features and $m$ samples. $W^{[1]}$: $n_h\times n_x=4\times n_x$; $b^{[1]}$: $n_h\times 1=4\times 1$; $W^{[2]}$: $n_y\times n_h=1\times 4$; $b^{[2]}$: $n_y\times 1=1\times 1$.

Computing $Z^{[1]}$, $A^{[1]}$, $Z^{[2]}$, $A^{[2]}$

$$Z^{[1]}=W^{[1]}X^{[0]}+b^{[1]}=W^{[1]}A^{[0]}+b^{[1]}$$
$$A^{[1]}=g^{[1]}(Z^{[1]})$$
Here $Z^{[1]}$ has dimension $4\times m$ and $A^{[1]}$ has dimension $4\times m$; $b^{[1]}$ ($4\times 1$) is broadcast across the $m$ columns.
$$Z^{[2]}=W^{[2]}A^{[1]}+b^{[2]}$$
$$A^{[2]}=g^{[2]}(Z^{[2]})$$
Here $Z^{[2]}$ has dimension $1\times m$ and $A^{[2]}$ has dimension $1\times m$.
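The vectorized forward pass can be sketched as below: $X$ stacks the $m$ samples as columns, so one matrix product per layer handles all samples at once, with the bias vectors broadcasting over the columns. The parameter values are hypothetical.

```python
import numpy as np

# Vectorized forward pass for n_x=2, n_h=4, n_y=1 and m=5 samples.
rng = np.random.default_rng(1)
n_x, n_h, n_y, m = 2, 4, 1, 5

W1 = rng.standard_normal((n_h, n_x)) * 0.01  # (4, 2)
b1 = np.zeros((n_h, 1))                      # (4, 1), broadcasts to (4, m)
W2 = rng.standard_normal((n_y, n_h)) * 0.01  # (1, 4)
b2 = np.zeros((n_y, 1))                      # (1, 1), broadcasts to (1, m)
X = rng.standard_normal((n_x, m))            # (2, 5)

Z1 = W1 @ X + b1            # (4, m)
A1 = np.tanh(Z1)            # (4, m)
Z2 = W2 @ A1 + b2           # (1, m)
A2 = 1 / (1 + np.exp(-Z2))  # (1, m)
```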

Backward Propagation Derivation

We use gradient descent to update the parameters; the resulting formulas are shown below:
(figure: gradient descent update formulas)

The i-th Sample

Dimensions

From forward propagation, the dimensions are: $x^{[0](i)}$: $2\times 1$; $w^{[1](i)}$: $n_h\times n_x=4\times 2$; $b^{[1](i)}$: $n_h\times 1=4\times 1$; $w^{[2](i)}$: $n_y\times n_h=1\times 4$; $b^{[2](i)}$: $n_y\times 1=1\times 1$; $z^{[1](i)}$: $4\times 1$; $a^{[1](i)}$: $4\times 1$; $z^{[2](i)}$: $1\times 1$; $a^{[2](i)}$: $1\times 1$.

Computing $dz^{[1](i)}$, $dw^{[1](i)}$, $db^{[1](i)}$, $dz^{[2](i)}$, $dw^{[2](i)}$, $db^{[2](i)}$

$$dz^{[2](i)}=\frac{\partial L(a^{[2](i)},y^{(i)})}{\partial a^{[2](i)}}\frac{\partial a^{[2](i)}}{\partial z^{[2](i)}}=a^{[2](i)}-y^{(i)}$$
The derivation of this expression is in the notes "Andrew Ng Deep Learning Course 1, Week 2: Forward and Backward Propagation Derivations for the Neural Network Basics Assignment".
$$dw^{[2](i)}=dz^{[2](i)}\frac{\partial z^{[2](i)}}{\partial w^{[2](i)}}=(a^{[2](i)}-y^{(i)})\,a^{[1](i)}$$
$w^{[2](i)}$ has dimension $1\times 4$, $(a^{[2](i)}-y^{(i)})$ has dimension $1\times 1$, and $a^{[1](i)}$ has dimension $4\times 1$, so to match shapes we get:
$$dw^{[2](i)}=dz^{[2](i)}\frac{\partial z^{[2](i)}}{\partial w^{[2](i)}}=dz^{[2](i)}a^{[1](i)T}=(a^{[2](i)}-y^{(i)})\,a^{[1](i)T}$$
$$db^{[2](i)}=dz^{[2](i)}=a^{[2](i)}-y^{(i)}$$
$$dz^{[1](i)}=\frac{\partial L(a^{[2](i)},y^{(i)})}{\partial a^{[2](i)}}\frac{\partial a^{[2](i)}}{\partial z^{[2](i)}}\frac{\partial z^{[2](i)}}{\partial a^{[1](i)}}\frac{\partial a^{[1](i)}}{\partial z^{[1](i)}}=dz^{[2](i)}w^{[2](i)}*g^{[1]\prime}(z^{[1](i)})=w^{[2](i)T}dz^{[2](i)}*g^{[1]\prime}(z^{[1](i)})$$
where $*$ denotes element-wise multiplication.
$$dw^{[1](i)}=dz^{[1](i)}\frac{\partial z^{[1](i)}}{\partial w^{[1](i)}}=dz^{[1](i)}x^{[0](i)T}$$
$$db^{[1](i)}=dz^{[1](i)}$$
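The per-sample gradients above can be sketched as follows; the forward quantities (`w2`, `x`, `z1`, `a1`, `a2`, `y`) are hypothetical placeholders with the stated shapes, and $g^{[1]\prime}(z)=1-\tanh^2(z)$ since $g^{[1]}=\tanh$.

```python
import numpy as np

# Per-sample backward pass following the formulas above.
rng = np.random.default_rng(2)
w2 = rng.standard_normal((1, 4))  # (1, 4)
x = rng.standard_normal((2, 1))   # (2, 1)
z1 = rng.standard_normal((4, 1))  # (4, 1)
a1 = np.tanh(z1)                  # (4, 1)
a2 = np.array([[0.7]])            # (1, 1), hypothetical prediction
y = np.array([[1.0]])             # (1, 1), label

dz2 = a2 - y                                 # (1, 1)
dw2 = dz2 @ a1.T                             # (1, 4)
db2 = dz2                                    # (1, 1)
dz1 = (w2.T @ dz2) * (1 - np.tanh(z1) ** 2)  # (4, 1), tanh'(z) = 1 - tanh(z)^2
dw1 = dz1 @ x.T                              # (4, 2)
db1 = dz1                                    # (4, 1)
```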

Vectorization

Dimensions

$A^{[0]}=X^{[0]}$; the input $X^{[0]}$ has dimension $n_x\times m$, with $n_x$ features and $m$ samples. $W^{[1]}$: $n_h\times n_x=4\times n_x$; $b^{[1]}$: $n_h\times 1=4\times 1$; $W^{[2]}$: $n_y\times n_h=1\times 4$; $b^{[2]}$: $n_y\times 1=1\times 1$. $Z^{[1]}$: $4\times m$; $A^{[1]}$: $4\times m$; $Z^{[2]}$: $1\times m$; $A^{[2]}$: $1\times m$.

Computing $dZ^{[1]}$, $dW^{[1]}$, $db^{[1]}$, $dZ^{[2]}$, $dW^{[2]}$, $db^{[2]}$

The derivation is as follows:
$$dZ^{[2]}=A^{[2]}-Y$$
$$dW^{[2]}=\frac{1}{m}dZ^{[2]}A^{[1]}$$
Since $dZ^{[2]}$ has dimension $1\times m$, $A^{[1]}$ has dimension $4\times m$, and $W^{[2]}$ has dimension $1\times 4$, we must transpose $A^{[1]}$, giving:
$$dW^{[2]}=\frac{1}{m}dZ^{[2]}A^{[1]T}=\frac{1}{m}(A^{[2]}-Y)A^{[1]T}$$

$$db^{[2]}=\frac{1}{m}\,\mathrm{np.sum}(dZ^{[2]})=\frac{1}{m}\,\mathrm{np.sum}(A^{[2]}-Y)$$
Since $A^{[2]}-Y$ has dimension $1\times m$ while $db^{[2]}$ has dimension $1\times 1$, we sum $A^{[2]}-Y$ over the $m$ samples.
$$dZ^{[1]}=dZ^{[2]}W^{[2]}*g^{[1]\prime}(Z^{[1]})$$
Since $W^{[2]}$ has dimension $1\times 4$, $dZ^{[2]}$ has dimension $1\times m$, and $dZ^{[1]}$ has dimension $4\times m$, we must transpose $W^{[2]}$, giving:
$$dZ^{[1]}=W^{[2]T}dZ^{[2]}*g^{[1]\prime}(Z^{[1]})=\mathrm{np.dot}(W^{[2]}.T,\,dZ^{[2]})*g^{[1]\prime}(Z^{[1]})$$
$$dW^{[1]}=\frac{1}{m}dZ^{[1]}X$$
Since $dW^{[1]}$ has dimension $4\times n_x$, $dZ^{[1]}$ has dimension $4\times m$, and $X$ has dimension $n_x\times m$, we transpose $X$, giving:
$$dW^{[1]}=\frac{1}{m}dZ^{[1]}X^{T}$$
$$db^{[1]}=\frac{1}{m}dZ^{[1]}$$
Since $dZ^{[1]}$ has dimension $4\times m$ while $b^{[1]}$ has dimension $4\times 1$, we sum each row over the $m$ samples, giving:
$$db^{[1]}=\frac{1}{m}\,\mathrm{np.sum}(dZ^{[1]},\,\mathrm{axis{=}1},\,\mathrm{keepdims{=}True})$$
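Putting the vectorized formulas together, a minimal sketch of the full backward pass (assuming $g^{[1]}=\tanh$; the driver values below are hypothetical, only the shapes matter):

```python
import numpy as np

def backward(X, Y, W2, Z1, A1, A2):
    """Vectorized gradients from the formulas above (g^{[1]} = tanh)."""
    m = X.shape[1]
    dZ2 = A2 - Y                                  # (1, m)
    dW2 = dZ2 @ A1.T / m                          # (1, n_h)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m  # (1, 1)
    dZ1 = (W2.T @ dZ2) * (1 - np.tanh(Z1) ** 2)   # (n_h, m)
    dW1 = dZ1 @ X.T / m                           # (n_h, n_x)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m  # (n_h, 1)
    return dW1, db1, dW2, db2

# Hypothetical forward quantities just to exercise the shapes.
rng = np.random.default_rng(3)
n_x, n_h, m = 2, 4, 5
X = rng.standard_normal((n_x, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01
b2 = np.zeros((1, 1))
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = 1 / (1 + np.exp(-Z2))

dW1, db1, dW2, db2 = backward(X, Y, W2, Z1, A1, A2)
```

Each gradient has the same shape as the parameter it updates, which is exactly the shape check the derivation above performs.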
