Deep Learning I - III Shallow Neural Network - Backpropagation Intuition

Backpropagation intuition


Consider a simple 2-layer shallow neural network: the first layer's activation function is $\tanh(z)$, and the second layer's activation function is $\mathrm{sigmoid}(z)$.
The network architecture is shown below:

*(figure: network architecture diagram)*

The same network represented as a computational graph:

*(figure: computational graph of the forward and backward passes)*

In the formulas below, $\log a^{[2]}$ means $\ln a^{[2]}$; symbols such as $da^{[2]}$ and $dz^{[2]}$ denote the derivative of the loss with respect to that quantity. These formulas are written for a single training instance and are not yet vectorized.

$$L(a^{[2]}, y) = -y \log a^{[2]} - (1-y)\log\big(1-a^{[2]}\big) \tag{1.1}$$

$$da^{[2]}_{[1\times1]} = \frac{d}{da^{[2]}}L(a^{[2]}, y) = -\frac{y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}} \tag{1.2}$$

$$g(z^{[2]}) = \mathrm{sigmoid}(z^{[2]}) = a^{[2]} \tag{1.3}$$

$$\begin{aligned}
dz^{[2]}_{[1\times1]} &= \frac{d}{dz^{[2]}}L(a^{[2]}, y) = \frac{d}{da^{[2]}}L(a^{[2]}, y)\cdot\frac{da^{[2]}}{dz^{[2]}} = da^{[2]}\cdot g'(z^{[2]}) \\
&= \left(-\frac{y}{a^{[2]}}+\frac{1-y}{1-a^{[2]}}\right)\cdot g(z^{[2]})\big(1-g(z^{[2]})\big) \\
&= \left(-\frac{y}{a^{[2]}}+\frac{1-y}{1-a^{[2]}}\right)\cdot a^{[2]}\big(1-a^{[2]}\big) = a^{[2]}-y
\end{aligned} \tag{1.4}$$
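As a quick sanity check of (1.4), the analytic gradient $a^{[2]}-y$ can be compared against a central finite-difference estimate of $dL/dz^{[2]}$. This is a minimal sketch assuming NumPy; the particular values of $z$ and $y$ are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(a, y):
    # Cross-entropy loss L(a, y) from (1.1)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

z, y = 0.7, 1.0
a = sigmoid(z)

# Analytic gradient from (1.4): dz = a - y
dz_analytic = a - y

# Central finite-difference approximation of dL/dz
eps = 1e-6
dz_numeric = (loss(sigmoid(z + eps), y) - loss(sigmoid(z - eps), y)) / (2 * eps)

print(abs(dz_analytic - dz_numeric) < 1e-8)  # → True
```

The two values agree to within finite-difference error, confirming that the $-y/a^{[2]}$ and $1/(1-a^{[2]})$ terms cancel as derived above.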

$$dW^{[2]}_{[1\times4]} = \frac{d}{dW^{[2]}}L(a^{[2]}, y) = \frac{d}{da^{[2]}}L(a^{[2]}, y)\cdot\frac{da^{[2]}}{dz^{[2]}}\cdot\frac{dz^{[2]}}{dW^{[2]}} = dz^{[2]}_{[1\times1]}\,\big(a^{[1]}_{[4\times1]}\big)^T \tag{1.5}$$

$$db^{[2]}_{[1\times1]} = \frac{d}{db^{[2]}}L(a^{[2]}, y) = \frac{d}{da^{[2]}}L(a^{[2]}, y)\cdot\frac{da^{[2]}}{dz^{[2]}}\cdot\frac{dz^{[2]}}{db^{[2]}} = dz^{[2]}_{[1\times1]} \tag{1.6}$$

$$da^{[1]}_{[4\times1]} = \frac{d}{da^{[1]}}L(a^{[2]}, y) = \frac{d}{da^{[2]}}L(a^{[2]}, y)\cdot\frac{da^{[2]}}{dz^{[2]}}\cdot\frac{dz^{[2]}}{da^{[1]}} = \big(W^{[2]}_{[1\times4]}\big)^T dz^{[2]}_{[1\times1]} \tag{1.7}$$

$$g(z^{[1]}) = \tanh(z^{[1]}) = a^{[1]} \tag{1.8}$$

$$dz^{[1]}_{[4\times1]} = \frac{d}{dz^{[1]}}L(a^{[2]}, y) = da^{[1]} * g'(z^{[1]}) = \big(W^{[2]}_{[1\times4]}\big)^T dz^{[2]}_{[1\times1]} * g'(z^{[1]})_{[4\times1]} \tag{1.9}$$

(where $*$ denotes element-wise multiplication)

$$dW^{[1]}_{[4\times3]} = \frac{d}{dW^{[1]}}L(a^{[2]}, y) = dz^{[1]}\cdot\frac{dz^{[1]}}{dW^{[1]}} = dz^{[1]}_{[4\times1]}\,\big(a^{[0]}_{[3\times1]}\big)^T \tag{1.10}$$

(where $a^{[0]} = x$ is the input)

$$db^{[1]}_{[4\times1]} = \frac{d}{db^{[1]}}L(a^{[2]}, y) = dz^{[1]}\cdot\frac{dz^{[1]}}{db^{[1]}} = dz^{[1]}_{[4\times1]} \tag{1.11}$$
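The single-instance equations (1.1)-(1.11) can be sketched in NumPy. This is an illustrative sketch, not reference code: the 3-4-1 layer sizes follow the shape annotations above, and the random initialization and input are only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the text: 3 inputs, 4 hidden units (tanh), 1 output (sigmoid)
n_x, n_h, n_y = 3, 4, 1
W1, b1 = rng.standard_normal((n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.standard_normal((n_y, n_h)), np.zeros((n_y, 1))

x = rng.standard_normal((n_x, 1))   # a[0], shape (3, 1)
y = 1.0

# Forward pass
z1 = W1 @ x + b1                    # shape (4, 1)
a1 = np.tanh(z1)                    # (1.8)
z2 = W2 @ a1 + b2                   # shape (1, 1)
a2 = 1.0 / (1.0 + np.exp(-z2))      # (1.3)

# Backward pass, equations (1.4)-(1.11)
dz2 = a2 - y                        # (1.4), shape (1, 1)
dW2 = dz2 @ a1.T                    # (1.5), shape (1, 4)
db2 = dz2                           # (1.6)
dz1 = W2.T @ dz2 * (1 - a1**2)      # (1.9); tanh'(z) = 1 - tanh(z)^2
dW1 = dz1 @ x.T                     # (1.10), shape (4, 3)
db1 = dz1                           # (1.11)

print(dW2.shape, dW1.shape)  # → (1, 4) (4, 3)
```

The printed shapes match the subscript annotations $[1\times4]$ and $[4\times3]$ in the equations above.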

Below are the vectorized backpropagation formulas:

$$L(A^{[2]}, Y) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log A^{[2](i)} - \big(1-y^{(i)}\big)\log\big(1-A^{[2](i)}\big)\right] \tag{2.1}$$

$$dA^{[2]}_{[1\times m]} = \left[\left(-\frac{Y^{(1)}}{A^{[2](1)}}+\frac{1-Y^{(1)}}{1-A^{[2](1)}}\right),\cdots,\left(-\frac{Y^{(m)}}{A^{[2](m)}}+\frac{1-Y^{(m)}}{1-A^{[2](m)}}\right)\right] \tag{2.2}$$

$$\begin{aligned}
dZ^{[2]}_{[1\times m]} &= \left[\left(-\frac{Y^{(1)}}{A^{[2](1)}}+\frac{1-Y^{(1)}}{1-A^{[2](1)}}\right),\cdots,\left(-\frac{Y^{(m)}}{A^{[2](m)}}+\frac{1-Y^{(m)}}{1-A^{[2](m)}}\right)\right] \\
&\quad * \left[A^{[2](1)}\big(1-A^{[2](1)}\big),\cdots,A^{[2](m)}\big(1-A^{[2](m)}\big)\right] \\
&= \left[\big(A^{[2](1)}-Y^{(1)}\big),\cdots,\big(A^{[2](m)}-Y^{(m)}\big)\right] = A^{[2]}-Y
\end{aligned} \tag{2.3}$$

$$dW^{[2]}_{[1\times4]} = \frac{1}{m}\, dZ^{[2]}_{[1\times m]}\,\big(A^{[1]}_{[4\times m]}\big)^T \tag{2.4}$$

$$db^{[2]}_{[1\times1]} = \frac{1}{m}\,\texttt{np.sum}\big(dZ^{[2]},\ \texttt{axis=1},\ \texttt{keepdims=True}\big) \tag{2.5}$$

$$dZ^{[1]}_{[4\times m]} = \big(W^{[2]}_{[1\times4]}\big)^T dZ^{[2]}_{[1\times m]} * g^{[1]\prime}(Z^{[1]})_{[4\times m]}$$

$$dW^{[1]}_{[4\times3]} = \frac{1}{m}\, dZ^{[1]}_{[4\times m]}\,\big(A^{[0]}_{[3\times m]}\big)^T \tag{2.6}$$

$$db^{[1]}_{[4\times1]} = \frac{1}{m}\,\texttt{np.sum}\big(dZ^{[1]}_{[4\times m]},\ \texttt{axis=1},\ \texttt{keepdims=True}\big) \tag{2.7}$$
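The vectorized equations (2.3)-(2.7) can be sketched in NumPy as follows. Again this is illustrative only: the data are random, with $m = 5$ examples stacked one per column, as the $[\cdot\times m]$ shape annotations indicate.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_h, n_y, m = 3, 4, 1, 5   # 5 training examples, one per column

W1, b1 = rng.standard_normal((n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.standard_normal((n_y, n_h)), np.zeros((n_y, 1))
A0 = rng.standard_normal((n_x, m))                   # X, shape (3, m)
Y = rng.integers(0, 2, size=(n_y, m)).astype(float)  # labels, shape (1, m)

# Vectorized forward pass (b1, b2 broadcast across the m columns)
Z1 = W1 @ A0 + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = 1.0 / (1.0 + np.exp(-Z2))

# Vectorized backward pass, equations (2.3)-(2.7)
dZ2 = A2 - Y                                        # (2.3), shape (1, m)
dW2 = (1 / m) * dZ2 @ A1.T                          # (2.4), shape (1, 4)
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)  # (2.5), shape (1, 1)
dZ1 = W2.T @ dZ2 * (1 - A1**2)                      # element-wise; tanh'
dW1 = (1 / m) * dZ1 @ A0.T                          # (2.6), shape (4, 3)
db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)  # (2.7), shape (4, 1)

print(dW1.shape, db1.shape)  # → (4, 3) (4, 1)
```

Note that `keepdims=True` preserves the column-vector shapes of `db1` and `db2`, so the gradient of each parameter has exactly the same shape as the parameter itself.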

Summary

*(figure: summary of the forward- and backward-propagation formulas)*
