Course1-week4-deep neural network

4.1 - deep L-layer neural network

We have seen forward propagation and backward propagation in the context of a neural network with a single hidden layer, as well as logistic regression, and we have learned about vectorization and why it is important to initialize the parameters randomly. By now we have actually seen most of the ideas we need to implement a deep neural network. What we are going to do now is take those ideas and put them together so that we will be able to implement our own deep neural network.



Shallow versus deep is a matter of degree. When we count the layers in a neural network, we don't count the input layer; we count only the hidden layers and the output layer. There are functions that a very deep neural network can learn but that shallower models are unable to. Although for any given problem it may be hard to predict in advance exactly how deep a network you would want, it is reasonable to try logistic regression, then one and two hidden layers, and to view the number of hidden layers as another hyperparameter.

Let's now go through the notation we use to describe a deep neural network.



$L$ = #(layers), $n^{[l]}$ = #(units in layer $l$); for example: $n^{[0]} = n_x = 3$, $n^{[1]} = 5$, $n^{[2]} = 5$, $n^{[3]} = 3$, $n^{[4]} = n^{[L]} = 1$.

$a^{[l]}$ = activations in layer $l$, $a^{[l]} = g^{[l]}(z^{[l]})$, with $a^{[0]} = x$ and $a^{[L]} = \hat{y}$.

$W^{[l]}$, $b^{[l]}$ denote the weights and bias used to compute the value $z^{[l]}$ in layer $l$.
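
As a small illustration of this notation, the example network above could be described in code by a list of layer sizes; the variable name `layer_dims` below is just an assumption for the sketch, not something defined in the course:

```python
# Layer sizes for the example 4-layer network:
# n[0] = n_x = 3 input features, hidden layers of 5, 5, 3 units,
# and one output unit, so L = 4 (the input layer is not counted).
layer_dims = [3, 5, 5, 3, 1]

L = len(layer_dims) - 1          # number of layers (hidden + output)
print("L =", L)                  # 4
print("n[1] =", layer_dims[1])   # 5 units in layer 1
```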

4.2 - forward propagation in a deep network

Now we will discuss how we can perform forward propagation in a deep network.

Let's first go over what forward propagation looks like for a single training example $x$, and then later we will talk about the vectorized version, where we carry out forward propagation on the entire training set at the same time.



$z^{[1]} = W^{[1]}x + b^{[1]}$, $a^{[1]} = g^{[1]}(z^{[1]})$
$z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}$, $a^{[2]} = g^{[2]}(z^{[2]})$
$\cdots$
$z^{[4]} = W^{[4]}a^{[3]} + b^{[4]}$, $a^{[4]} = g^{[4]}(z^{[4]})$

So for one training example, the general forward propagation equations are:

$z^{[l]} = W^{[l]}a^{[l-1]} + b^{[l]}$, $a^{[l]} = g^{[l]}(z^{[l]})$

How about doing this in a vectorized way for the whole training set at the same time?

$Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$, $A^{[l]} = g^{[l]}(Z^{[l]})$

Now bear in mind that $X$ is just equal to $A^{[0]}$: we stack the training examples in different columns. Similarly, we take the vectors $z^{[l](i)}$ or $a^{[l](i)}$, stack them up as columns, and call the result $Z^{[l]}$ or $A^{[l]}$.

$Z^{[l]} = [z^{[l](1)}, z^{[l](2)}, \cdots, z^{[l](m)}]$

$A^{[l]} = [a^{[l](1)}, a^{[l](2)}, \cdots, a^{[l](m)}]$
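
As a rough sketch, the vectorized forward pass can be written as a loop over the layers. The function and parameter names here (`forward_propagation`, a `parameters` dict keyed by `W1`, `b1`, ...) are assumptions for illustration, not the course's starter code; ReLU is assumed for the hidden layers and sigmoid for the output layer:

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward_propagation(X, parameters):
    """Vectorized forward pass; X has shape (n[0], m)."""
    L = len(parameters) // 2              # number of layers (W1..WL, b1..bL)
    A = X                                 # A[0] = X
    for l in range(1, L):                 # hidden layers 1 .. L-1
        Z = parameters["W" + str(l)] @ A + parameters["b" + str(l)]
        A = relu(Z)                       # A[l] = g[l](Z[l])
    # output layer L with a sigmoid activation
    ZL = parameters["W" + str(L)] @ A + parameters["b" + str(L)]
    AL = sigmoid(ZL)                      # AL = y_hat, shape (1, m)
    return AL
```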

4.3 - getting your matrix dimensions right

$W^{[l]}.\text{shape} = (n^{[l]}, n^{[l-1]})$

$b^{[l]}.\text{shape} = (n^{[l]}, 1)$

$dW^{[l]}.\text{shape} = W^{[l]}.\text{shape}$

$db^{[l]}.\text{shape} = b^{[l]}.\text{shape}$

For the vectorized version, the dimensions of $W^{[l]}$ and $b^{[l]}$ stay the same, but instead of $(n^{[l]}, 1)$, the dimension of $Z^{[l]}$ will be $(n^{[l]}, m)$, where $m$ is the number of training examples.

$Z^{[l]}.\text{shape} = A^{[l]}.\text{shape} = (n^{[l]}, m)$

$dZ^{[l]}.\text{shape} = dA^{[l]}.\text{shape} = (n^{[l]}, m)$
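
A minimal sketch of initializing parameters with these shapes is shown below; the helper name `initialize_parameters_deep` and the 0.01 scaling are assumptions for illustration:

```python
import numpy as np

def initialize_parameters_deep(layer_dims):
    """layer_dims = [n[0], n[1], ..., n[L]]; returns W[l] of shape
    (n[l], n[l-1]) and b[l] of shape (n[l], 1) for l = 1..L."""
    np.random.seed(1)
    parameters = {}
    L = len(layer_dims) - 1
    for l in range(1, L + 1):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        # sanity-check the dimensions derived above
        assert parameters["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])
        assert parameters["b" + str(l)].shape == (layer_dims[l], 1)
    return parameters
```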

4.4 - why deep representations

What is a deep network computing? If you are building a face recognition or face detection system, here is what a deep network could be doing. Intuitively, you can think of the earlier layers of the neural network as detecting simple functions like edges, and then composing them together in the later layers of the network so that they can learn more complex functions. So a deep neural network with multiple hidden layers might be able to have the earlier layers learn these low-level, simpler features and then have the deeper layers put together the simpler things they detect in order to detect more complex things.



4.5 - building blocks of deep neural networks

We have seen the basic building blocks of forward propagation and backward propagation, the key components we need to implement a deep neural network. Now let's see how we can put them together to build a deep net.
Let's pick one layer and focus on the computations for just that layer for now.

For layer $l$, forward propagation:

  • parameters: $W^{[l]}, b^{[l]}$
  • input: $a^{[l-1]}$
  • output: $a^{[l]}$
  • $z^{[l]} = W^{[l]}a^{[l-1]} + b^{[l]}$
  • $a^{[l]} = g^{[l]}(z^{[l]})$
  • cache $z^{[l]}$, which will be useful in the backward propagation step later

For layer $l$, backward propagation:

  • input: $da^{[l]}$, cache: $z^{[l]}$
  • output: $da^{[l-1]}, dW^{[l]}, db^{[l]}$

This is the basic structure of how we implement the forward and backward propagation steps.



Now we've seen one of the basic building blocks for implementing a deep neural network: in each layer there is a forward propagation step and a corresponding backward propagation step, with a cache passing information from one to the other.
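
A sketch of a single layer's forward step that also returns the cache consumed by the matching backward step is shown below. The name `linear_activation_forward` and the cache layout mirror the structure described above but are illustrative assumptions, not necessarily the course's starter code:

```python
import numpy as np

def linear_activation_forward(A_prev, W, b, activation):
    """Forward step for one layer: returns A[l] and the cache
    (A_prev, W, b, Z) needed by the corresponding backward step."""
    Z = W @ A_prev + b
    if activation == "relu":
        A = np.maximum(0, Z)
    elif activation == "sigmoid":
        A = 1.0 / (1.0 + np.exp(-Z))
    cache = (A_prev, W, b, Z)   # deposited for backward propagation
    return A, cache
```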

4.6 - forward and backward propagation

forward propagation:

  • input: $a^{[l-1]}$
  • output: $a^{[l]}$, cache: $W^{[l]}, b^{[l]}, z^{[l]}, a^{[l-1]}$

for one single example:

$z^{[l]} = W^{[l]}a^{[l-1]} + b^{[l]}$

$a^{[l]} = g^{[l]}(z^{[l]})$

for the entire training set (vectorized version):

$Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$

$A^{[l]} = g^{[l]}(Z^{[l]})$

backward propagation:

  • input: $da^{[l]}$
  • output: $da^{[l-1]}, dW^{[l]}, db^{[l]}$

for one single example:

$dz^{[l]} = da^{[l]} * g^{[l]\prime}(z^{[l]})$
$dW^{[l]} = dz^{[l]} \, a^{[l-1]T}$
$db^{[l]} = dz^{[l]}$
$da^{[l-1]} = W^{[l]T} dz^{[l]}$

As a reminder, in the neural network with just one hidden layer we substituted $da^{[l]} = W^{[l+1]T} dz^{[l+1]}$, which gives:

$dz^{[l]} = W^{[l+1]T} dz^{[l+1]} * g^{[l]\prime}(z^{[l]})$

for the entire training set (vectorized version):

$dZ^{[l]} = dA^{[l]} * g^{[l]\prime}(Z^{[l]})$
$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$
$db^{[l]} = \frac{1}{m} \text{np.sum}(dZ^{[l]}, \text{axis}=1, \text{keepdims=True})$
$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$

According to the backward propagation equations for any layer $l$, we need to cache $W^{[l]}$, $Z^{[l]}$, and $A^{[l-1]}$ from the forward propagation.
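
Translating those vectorized equations into code, a single layer's backward step might look like the sketch below (ReLU hidden activation assumed; the name `linear_activation_backward` and the cache layout match the illustrative forward sketch above, not necessarily the course starter code):

```python
import numpy as np

def relu_derivative(Z):
    return (Z > 0).astype(float)   # g'(Z) for ReLU

def linear_activation_backward(dA, cache):
    """Backward step for one layer, given dA[l] and the cache
    (A_prev, W, b, Z) stored during forward propagation."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    dZ = dA * relu_derivative(Z)                        # dZ[l] = dA[l] * g[l]'(Z[l])
    dW = (1.0 / m) * dZ @ A_prev.T                      # dW[l] = (1/m) dZ[l] A[l-1].T
    db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)  # db[l]
    dA_prev = W.T @ dZ                                  # dA[l-1] = W[l].T dZ[l]
    return dA_prev, dW, db
```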



4.7 - parameters vs hyperparameters

parameters: $W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, \cdots$

hyperparameters:

  • learning rate
  • # iterations
  • # hidden layers
  • # hidden units
  • choice of activation function

These are parameters that control $W$ and $b$, so we call them hyperparameters.
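
For instance, these choices might be gathered into a single configuration before training; this is a hypothetical sketch, and the names are not from the course:

```python
# Hypothetical hyperparameter configuration for training a deep net.
hyperparameters = {
    "learning_rate": 0.0075,       # step size for gradient descent
    "num_iterations": 2500,        # number of gradient descent iterations
    "layer_dims": [3, 5, 5, 3, 1], # number of hidden layers and units
    "hidden_activation": "relu",   # choice of activation function
}
```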

4.8 - summary


