Course 4, Week 1 - Convolutional Neural Networks

1 - Computer vision

Computer vision problems:

  • image recognition
  • object detection
  • style transfer


One of the challenges of computer vision is that the inputs can get really big.


If we use a standard fully connected network with 1000 units in the first layer and a 1000 × 1000 × 3 input image, then W^[1] has shape (1000, 1000 × 1000 × 3), so the first layer alone has 1000 × (1000 × 1000 × 3) = 3 billion parameters. With that many parameters, it is difficult to get enough data to prevent the network from overfitting, and the computational requirements also become infeasible.

2 - edge detection example

The convolution operation is one of the fundamental building blocks of a convolutional neural network.

convolution operation:


TensorFlow: tf.nn.conv2d()
Keras: Conv2D
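
A minimal sketch of calling this operation (the image and filter values below are illustrative, not from the lecture):

```python
import tensorflow as tf

# A 6x6 single-channel "image" and a 3x3 vertical-edge filter, shaped the way
# tf.nn.conv2d expects: input (batch, height, width, channels) and
# filters (filter_height, filter_width, in_channels, out_channels).
image = tf.random.normal([1, 6, 6, 1])
vertical_edge = tf.constant([[1., 0., -1.],
                             [1., 0., -1.],
                             [1., 0., -1.]])
kernel = tf.reshape(vertical_edge, [3, 3, 1, 1])

out = tf.nn.conv2d(image, kernel, strides=1, padding="VALID")
print(out.shape)  # (1, 4, 4, 1): a 6x6 image convolved with a 3x3 filter gives 4x4
```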

We can use this operation to detect edges in an image.


3 - more edge detection

We have seen how the convolution operator works and allows us to implement a vertical edge detector.

In this video, you will learn that the difference between positive and negative edges is the difference between light-to-dark and dark-to-light transitions. You will also see how to have an algorithm learn an edge detector rather than hand-coding one.


The −30 shows that this is a dark-to-light rather than a light-to-dark transition. A more complicated example shows both positive and negative edges in the same image.
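
A minimal NumPy sketch of this effect (the 6 × 6 half-bright, half-dark image values here are my assumption of the slide's example):

```python
import numpy as np

def conv2d(image, kernel):
    """Plain 'valid' 2D cross-correlation, as used in the lecture."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

light_to_dark = np.array([[10, 10, 10, 0, 0, 0]] * 6)   # bright left half, dark right half
dark_to_light = light_to_dark[:, ::-1]                   # the same image flipped

print(conv2d(light_to_dark, vertical_edge))   # middle columns are +30 (light to dark)
print(conv2d(dark_to_light, vertical_edge))   # middle columns are -30 (dark to light)
```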


In summary, different filters allow you to find vertical and horizontal edges. For detecting vertical edges, there are other filters we could use as well, such as the Sobel filter, the Scharr filter, and so on.


With the rise of deep learning, one of the things we have learned is that when you really want to detect edges in complicated images, you may not need to hand-pick these nine numbers. Instead, you can treat them as parameters to be learned with backpropagation, and hopefully end up with a detector even better than Sobel at capturing the features of the data.

By treating all nine numbers as parameters and learning them automatically from data, we find that a neural network can learn low-level features, such as edges, even more robustly than the filters computer vision researchers used to code by hand.

The idea of treating these nine numbers as learnable parameters has been one of the most powerful ideas in computer vision.
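
In framework terms (a sketch assuming Keras), this just means declaring the filter as a trainable layer rather than a hard-coded constant:

```python
import tensorflow as tf

# Instead of hard-coding Sobel/Scharr values, make the 3x3 kernel trainable:
# its nine numbers start random and are updated by backpropagation.
learned_edge_filter = tf.keras.layers.Conv2D(filters=1, kernel_size=3,
                                             padding="valid", use_bias=False)
# After training, learned_edge_filter.kernel holds the nine learned numbers.
```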

4 - padding

In order to build deep neural networks, one modification to the basic convolution operation is padding. If you take a 6 × 6 image and convolve it with a 3 × 3 filter, you end up with a 4 × 4 output.

(n, n) ∗ (f, f) → (n − f + 1, n − f + 1)

There are two downsides to this:

  • every time you apply a convolution operator, the image shrinks.
  • the pixels on the corners or edges are used much less in the output, so you throw away a lot of information near the edges of the image.

In order to fix both of these problems, we can pad the image before applying the convolution operation.


Now we get back a 6 × 6 output, which preserves the original size of the input.

(n, n) ∗ (f, f), with padding p → (n + 2p − f + 1, n + 2p − f + 1)

In terms of how much to pad, it turns out there are two common choices.

  • Valid: this basically means no padding.
  • Same: pad so that the output size is the same as the input size. In this case, if we want the output size to equal the input size:

    n + 2p − f + 1 = n

    p = (f − 1) / 2    (1)

So if f is odd, by choosing the padding to be (f − 1)/2, we can make sure that the output size is the same as the input size: when f = 3, p = 1, and when f = 5, p = 2.

By convention in computer vision, f is usually odd; the main reason for this is equation (1).

5 - strided convolutions

Strided convolution is another basic building block of convolutions used in convolutional neural networks. Now we are going to do the convolution operation with a stride of two.


The output dimension turns out to be governed by the following formula:

(⌊(n + 2p − f) / s⌋ + 1, ⌊(n + 2p − f) / s⌋ + 1)

By convention, the filter must lie entirely within the image, or within the image plus the padding region.
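
As a small sketch (my own helper, not from the lecture), this formula, together with the "valid"/"same" conventions from the previous section, can be written as:

```python
from math import floor

def conv_output_size(n, f, s=1, padding="valid"):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    if padding == "valid":
        p = 0
    elif padding == "same":          # only gives n_out == n when s == 1 and f is odd
        p = (f - 1) // 2
    else:
        p = padding                  # an explicit integer amount of padding
    return floor((n + 2 * p - f) / s) + 1

print(conv_output_size(6, 3))                   # 4: a 6x6 image, 3x3 filter -> 4x4
print(conv_output_size(6, 3, padding="same"))   # 6: output size preserved
print(conv_output_size(7, 3, s=2))              # 3: 7x7 input, 3x3 filter, stride 2
```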

6 - convolutions over volumes

We have seen how convolutions over 2D images work. Now let's see how to implement convolutions over 3D volumes.

In order to detect edges or other features in an RGB image, we convolve it not with a 3 × 3 filter, but with a 3 × 3 × 3 filter.


Note: the number of channels in the image must match the number of channels in the filter.


By convention in computer vision, when you have an input with a certain height, width, and number of channels, the filter can have a different height and width, but it will have the same number of channels.

multiple filters

What do you do if you want to use multiple filters at the same time?


Taking the 6 × 6 × 3 image and convolving it with two different 3 × 3 × 3 filters results in two different 4 × 4 outputs, which we can stack up to form a 4 × 4 × 2 volume. The 2 comes from the fact that we used two different filters.

Summary:

[n × n × n_C1] ∗ [f × f × n_C2] → (n − f + 1) × (n − f + 1) × n_C′

n_C1 must equal n_C2, and n_C′ is the number of filters. The formula above assumes stride one and no padding.
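
A quick shape check of this summary (a sketch assuming TensorFlow; here the batch dimension comes first, matching the (m, n_H, n_W, n_C) ordering used later):

```python
import tensorflow as tf

image = tf.random.normal([1, 6, 6, 3])      # one 6x6x3 RGB image
filters = tf.random.normal([3, 3, 3, 2])    # two 3x3x3 filters stacked on the last axis

out = tf.nn.conv2d(image, filters, strides=1, padding="VALID")
print(out.shape)   # (1, 4, 4, 2): (n - f + 1) x (n - f + 1) x (number of filters)
```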

7 - one layer of a convolutional neural network

This is one layer of a convolutional neural network.


z^[1] = W^[1] a^[0] + b^[1]

a^[1] = g(z^[1])

These filters play a role similar to W^[1]: the convolution applies the linear operation, then we add a bias and apply a ReLU activation function. So we have gone from a 6 × 6 × 3 input a^[0], through one layer of the neural network, to a 4 × 4 × 2 output a^[1], which is the activation value of the next layer.

Quiz: if you have 10 filters that are 3 × 3 × 3 in one layer of a neural network, how many parameters does this layer have?

10 × (3 × 3 × 3 + 1) = 280

Notice one nice thing: no matter how big the input image is, the number of parameters remains fixed at 280.
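
A quick way to confirm this (a sketch assuming Keras):

```python
import tensorflow as tf

layer = tf.keras.layers.Conv2D(filters=10, kernel_size=3)
layer.build(input_shape=(None, 64, 64, 3))   # any height/width gives the same count
print(layer.count_params())                   # 280 = 10 * (3*3*3 + 1)
```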

Summary of notation:

f^[l] = filter size

p^[l] = padding

s^[l] = stride

input: n_H^[l−1] × n_W^[l−1] × n_C^[l−1]

output: n_H^[l] × n_W^[l] × n_C^[l]

n_H^[l] (and likewise n_W^[l]) = ⌊(n_H^[l−1] + 2p^[l] − f^[l]) / s^[l]⌋ + 1

n_C^[l] = number of filters

each filter: f^[l] × f^[l] × n_C^[l−1]

When you use a vectorized implementation, with batch or mini-batch gradient descent:

activations: A^[l] has shape (m, n_H^[l], n_W^[l], n_C^[l])

weights: W^[l] has shape (f^[l], f^[l], n_C^[l−1], n_C^[l])

bias: b^[l] has shape (1, 1, 1, n_C^[l])
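
These shapes can be tied together in a minimal, unvectorized NumPy sketch of one CONV layer's forward pass (the function name and loop structure are my own, not from the course):

```python
import numpy as np

def conv_layer_forward(A_prev, W, b, stride=1, pad=0):
    """One CONV layer: A_prev has shape (m, nH_prev, nW_prev, nC_prev),
    W has shape (f, f, nC_prev, nC), b has shape (1, 1, 1, nC);
    returns ReLU activations of shape (m, nH, nW, nC)."""
    m, nH_prev, nW_prev, nC_prev = A_prev.shape
    f, _, _, nC = W.shape
    nH = (nH_prev + 2 * pad - f) // stride + 1
    nW = (nW_prev + 2 * pad - f) // stride + 1

    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    Z = np.zeros((m, nH, nW, nC))
    for i in range(m):                       # loop over the examples
        for h in range(nH):                  # loop over output height
            for w in range(nW):              # loop over output width
                for c in range(nC):          # loop over the filters
                    patch = A_pad[i, h*stride:h*stride+f, w*stride:w*stride+f, :]
                    Z[i, h, w, c] = np.sum(patch * W[:, :, :, c]) + b[0, 0, 0, c]
    return np.maximum(Z, 0)                  # ReLU activation g(z)

A0 = np.random.randn(10, 6, 6, 3)             # m = 10 examples of size 6x6x3
W1 = np.random.randn(3, 3, 3, 2)              # two 3x3x3 filters
b1 = np.zeros((1, 1, 1, 2))
print(conv_layer_forward(A0, W1, b1).shape)   # (10, 4, 4, 2)
```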

8 - a simple convolutional network example

Let's go through a concrete example of a deep convolutional neural network.


One thing to take away from this is that as you go deeper in the neural network, you typically start out with large images, and the height and width stay the same for a while, then gradually trend down as you go deeper, whereas the number of channels generally increases.

In a typical ConvNet, there are usually three types of layers:

  • convolution(CONV)
  • pooling(POOL)
  • fully connected(FC)

9 - pooling layer

9.1 - max pooling

ConvNets often also use pooling layers to reduce the size of the representation, to speed up computation, and to make some of the features they detect a bit more robust.


hyperparameters of the max pooling

f = 2

s = 2

What the max operator really does is: if a particular feature is detected anywhere in the filter region, keep the high number; if the feature is not detected, the max of all the numbers in that region stays quite small.

One interesting property of max pooling is that it has a set of hyperparameters, but it has no parameters for gradient descent to learn.

The formula for figuring out the output size of a CONV layer also works for a POOL layer.


hyperparameters of the max pooling

f = 3

s = 1

The max pooling computation is done independently on each of the n_C^[l] channels.

9.2 - average pooling


hyperparameters of the average pooling

f = 2

s = 2

These days, max pooling is used much more often than average pooling.

To summarize:

  • for the pooling layer, the common choice of hyperparameters is f = 2, s = 2, and this has the effect of shrinking the height and width of the representation by a factor of two.
  • no parameters to learn (see the short check after this list).
  • (n_H, n_W, n_C) → (⌊(n_H − f) / s⌋ + 1, ⌊(n_W − f) / s⌋ + 1, n_C)
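
A short check of these points (a sketch assuming Keras):

```python
import tensorflow as tf

x = tf.random.normal([1, 28, 28, 6])
pool = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)
print(pool(x).shape)          # (1, 14, 14, 6): height and width halved, channels unchanged
print(pool.count_params())    # 0: nothing for gradient descent to learn
```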

10 - convolutional neural network example

It's actually quite similar to one of the classic networks, called LeNet-5, which was created by Yann LeCun.


As we go deeper, the width and height of the activations decrease, whereas the number of channels increases.

A common pattern in a ConvNet is:

CONV → POOL → CONV → POOL → FC → FC → FC → SOFTMAX
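
A hedged Keras sketch of this pattern (the specific layer sizes are my assumption of a LeNet-5-style setup, not the exact numbers from the slide):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(6, 5, activation="relu"),     # CONV
    tf.keras.layers.MaxPooling2D(2),                     # POOL
    tf.keras.layers.Conv2D(16, 5, activation="relu"),    # CONV
    tf.keras.layers.MaxPooling2D(2),                     # POOL
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="relu"),       # FC
    tf.keras.layers.Dense(84, activation="relu"),        # FC
    tf.keras.layers.Dense(10, activation="softmax"),     # FC -> SOFTMAX
])
model.summary()   # most of the parameters sit in the fully connected layers
```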


  • pooling layers don't have any parameters
  • CONV layers tend to have relatively few parameters
  • most of the parameters tend to be in the FC layers
  • the activation size generally goes down as you go deeper

11 - why convolutions

Why are convolutions so useful when you include them in a neural network?

There are two main advantages of convolutional layers over fully connected layers: parameter sharing and sparsity of connections.


If we created a neural network with 32 × 32 × 3 = 3,072 units in one layer and 28 × 28 × 6 = 4,704 units in the next layer, and connected every one of these units, that would be an FC layer, and the weight matrix would have 3,072 × 4,704 ≈ 14 million parameters. That's a lot of parameters to train, and this is just a pretty small image. In contrast, the total number of parameters in the corresponding CONV layer is (5 × 5 + 1) × 6 = 156.

There are two reasons a ConvNet has relatively few parameters:

  • parameter sharing: a feature detector that is useful in one part of the image is probably useful in another part of the image. A feature detector computed for the upper-left corner of the image will also be useful for the lower-right corner, so we don't need to learn separate feature detectors for different positions in the image.
  • sparsity of connections: in each layer, each output value depends only on a small number of inputs.

Through these two mechanisms, a neural network has far fewer parameters, which allows it to be trained with smaller training sets and helps prevent overfitting.

How to train this network?


define the cost function:

J(W, b) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i))

So to train the neural network, all we need to do is use gradient descent, or some other algorithm like Momentum, RMSProp, or Adam, to optimize all the parameters of the network to reduce the cost.
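
In Keras terms, a minimal sketch might look like this (the architecture, optimizer, and loss here are illustrative choices; X_train and y_train are placeholders for your data):

```python
import tensorflow as tf

# Build a small ConvNet, then minimize the cost J(W, b) = (1/m) * sum_i L(y_hat_i, y_i)
# with an optimizer such as Adam (or Momentum / RMSProp).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",   # the per-example loss L(y_hat, y)
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10, batch_size=64)  # mini-batch gradient descent on J
```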
