Course 4, Week 1 - Convolutional Neural Networks

1 - Computer vision

Computer vision problems:

  • image recognition
  • object detection
  • style transfer


One of the challenges of computer vision is that the inputs can get really big.


If we use a standard fully connected network with 1000 units in the first layer and a 1000 × 1000 × 3 input image, then W^[1] has shape (1000, 1000 × 1000 × 3), so the first layer alone has 1000 × (1000 × 1000 × 3) = 3 billion parameters. With that many parameters, it is difficult to get enough data to prevent the network from overfitting, and the computational requirements also become infeasible.

2 - edge detection example

The convolution operation is one of the fundamental building blocks of a convolutional neural network.

convolution operation:


TensorFlow: tf.nn.conv2d()
Keras: Conv2D
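
A minimal sketch of calling this operation (the image and filter values below are illustrative, not from the lecture):

```python
import tensorflow as tf

# A 6x6 single-channel "image" and a 3x3 vertical-edge filter, shaped the way
# tf.nn.conv2d expects: input (batch, height, width, channels) and
# filters (filter_height, filter_width, in_channels, out_channels).
image = tf.random.normal([1, 6, 6, 1])
vertical_edge = tf.constant([[1., 0., -1.],
                             [1., 0., -1.],
                             [1., 0., -1.]])
kernel = tf.reshape(vertical_edge, [3, 3, 1, 1])

out = tf.nn.conv2d(image, kernel, strides=1, padding="VALID")
print(out.shape)  # (1, 4, 4, 1): a 6x6 image convolved with a 3x3 filter gives 4x4
```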

We can use this operation to detect edges in an image.


3 - more edge detection

We have seen how the convolution operator works and allows us to implement a vertical edge detector.

In this video, you will learn that the difference between positive and negative edges is the difference between light-to-dark and dark-to-light transitions. You will also see how to have an algorithm learn an edge detector rather than hand-coding one.


The −30 shows that this is a dark-to-light rather than a light-to-dark transition. A more complicated example shows both positive and negative edges in the same image.
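
A minimal NumPy sketch of this effect (the 6 × 6 half-bright, half-dark image values here are my assumption of the slide's example):

```python
import numpy as np

def conv2d(image, kernel):
    """Plain 'valid' 2D cross-correlation, as used in the lecture."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

light_to_dark = np.array([[10, 10, 10, 0, 0, 0]] * 6)   # bright left half, dark right half
dark_to_light = light_to_dark[:, ::-1]                   # the same image flipped

print(conv2d(light_to_dark, vertical_edge))   # middle columns are +30 (light to dark)
print(conv2d(dark_to_light, vertical_edge))   # middle columns are -30 (dark to light)
```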


In summary, different filters allow you to find vertical and horizontal edges. For detecting vertical edges, there are other filters we could use as well, such as the Sobel filter, the Scharr filter, and so on.


With the rise of deep learning, one of the things we have learned is that when you really want to detect edges in complicated images, you may not need to hand-pick these nine numbers. Instead, you can treat them as parameters to be learned with backpropagation, and hopefully end up with a detector even better than Sobel at capturing the features of the data.

By treating all nine numbers as parameters and learning them automatically from data, we find that a neural network can learn low-level features, such as edges, even more robustly than the filters computer vision researchers used to code by hand.

The idea of treating these nine numbers as learnable parameters has been one of the most powerful ideas in computer vision.
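
In framework terms (a sketch assuming Keras), this just means declaring the filter as a trainable layer rather than a hard-coded constant:

```python
import tensorflow as tf

# Instead of hard-coding Sobel/Scharr values, make the 3x3 kernel trainable:
# its nine numbers start random and are updated by backpropagation.
learned_edge_filter = tf.keras.layers.Conv2D(filters=1, kernel_size=3,
                                             padding="valid", use_bias=False)
# After training, learned_edge_filter.kernel holds the nine learned numbers.
```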

4 - padding

In order to build deep neural networks, one modification to the basic convolution operation is padding. If you take a 6 × 6 image and convolve it with a 3 × 3 filter, you end up with a 4 × 4 output.

(n, n) ∗ (f, f) → (n − f + 1, n − f + 1)

There are two downsides to this:

  • every time you apply a convolution operator, the image shrinks.
  • the pixels on the corners or edges are used much less in the output, so you throw away a lot of information near the edges of the image.

In order to fix both of these problems, we can pad the image before applying the convolution operation.


Now we get back a 6 × 6 output, which preserves the original size of the input.

(n, n) ∗ (f, f), with padding p → (n + 2p − f + 1, n + 2p − f + 1)

In terms of how much to pad, it turns out there are two common choices.

  • Valid: this basically means no padding.
  • Same: pad so that the output size is the same as the input size. In this case, if we want the output size to equal the input size:

    n + 2p − f + 1 = n

    p = (f − 1) / 2    (1)

So if f is odd, by choosing the padding to be (f − 1)/2, we can make sure that the output size is the same as the input size: when f = 3, p = 1, and when f = 5, p = 2.

By convention in computer vision, f is usually odd; the main reason for this is equation (1).

5 - strided convolutions

Strided convolution is another basic building block of convolutions used in convolutional neural networks. Now we are going to do the convolution operation with a stride of two.


The output dimension turns out to be governed by the following formula:

(⌊(n + 2p − f) / s⌋ + 1, ⌊(n + 2p − f) / s⌋ + 1)

By convention, the filter must lie entirely within the image, or within the image plus the padding region.
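
As a small sketch (my own helper, not from the lecture), this formula, together with the "valid"/"same" conventions from the previous section, can be written as:

```python
from math import floor

def conv_output_size(n, f, s=1, padding="valid"):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    if padding == "valid":
        p = 0
    elif padding == "same":          # only gives n_out == n when s == 1 and f is odd
        p = (f - 1) // 2
    else:
        p = padding                  # an explicit integer amount of padding
    return floor((n + 2 * p - f) / s) + 1

print(conv_output_size(6, 3))                   # 4: a 6x6 image, 3x3 filter -> 4x4
print(conv_output_size(6, 3, padding="same"))   # 6: output size preserved
print(conv_output_size(7, 3, s=2))              # 3: 7x7 input, 3x3 filter, stride 2
```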

6 - convolutions over volumes

We have seen how convolutions over 2D images work. Now let's see how to implement convolutions over 3D volumes.

In order to detect edges or other features in an RGB image, we convolve it not with a 3 × 3 filter, but with a 3 × 3 × 3 filter.


Note: the number of channels in the image must match the number of channels in the filter.


By convention in computer vision, when you have an input with a certain height, width, and number of channels, the filter can have a different height and width, but it will have the same number of channels.

multiple filters

What do you do if you want to use multiple filters at the same time?


Taking the 6 × 6 × 3 image and convolving it with two different 3 × 3 × 3 filters results in two different 4 × 4 outputs, which we can stack up to form a 4 × 4 × 2 volume. The 2 comes from the fact that we used two different filters.

Summary:

[n × n × n_C1] ∗ [f × f × n_C2] → (n − f + 1) × (n − f + 1) × n_C′

n_C1 must equal n_C2, and n_C′ is the number of filters. The formula above assumes stride one and no padding.
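
A quick shape check of this summary (a sketch assuming TensorFlow; here the batch dimension comes first, matching the (m, n_H, n_W, n_C) ordering used later):

```python
import tensorflow as tf

image = tf.random.normal([1, 6, 6, 3])      # one 6x6x3 RGB image
filters = tf.random.normal([3, 3, 3, 2])    # two 3x3x3 filters stacked on the last axis

out = tf.nn.conv2d(image, filters, strides=1, padding="VALID")
print(out.shape)   # (1, 4, 4, 2): (n - f + 1) x (n - f + 1) x (number of filters)
```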

7 - one layer of a convolutional neural network

This is one layer of a convolutional neural network.


z^[1] = W^[1] a^[0] + b^[1]

a^[1] = g(z^[1])

These filters play a role similar to W^[1]: the convolution applies the linear operation, then we add a bias and apply a ReLU activation function. So we have gone from a 6 × 6 × 3 input a^[0], through one layer of the neural network, to a 4 × 4 × 2 output a^[1], which is the activation value of the next layer.

Quiz: if you have 10 filters that are 3 × 3 × 3 in one layer of a neural network, how many parameters does this layer have?

10 × (3 × 3 × 3 + 1) = 280

Notice one nice thing: no matter how big the input image is, the number of parameters remains fixed at 280.
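
A quick way to confirm this (a sketch assuming Keras):

```python
import tensorflow as tf

layer = tf.keras.layers.Conv2D(filters=10, kernel_size=3)
layer.build(input_shape=(None, 64, 64, 3))   # any height/width gives the same count
print(layer.count_params())                   # 280 = 10 * (3*3*3 + 1)
```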

Summary of notation:

f^[l] = filter size

p^[l] = padding

s^[l] = stride

input: n_H^[l−1] × n_W^[l−1] × n_C^[l−1]

output: n_H^[l] × n_W^[l] × n_C^[l]

n_H^[l] (and likewise n_W^[l]) = ⌊(n_H^[l−1] + 2p^[l] − f^[l]) / s^[l]⌋ + 1

n_C^[l] = number of filters

each filter: f^[l] × f^[l] × n_C^[l−1]

When you use a vectorized implementation, with batch or mini-batch gradient descent:

activations: A^[l] has shape (m, n_H^[l], n_W^[l], n_C^[l])

weights: W^[l] has shape (f^[l], f^[l], n_C^[l−1], n_C^[l])

bias: b^[l] has shape (1, 1, 1, n_C^[l])
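
These shapes can be tied together in a minimal, unvectorized NumPy sketch of one CONV layer's forward pass (the function name and loop structure are my own, not from the course):

```python
import numpy as np

def conv_layer_forward(A_prev, W, b, stride=1, pad=0):
    """One CONV layer: A_prev has shape (m, nH_prev, nW_prev, nC_prev),
    W has shape (f, f, nC_prev, nC), b has shape (1, 1, 1, nC);
    returns ReLU activations of shape (m, nH, nW, nC)."""
    m, nH_prev, nW_prev, nC_prev = A_prev.shape
    f, _, _, nC = W.shape
    nH = (nH_prev + 2 * pad - f) // stride + 1
    nW = (nW_prev + 2 * pad - f) // stride + 1

    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    Z = np.zeros((m, nH, nW, nC))
    for i in range(m):                       # loop over the examples
        for h in range(nH):                  # loop over output height
            for w in range(nW):              # loop over output width
                for c in range(nC):          # loop over the filters
                    patch = A_pad[i, h*stride:h*stride+f, w*stride:w*stride+f, :]
                    Z[i, h, w, c] = np.sum(patch * W[:, :, :, c]) + b[0, 0, 0, c]
    return np.maximum(Z, 0)                  # ReLU activation g(z)

A0 = np.random.randn(10, 6, 6, 3)             # m = 10 examples of size 6x6x3
W1 = np.random.randn(3, 3, 3, 2)              # two 3x3x3 filters
b1 = np.zeros((1, 1, 1, 2))
print(conv_layer_forward(A0, W1, b1).shape)   # (10, 4, 4, 2)
```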

8 - a simple convolutional network example

Let's go through a concrete example of a deep convolutional neural network.


One thing to take away from this is that as you go deeper in the neural network, you typically start out with large images, and the height and width stay the same for a while, then gradually trend down as you go deeper, whereas the number of channels generally increases.

In a typical ConvNet, there are usually three types of layers:

  • convolution(CONV)
  • pooling(POOL)
  • fully connected(FC)

9 - pooling layer

9.1 - max pooling

ConvNets often also use pooling layers to reduce the size of the representation, to speed up computation, and to make some of the features they detect a bit more robust.


hyperparameters of the max pooling

f = 2

s = 2

What the max operator really does is: if a particular feature is detected anywhere in the filter region, keep the high number; if the feature is not detected, the max of all the numbers in that region stays quite small.

One interesting property of max pooling is that it has a set of hyperparameters, but it has no parameters for gradient descent to learn.

The formula for figuring out the output size of a CONV layer also works for a POOL layer.


hyperparameters of the max pooling

f = 3

s = 1

The max pooling computation is done independently on each of the n_C^[l] channels.

9.2 - average pooling


hyperparameters of the average pooling

f = 2

s = 2

These days, max pooling is used much more often than average pooling.

To summarize:

  • for the pooling layer, the common choice of hyperparameters is f = 2, s = 2, and this has the effect of shrinking the height and width of the representation by a factor of two.
  • no parameters to learn (see the short check after this list).
  • (n_H, n_W, n_C) → (⌊(n_H − f) / s⌋ + 1, ⌊(n_W − f) / s⌋ + 1, n_C)
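
A short check of these points (a sketch assuming Keras):

```python
import tensorflow as tf

x = tf.random.normal([1, 28, 28, 6])
pool = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)
print(pool(x).shape)          # (1, 14, 14, 6): height and width halved, channels unchanged
print(pool.count_params())    # 0: nothing for gradient descent to learn
```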

10 - convolutional neural network example

It's actually quite similar to one of the classic networks, called LeNet-5, which was created by Yann LeCun.


As we go deeper, the width and height of the activations decrease, whereas the number of channels increases.

A common pattern in a ConvNet is:

CONV → POOL → CONV → POOL → FC → FC → FC → SOFTMAX
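
A hedged Keras sketch of this pattern (the specific layer sizes are my assumption of a LeNet-5-style setup, not the exact numbers from the slide):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(6, 5, activation="relu"),     # CONV
    tf.keras.layers.MaxPooling2D(2),                     # POOL
    tf.keras.layers.Conv2D(16, 5, activation="relu"),    # CONV
    tf.keras.layers.MaxPooling2D(2),                     # POOL
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="relu"),       # FC
    tf.keras.layers.Dense(84, activation="relu"),        # FC
    tf.keras.layers.Dense(10, activation="softmax"),     # FC -> SOFTMAX
])
model.summary()   # most of the parameters sit in the fully connected layers
```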


  • pooling layers don't have any parameters
  • CONV layers tend to have relatively few parameters
  • most of the parameters tend to be in the FC layers
  • the activation size generally goes down as you go deeper

11 - why convolutions

Why are convolutions so useful when you include them in a neural network?

There are two main advantages of convolutional layers over fully connected layers: parameter sharing and sparsity of connections.


If we created a neural network with 32 × 32 × 3 = 3,072 units in one layer and 28 × 28 × 6 = 4,704 units in the next layer, and connected every one of these units, that would be an FC layer, and the weight matrix would have 3,072 × 4,704 ≈ 14 million parameters. That's a lot of parameters to train, and this is just a pretty small image. In contrast, the total number of parameters in the corresponding CONV layer is (5 × 5 + 1) × 6 = 156.

There are two reasons a ConvNet has relatively few parameters:

  • parameter sharing: a feature detector that is useful in one part of the image is probably useful in another part of the image. A feature detector computed for the upper-left corner of the image will also be useful for the lower-right corner, so we don't need to learn separate feature detectors for different positions in the image.
  • sparsity of connections: in each layer, each output value depends only on a small number of inputs.

Through these two mechanisms, a neural network has far fewer parameters, which allows it to be trained with smaller training sets and helps prevent overfitting.

How to train this network?


define the cost function:

J(W, b) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i))

So to train the neural network, all we need to do is use gradient descent, or some other algorithm like Momentum, RMSProp, or Adam, to optimize all the parameters of the network to reduce the cost.
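
In Keras terms, a minimal sketch might look like this (the architecture, optimizer, and loss here are illustrative choices; X_train and y_train are placeholders for your data):

```python
import tensorflow as tf

# Build a small ConvNet, then minimize the cost J(W, b) = (1/m) * sum_i L(y_hat_i, y_i)
# with an optimizer such as Adam (or Momentum / RMSProp).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",   # the per-example loss L(y_hat, y)
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10, batch_size=64)  # mini-batch gradient descent on J
```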
