CS231n-Lecture Note-04-Image Classification with CNN

The previous section was mostly about neural networks, and CNNs are very similar. Both are made up of neurons with learnable weights and biases. Each neuron receives some inputs, computes a weighted sum, and passes the result through a non-linearity. The whole network is still trained with a loss function.

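The difference is the input. A CNN assumes its input is an image and feeds it to the neurons as an image, keeping its spatial structure instead of flattening it.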

Architecture Overview

In the previous chapters, the input was a vector. Hidden layers transformed it, and the "output layer" produced the result.

In this diagram, the input size is 1x3072, which comes from stretching a 32x32x3 image into a vector.
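The stretching step can be sketched in a few lines of plain Python. This is only an illustration: the helper name `flatten_image` and the all-zero dummy image are assumptions, not part of the lecture.

```python
def flatten_image(image):
    """Stretch a nested H x W x C list into a flat list of length H*W*C."""
    return [value for row in image for pixel in row for value in pixel]


# A dummy 32x32x3 image filled with zeros (a stand-in for real pixel data).
height, width, channels = 32, 32, 3
image = [[[0.0] * channels for _ in range(width)] for _ in range(height)]

vector = flatten_image(image)
print(len(vector))  # 3072 = 32 * 32 * 3
```

After flattening, neighbouring pixels are no longer next to each other in any structured way, which is the spatial information a convolution layer keeps.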

In the convolution layer, the input is the image itself, with size 32x32x3. Instead of one weight per input pixel, the CNN uses a small filter whose weights are shared across locations, and the filter has the same number of channels as the input image.
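Because the filter has the same number of channels as the input, applying it to one image patch reduces to a single dot product plus a bias. The sketch below assumes a 5x5 filter over 3 channels stored channel-first; the function name `filter_response` and the constant values are made up for illustration.

```python
def filter_response(patch, w, b=0.0):
    """Return the dot product of filter w and a patch, plus bias b.

    patch and w are nested lists of shape C x F x F (channel-first).
    """
    total = b
    for c in range(len(w)):
        for i in range(len(w[c])):
            for j in range(len(w[c][i])):
                total += w[c][i][j] * patch[c][i][j]
    return total


C, F = 3, 5  # 3 channels, 5x5 spatial extent (assumed sizes)
patch = [[[1.0] * F for _ in range(F)] for _ in range(C)]
w = [[[0.5] * F for _ in range(F)] for _ in range(C)]

# 3 * 5 * 5 = 75 products of 0.5 each, plus a bias of 1.0.
print(filter_response(patch, w, b=1.0))  # 38.5
```

One call produces one number, so one filter position contributes exactly one value to an activation map.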

Layers used to build ConvNets

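Let the filter be w. At each location, the computation is the same as the forward pass of a neural network: a dot product between w and the image patch x under the filter, plus a bias (w^T x + b).
The filter convolves over all spatial locations, from left to right and top to bottom.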

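The number of filters determines the number of activation maps (feature maps), that is, the depth of the output. The spatial size of each map depends on the input size, the filter size, the stride, and the padding.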

Here we use six filters, so the output consists of six activation maps (feature maps).

If we input a batch of two 3x32x32 images, we get a batch of two outputs.

Adding a bias to the convolution layer

We can write the input size as $N \times C_{in} \times H \times W$, where $N$ is the number of images in the batch, $C_{in}$ is the number of input channels, $H$ is the height, and $W$ is the width.

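Here are the details of the parameters.

The batch size is the same for the input and the output. The number of channels in each filter equals the number of input channels, and there is one bias per filter, so the number of biases equals the number of output channels.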
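The shapes involved can be checked with a naive forward pass. This is a minimal sketch, not an efficient implementation: it assumes no padding, square F x F filters, and nested Python lists in place of tensors. The names `conv_forward` and `shape` are hypothetical helpers, and the 5x5 filter size is an assumption for the example.

```python
def conv_forward(x, w, b, stride=1):
    """Naive convolution layer (no padding), written for clarity, not speed.

    x: input,   nested lists of shape N x C_in x H x W
    w: filters, nested lists of shape C_out x C_in x F x F
    b: biases,  list of length C_out (one bias per filter)
    Returns nested lists of shape N x C_out x H_out x W_out.
    """
    c_in, height, width = len(x[0]), len(x[0][0]), len(x[0][0][0])
    c_out, f = len(w), len(w[0][0])
    h_out = (height - f) // stride + 1
    w_out = (width - f) // stride + 1

    out = []
    for image in x:                 # one output per image in the batch
        maps = []
        for k in range(c_out):      # one activation map per filter
            fmap = [[b[k]] * w_out for _ in range(h_out)]
            for i in range(h_out):
                for j in range(w_out):
                    for c in range(c_in):
                        for di in range(f):
                            for dj in range(f):
                                pixel = image[c][i * stride + di][j * stride + dj]
                                fmap[i][j] += w[k][c][di][dj] * pixel
            maps.append(fmap)
        out.append(maps)
    return out


def shape(t):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


# A batch of two 3x32x32 images and six 3x5x5 filters, all filled with ones.
x = [[[[1.0] * 32 for _ in range(32)] for _ in range(3)] for _ in range(2)]
w = [[[[1.0] * 5 for _ in range(5)] for _ in range(3)] for _ in range(6)]
b = [0.0] * 6

out = conv_forward(x, w, b)
print(shape(out))  # (2, 6, 28, 28)
```

The batch dimension (2) passes through unchanged, the channel dimension becomes the number of filters (6), and the spatial size shrinks from 32 to 28 because a 5x5 filter fits in 28 positions along each axis.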

This is only one convolution layer. A ConvNet is a sequence of convolution layers interleaved with activation functions.

After using these filters, let's look at what convolutional filters learn.

Here is an input image: the corner of a car's headlight. After applying 32 filters of size 5x5, we get 32 activation maps (feature maps). We can see the edge of the headlight in some of them. These 32 maps show the features in different ways.

The full convolutional neural network architecture is shown below:

The pooling layer is something we haven't talked about before. It downsamples each activation map, reducing its spatial size. In the figure, the first column shows the features of the image, and the last column shows the top 5 predicted classes.
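As a rough illustration of what a pooling layer does, here is a sketch of max pooling, which keeps the largest value in each window and so shrinks the activation map. The 2x2 window with stride 2 is an assumption (a common default); the text above does not say which kind of pooling the architecture uses, and `max_pool2d` is a hypothetical helper name.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Downsample a 2D activation map by taking the max of each window."""
    height, width = len(fmap), len(fmap[0])
    out = []
    for i in range(0, height - size + 1, stride):
        row = []
        for j in range(0, width - size + 1, stride):
            window = [fmap[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        out.append(row)
    return out


fmap = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool2d(fmap))  # [[6, 8], [3, 4]]
```

The 4x4 map becomes 2x2. Pooling has no learnable weights; it only reduces the spatial size.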

Convolutional Layer

Let's talk more deeply about this convolutional layer.

Stride

How does the filter work?

Earlier, we said that the filter convolves over all spatial locations, from left to right and top to bottom, but we did not explain exactly how.

The movement is controlled by a hyperparameter called the stride.

The stride determines how far the filter moves at each step.

For example, take a 7x7 input and a 3x3 filter with a stride of 1.

The process of this filter would be like this:

src: Google

With a 3x3 filter and a stride of 1, a 7x7 input gives a 5x5 feature map.
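One way to see where the feature map size comes from is to list the positions where the filter fits along one axis. The helper `filter_positions` below is a hypothetical name used only for this sketch.

```python
def filter_positions(n, f, stride):
    """Top-left offsets along one axis where an f-wide filter fits in an n-wide input."""
    return list(range(0, n - f + 1, stride))


print(filter_positions(7, 3, 1))  # [0, 1, 2, 3, 4]
print(filter_positions(7, 3, 2))  # [0, 2, 4]
```

A 3x3 filter on a 7x7 input fits in five positions per axis with a stride of 1, and in three positions with a stride of 2. The number of positions per axis is the side length of the feature map.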

Each pixel of the feature map has a value, so we need to know how that value is calculated.
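As a concrete sketch of the calculation: each output value is the sum of the element-wise products between the filter and the patch of the input it currently covers. The single-channel 5x5 input and 3x3 filter values below are made up for illustration, and `conv2d` is a hypothetical helper (no padding, no bias).

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a square single-channel image."""
    n, f = len(image), len(kernel)
    out_size = (n - f) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            # Multiply the kernel element-wise with the covered patch and sum.
            for di in range(f):
                for dj in range(f):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
print(conv2d(image, kernel))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

The top-left output value, 4, comes from the top-left 3x3 patch: the kernel selects the four corners and the centre of that patch, and four of those five pixels are 1.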

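Take a 5x5 input with a 3x3 filter and a stride of 1. Each output value is the sum of the element-wise products between the filter and the patch it covers (a dot product, not a general matrix multiplication). The output size is:

$$\frac{N - F}{\text{stride}} + 1$$

where $N$ is the input width and $F$ is the filter width. For this example, (5 - 3) / 1 + 1 = 3, so the output is 3x3.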

Padding

Another hyperparameter used in the conv layer is padding, which adds a border of the specified width around the edge of the input.

With the padding set to 1, it looks like this:

With zero padding, the input image looks like this:

As we can see, with padding the output size is no longer 3x3 but 5x5.
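Zero padding itself is simple to sketch: surround the input with a border of zeros. The helper name `zero_pad` and the all-ones 5x5 input are assumptions for illustration.

```python
def zero_pad(image, pad):
    """Surround a 2D image with `pad` rows and columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]          # top border
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))      # bottom border
    return padded


image = [[1] * 5 for _ in range(5)]
padded = zero_pad(image, 1)

print(len(padded), len(padded[0]))  # 7 7
print(padded[0])  # [0, 0, 0, 0, 0, 0, 0]
print(padded[1])  # [0, 1, 1, 1, 1, 1, 0]
```

With a padding of 1, the 5x5 input becomes 7x7, and a 3x3 filter with a stride of 1 then fits in five positions per axis. That is why the output keeps the 5x5 size of the original input.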

With both padding and stride, the output size is:

$$\frac{N + 2P - F}{\text{stride}} + 1$$

where $P$ is the padding width.

When using this formula, remember to round the result of the division down if it is not a whole number.
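The output-size formula fits in a one-line helper. The name `conv_output_size` is hypothetical, and the floor division is an assumption about how a non-integer result is handled: it matches what frameworks such as PyTorch do, while other treatments regard such a configuration as invalid.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output width for input width n, filter width f, given stride and padding.

    Uses floor division, so a stride that does not fit evenly is rounded down.
    """
    return (n + 2 * pad - f) // stride + 1


print(conv_output_size(5, 3, stride=1))          # 3
print(conv_output_size(5, 3, stride=1, pad=1))   # 5
print(conv_output_size(7, 3, stride=1))          # 5
print(conv_output_size(7, 3, stride=2))          # 3
print(conv_output_size(7, 3, stride=3))          # 2, since (7 - 3) / 3 is rounded down to 1
print(conv_output_size(32, 5, stride=1, pad=2))  # 32
```

The last line shows a common pattern: with a stride of 1, a padding of (F - 1) / 2 keeps the output the same size as the input.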

This article only briefly covers the CNN structure.
