CS231n-Lecture Note-04-Image Classification with CNN

itspollyyy

Last modified: 2023-11-15 10:18:21


First published: 2023-11-15 10:17:30

Article link: https://blog.csdn.net/weixin_43399179/article/details/134392568


In the previous section, we mostly talked about neural networks. Convolutional neural networks (CNNs) are very similar: both are made up of neurons with learnable weights and biases. Each neuron receives some inputs, computes a weighted sum, and applies a non-linearity. A CNN still has a loss function.

The main difference is the input: a CNN takes images as input and keeps their spatial structure, instead of treating them as flat vectors.

Architecture Overview

In the networks discussed in the previous chapters, the input is a single vector. It is transformed by a series of hidden layers, and the output layer then produces the class scores.

In this diagram, the input has size 1x3072. It is a 32x32x3 image stretched into a single vector (32 × 32 × 3 = 3072).
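As a quick check of the arithmetic, here is a minimal NumPy sketch that stretches a 32x32x3 image into a 1x3072 row vector. It assumes NumPy is available, and the random image is a hypothetical stand-in for a real one.

```python
import numpy as np

# A hypothetical 32x32x3 image (height x width x channels) with random pixel values.
image = np.random.rand(32, 32, 3)

# Stretch the image into one row vector, the input a fully connected network expects.
x = image.reshape(1, -1)
print(x.shape)  # (1, 3072)
```

After flattening, pixels that are vertically adjacent in the image end up 32 × 3 = 96 positions apart in the vector. A fully connected layer therefore has to learn the spatial structure on its own.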

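In a convolution layer, the input keeps its spatial structure: here it is a 32x32x3 image rather than a stretched vector. A CNN does not use a full weight matrix that connects every input value to every neuron. Instead, it uses filters, which are small sets of learnable weights. Each filter always has the same number of channels as the input image (for example, a 5x5x3 filter for this 32x32x3 image).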

Layers used to build ConvNets

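Let's denote the filter as $w$. At every position, the forward computation is the same as for a neuron in an ordinary neural network. We take the dot product between $w$ and the small chunk of the image under the filter, then add a bias: $w^T x + b$.
The filter convolves (slides) over all spatial locations, from left to right and top to bottom, and produces one number at each location. Together these numbers form an activation map.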
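To make the sliding concrete, here is a minimal, unoptimized NumPy sketch of one filter sweeping over one image. It assumes a channels-first layout (C, H, W), no padding, and stride 1. The function name `conv_single_filter` is hypothetical. Like most deep learning code, it does not flip the filter, so strictly speaking it computes a cross-correlation, which is what CNNs call convolution.

```python
import numpy as np

def conv_single_filter(x, w, b):
    """Slide one filter over one image (no padding, stride 1).

    x: input image of shape (C, H, W)
    w: filter of shape (C, F, F), with the same number of channels as x
    b: scalar bias
    Returns an activation map of shape (H - F + 1, W - F + 1).
    """
    C, H, W = x.shape
    F = w.shape[1]
    out = np.zeros((H - F + 1, W - F + 1))
    for i in range(H - F + 1):          # top to bottom
        for j in range(W - F + 1):      # left to right
            chunk = x[:, i:i + F, j:j + F]
            # Same as one neuron: dot product with the filter plus a bias (w^T x + b).
            out[i, j] = np.sum(chunk * w) + b
    return out

x = np.random.rand(3, 32, 32)   # a 32x32 RGB image, channels first
w = np.random.rand(3, 5, 5)     # a 5x5 filter spanning all 3 channels
activation_map = conv_single_filter(x, w, b=0.1)
print(activation_map.shape)     # (28, 28)
```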

The number of filters determines the number of activation maps (feature maps), that is, the depth of the output.

Here we use six filters, so the output consists of six activation maps (feature maps).
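Here is a minimal NumPy sketch of the six-filter case. It assumes a channels-first layout, stride 1, no padding, and random, hypothetical filter values. Each 5x5x3 filter produces one 28x28 activation map, and stacking the six maps gives a 6x28x28 output.

```python
import numpy as np

def conv_single_filter(x, w, b):
    # x: (C, H, W), w: (C, F, F); no padding, stride 1
    C, H, W = x.shape
    F = w.shape[1]
    out = np.zeros((H - F + 1, W - F + 1))
    for i in range(H - F + 1):
        for j in range(W - F + 1):
            out[i, j] = np.sum(x[:, i:i + F, j:j + F] * w) + b
    return out

x = np.random.rand(3, 32, 32)         # one 32x32 RGB image
filters = np.random.rand(6, 3, 5, 5)  # six 5x5x3 filters
biases = np.zeros(6)                  # one bias per filter

# Each filter produces its own activation map; stacking them gives the output volume.
maps = np.stack([conv_single_filter(x, filters[k], biases[k]) for k in range(6)])
print(maps.shape)  # (6, 28, 28)
```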

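If we input a batch of two 3x32x32 images, we get a batch of two outputs.

We can write the input size as $N\times C_{in}\times H\times W$. Here $N$ is the batch size (the number of images in the batch), $C_{in}$ is the number of input channels, $H$ is the height, and $W$ is the width.

Here are the details of the parameters.

The output has the same batch size $N$ as the input, and every filter has the same number of channels $C_{in}$ as the input. There is one bias per filter. With $C_{out}$ filters of spatial size $K_h\times K_w$, the weights have shape $C_{out}\times C_{in}\times K_h\times K_w$ and the bias has $C_{out}$ entries. The output then has shape $N\times C_{out}\times H'\times W'$.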
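The shapes above can be checked with a naive batched forward pass. This sketch assumes NumPy, square K x K filters (so $K_h = K_w$), stride 1, and no padding. `conv_forward` is a hypothetical name, not a library function.

```python
import numpy as np

def conv_forward(x, w, b):
    """Naive conv layer forward pass (stride 1, no padding).

    x: input of shape (N, C_in, H, W)
    w: filters of shape (C_out, C_in, K, K)
    b: biases of shape (C_out,), one per filter
    Returns output of shape (N, C_out, H - K + 1, W - K + 1).
    """
    N, C_in, H, W = x.shape
    C_out, C_in_w, K, _ = w.shape
    assert C_in == C_in_w, "filter channels must match input channels"
    H_out, W_out = H - K + 1, W - K + 1
    out = np.zeros((N, C_out, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = x[:, :, i:i + K, j:j + K]  # (N, C_in, K, K)
            # Contract over (C_in, K, K) for every (image, filter) pair -> (N, C_out).
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3])) + b
    return out

x = np.random.rand(2, 3, 32, 32)  # a batch of two 3x32x32 images
w = np.random.rand(6, 3, 5, 5)    # C_out = 6 filters, C_in = 3, K = 5
b = np.random.rand(6)             # C_out biases
out = conv_forward(x, w, b)
print(out.shape)  # (2, 6, 28, 28)
```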

This is only one convolution layer. A ConvNet is a sequence of convolution layers interspersed with activation functions (such as ReLU).
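This small shape-tracking sketch shows how the shape evolves through such a sequence. It assumes stride 1 and no padding. The layer settings (6 filters of 5x5, then 10 filters of 5x5) are illustrative choices only. The activation function is applied elementwise, so it does not change the shape.

```python
def conv_output_shape(c_in, h, w, num_filters, filter_size):
    """Shape (channels, height, width) after a conv layer with stride 1 and no padding.

    Each filter spans all c_in input channels, so the output depth is num_filters.
    """
    return (num_filters, h - filter_size + 1, w - filter_size + 1)

shape = (3, 32, 32)  # input image: 3 channels, 32x32
shapes = [shape]
# Illustrative (num_filters, filter_size) settings for two CONV -> ReLU stages.
for num_filters, filter_size in [(6, 5), (10, 5)]:
    # ReLU is elementwise, so only the conv layer changes the shape.
    shape = conv_output_shape(*shape, num_filters, filter_size)
    shapes.append(shape)
print(shapes)  # [(3, 32, 32), (6, 28, 28), (10, 24, 24)]
```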

Now let's look at what the convolutional filters actually learn.

Here is an input image showing the corner of a car light. After applying 32 filters of size 5x5, we get 32 activation maps (feature maps). In some of them we can see the edge of the car light. Each of the 32 maps responds to a different feature of the image.

A complete ConvNet architecture is shown below:

The pooling layer is something we haven't talked about before. It downsamples each activation map, for example by keeping only the maximum value in each small window (max pooling), which makes the representation smaller. Adding pixels around the image edges is a different operation, padding, which is described later in this article. In the figure, the first column shows the features of the input image, and the last column shows the top-5 predicted classes.
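Here is a minimal sketch of max pooling, the most common pooling variant. It assumes a 2x2 window with stride 2, which is a typical choice and not one stated in the figure. The name `max_pool` is hypothetical. Each activation map is pooled independently, so the depth stays the same while height and width are halved.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling applied to each activation map independently.

    x: input of shape (C, H, W); returns shape (C, H_out, W_out).
    """
    C, H, W = x.shape
    H_out = (H - size) // stride + 1
    W_out = (W - size) // stride + 1
    out = np.zeros((C, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            window = x[:, i * stride:i * stride + size, j * stride:j * stride + size]
            out[:, i, j] = window.max(axis=(1, 2))  # keep the largest value per window
    return out

x = np.arange(16, dtype=float).reshape(1, 4, 4)  # one 4x4 map with values 0..15
print(max_pool(x))  # [[[ 5.  7.] [13. 15.]]]
```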

Convolutional Layer

Let's look more closely at the convolutional layer.

Stride

How does the filter work?

Earlier we said that the filter convolves over all spatial locations from left to right and top to bottom, but we have not explained exactly how it moves.

This is controlled by a hyperparameter called the stride.

The stride is the number of pixels the filter shifts at each step. It determines both how the filter moves and how far it moves each time.

For example, take a 7x7 input and a 3x3 filter with stride 1.

The filter moves across the input like this:

A 3x3 filter fits at 5 positions along each row and each column of the 7x7 input, so we get a 5x5 feature map.
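The positions can also be listed directly. This small sketch enumerates the left offsets where a 3-wide filter fits along a 7-wide input (the helper name `filter_positions` is hypothetical). With stride 1 there are 5 positions, which is why the feature map is 5x5. For comparison, stride 2 gives 3 positions.

```python
def filter_positions(n, f, stride):
    """Offsets where an f-wide filter fits entirely inside an n-wide input."""
    return list(range(0, n - f + 1, stride))

print(filter_positions(7, 3, stride=1))  # [0, 1, 2, 3, 4] -> 5 positions per row
print(filter_positions(7, 3, stride=2))  # [0, 2, 4] -> 3 positions per row
```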

Each pixel of the output feature map holds a value, so next we need to know how that value is calculated.

Take a 5x5 input and a 3x3 filter with stride 1. To get each output value, we multiply the filter elementwise with the patch of the input it covers and sum the results (a dot product). The output size is:

$(N - F) / \text{stride} + 1$, where $N$ is the input width and $F$ is the filter width.
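Here is a minimal NumPy sketch of this calculation. It uses an arbitrary 5x5 input (the values 0 to 24) and an arbitrary 3x3 filter chosen only for illustration. Each output value is the sum of the elementwise product between the filter and the patch under it, and the output size agrees with the formula.

```python
import numpy as np

def output_size(n, f, stride):
    """(N - F) / stride + 1, where n is the input width and f the filter width."""
    return (n - f) / stride + 1

x = np.arange(25, dtype=float).reshape(5, 5)  # 5x5 input: values 0..24
w = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])                 # an arbitrary 3x3 filter

size = int(output_size(5, 3, stride=1))  # (5 - 3) / 1 + 1 = 3
out = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        # Multiply the filter elementwise with the patch under it, then sum.
        out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
print(out)  # every entry is -6.0

# The same formula for a 7-wide input and a 3-wide filter with different strides.
for s in (1, 2, 3):
    print(s, output_size(7, 3, s))  # stride 3 gives a non-integer: the filter does not fit evenly
```

Every output is -6 because each row of this input increases by 1 per column, so each of the three filter rows contributes x[r, j] - x[r, j + 2] = -2.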

Padding

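Another hyperparameter of the conv layer is padding: adding a border of the specified width around the edges of the input.

With padding set to 1, the input looks like this:

With zero padding (the added border is filled with zeros), the input image looks like this:

As we can see, with padding the output is 5x5 rather than 3x3. A padding of 1 preserves the 5x5 spatial size for a 3x3 filter with stride 1.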
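Here is a short NumPy sketch of zero padding with $P = 1$. It assumes an arbitrary 5x5 input and a 3x3 averaging filter chosen for illustration. `np.pad` adds the border of zeros, which makes the padded input 7x7. Sliding the 3x3 filter with stride 1 then yields a 5x5 output.

```python
import numpy as np

x = np.arange(25, dtype=float).reshape(5, 5)  # a 5x5 input
w = np.ones((3, 3)) / 9.0                    # a 3x3 averaging filter (arbitrary choice)

# Zero padding with P = 1: add a one-pixel border of zeros on every side.
x_padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(x_padded.shape)  # (7, 7)

size = x_padded.shape[0] - 3 + 1  # stride 1 over the padded input
out = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        out[i, j] = np.sum(x_padded[i:i + 3, j:j + 3] * w)
print(out.shape)  # (5, 5)
```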

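With both padding and stride, the output size is:

$(N + 2P - F) / \text{stride} + 1$, where $P$ is the padding.

When the division is not exact, remember to round down (take the floor). A filter position that would run past the edge of the padded input is simply not used.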
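The formula with the floor fits in a one-line helper. This sketch assumes square inputs and filters, and the name `conv_output_size` is hypothetical. As far as I know, the same rounding-down convention appears in the output-shape formula in PyTorch's `torch.nn.Conv2d` documentation (with dilation 1).

```python
def conv_output_size(n, f, p=0, stride=1):
    """Output width of a conv layer: floor((N + 2P - F) / stride) + 1."""
    return (n + 2 * p - f) // stride + 1

print(conv_output_size(5, 3, p=1, stride=1))   # 5: padding 1 keeps a 5x5 input at 5x5
print(conv_output_size(7, 3, p=0, stride=2))   # 3
print(conv_output_size(7, 3, p=0, stride=3))   # 2: (7 - 3) / 3 = 1.33 is rounded down to 1
print(conv_output_size(32, 5, p=2, stride=1))  # 32
```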

This article only briefly covers the structure of CNNs.
