Convolutional Neural Networks--CNN 1

1. Convolutional Neural Networks

Each node connects to another and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.
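
As a rough illustration of the threshold rule above, here is a minimal sketch of a single node. The function name `node_output`, the input values, and the weights are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def node_output(inputs, weights, threshold):
    """Pass the weighted sum on only if it exceeds the threshold."""
    weighted_sum = float(np.dot(inputs, weights))
    # Below or at the threshold, the node stays inactive and sends nothing (0.0).
    return weighted_sum if weighted_sum > threshold else 0.0

inputs = np.array([1.0, 0.5, -1.0])    # hypothetical inputs
weights = np.array([0.8, 0.4, 0.3])    # hypothetical weights

active = node_output(inputs, weights, threshold=0.5)    # weighted sum is about 0.7
inactive = node_output(inputs, weights, threshold=1.0)  # threshold not reached
print(active, inactive)
```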

While we primarily focused on feed-forward networks here, there are various types of neural nets. For example, recurrent neural networks are commonly used for natural language processing, whereas convolutional neural networks (ConvNets or CNNs) are more often utilized for classification and computer vision tasks.

Convolutional neural networks provide a scalable approach to image classification and object recognition tasks, leveraging principles from linear algebra, specifically matrix multiplication, to identify patterns within an image. That said, they can be computationally demanding, often requiring graphics processing units (GPUs) to train models.


2. Convolutional Neural Networks

Convolutional neural networks have three main types of layers, which are:

  • Convolutional layer
  • Pooling layer
  • Fully-connected (FC) layer

2.1. Convolutional Layer

The convolutional layer is the first layer of a convolutional network. While convolutional layers can be followed by additional convolutional layers or pooling layers, the fully-connected layer is the final layer.

With each successive layer, the CNN identifies greater portions of the image. Earlier layers focus on simple features, such as colors and edges. As the image data progresses through the layers of the CNN, the network starts to recognize larger elements or shapes of the object until it finally identifies the intended object.

The convolutional layer requires a few components, which are input data, a feature detector, and a feature map. Let’s assume that the input will be a color image, which is made up of a matrix of pixels in 3D. This means that the input will have three dimensions—a height, width, and depth—which correspond to RGB in an image.
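
To make the three dimensions concrete, the sketch below builds a tiny color image as a NumPy array. The height × width × depth ordering (channels last) and the 4×4 size are assumptions for illustration; some libraries put the channel dimension first.

```python
import numpy as np

height, width, depth = 4, 4, 3            # depth 3 = the R, G, B channels
image = np.zeros((height, width, depth), dtype=np.uint8)
image[:, :, 0] = 255                      # fill the red channel: a pure red image

red_channel = image[:, :, 0]              # a single channel is a 2-D matrix
print(image.shape)                        # (4, 4, 3)
print(red_channel.shape)                  # (4, 4)
```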

2.1.1. Feature Detector

The feature detector, also known as a kernel or a filter, moves across the receptive fields of the image, checking whether a feature (such as a color or an edge) is present. This process is known as a convolution.

The feature detector is a two-dimensional (2-D) array of weights, which represents part of the image.

While filters can vary in size, the filter (feature detector) is typically a 3×3 matrix; this also determines the size of the receptive field. The filter is applied to an area of the image, and a dot product is calculated between the input pixels and the filter. This dot product is then written into an output array. Afterwards, the filter shifts by a stride, repeating the process until it has swept across the entire image. The final output of this series of dot products between the input and the filter is known as a feature map or activation map.
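
The sweep described above can be sketched in a few lines of NumPy. This is a minimal single-channel version without padding; the function name `convolve2d`, the 5×5 input, and the vertical-edge kernel are hypothetical choices for illustration. As is usual in CNNs, the kernel is applied without flipping.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and collect the dot products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]          # current receptive field
            feature_map[i, j] = np.sum(patch * kernel)  # dot product with the filter
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)   # hypothetical 5x5 input
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)       # hypothetical edge filter

feature_map = convolve2d(image, kernel)              # stride 1
feature_map_s2 = convolve2d(image, kernel, stride=2)
print(feature_map.shape, feature_map_s2.shape)       # (3, 3) (2, 2)
```

Because each row of this input increases by one per column, every 3×3 patch gives the same response (-6), which makes the result easy to check by hand.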

Note that the weights in the feature detector remain fixed as it moves across the image, which is also known as parameter sharing.

Some parameters, like the weight values, are adjusted during training through backpropagation and gradient descent. However, three hyperparameters affect the volume size of the output and need to be set before training begins. These include:

  1. The number of filters alters the depth of the output. For example, three distinct filters would yield three different feature maps, creating a depth of three.
  2. Stride is the distance, or number of pixels, that the filter moves over the input matrix. While stride values of two or greater are rare, a larger stride yields a smaller output.
  3. Zero-padding is usually used when the filters do not fit the input image. This sets all elements that fall outside the input matrix to zero, producing a larger or equally sized output.
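
The effect of these three hyperparameters on the output volume can be sketched with the standard output-size arithmetic. The helper `conv_output_shape` and the 32×32 input are hypothetical; `padding` here means the number of zero pixels added on each side.

```python
def conv_output_shape(input_hw, filter_size, stride, num_filters, padding=0):
    """Return (height, width, depth) of a convolutional layer's output."""
    h, w = input_hw
    out_h = (h + 2 * padding - filter_size) // stride + 1
    out_w = (w + 2 * padding - filter_size) // stride + 1
    # One feature map per filter, so the depth equals the number of filters.
    return (out_h, out_w, num_filters)

print(conv_output_shape((32, 32), filter_size=3, stride=1, num_filters=3))  # (30, 30, 3)
print(conv_output_shape((32, 32), filter_size=3, stride=2, num_filters=3))  # (15, 15, 3)
print(conv_output_shape((32, 32), filter_size=3, stride=1, num_filters=3, padding=1))  # (32, 32, 3)
```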

There are three types of padding:

  • Valid padding: This is also known as no padding. In this case, the last convolution is dropped if dimensions do not align.
  • Same padding: This padding ensures that the output layer has the same size as the input layer.
  • Full padding: This type of padding increases the size of the output by adding zeros to the border of the input.
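
A small sketch of how the three padding types change the output size, assuming stride 1 and an odd filter size. The helper `pad_amount` is hypothetical; `np.pad` does the actual zero-padding.

```python
import numpy as np

def pad_amount(mode, filter_size):
    """Zeros added on each side (assumes stride 1 and an odd filter size)."""
    if mode == "valid":
        return 0
    if mode == "same":
        return (filter_size - 1) // 2
    if mode == "full":
        return filter_size - 1
    raise ValueError(f"unknown padding mode: {mode}")

image = np.ones((5, 5))     # hypothetical 5x5 input
filter_size = 3
sizes = {}
for mode in ("valid", "same", "full"):
    p = pad_amount(mode, filter_size)
    padded = np.pad(image, p, mode="constant", constant_values=0)
    sizes[mode] = padded.shape[0] - filter_size + 1   # output height at stride 1

print(sizes)   # {'valid': 3, 'same': 5, 'full': 7}
```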

After each convolution operation, a CNN applies a Rectified Linear Unit (ReLU) transformation to the feature map, introducing nonlinearity to the model.
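
ReLU simply replaces negative values with zero and leaves positive values unchanged. A minimal sketch, with a hypothetical 2×2 feature map:

```python
import numpy as np

def relu(x):
    """Element-wise max(x, 0)."""
    return np.maximum(x, 0)

feature_map = np.array([[-6.0, 2.0],
                        [0.5, -1.0]])   # hypothetical convolution output
activated = relu(feature_map)
print(activated)
```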

2.2. Pooling Layer

Pooling layers, also known as downsampling layers, perform dimensionality reduction, reducing the number of parameters in the input.

Similar to the convolutional layer, the pooling operation sweeps a filter across the entire input, but the difference is that this filter does not have any weights. Instead, the kernel applies an aggregation function to the values within the receptive field (for example, a filter that picks the maximum value in a 2×2 region turns four numbers into one). There are two main types of pooling:

  • Max pooling: As the filter moves across the input, it selects the pixel with the maximum value to send to the output array. As an aside, this approach tends to be used more often than average pooling.
  • Average pooling: As the filter moves across the input, it calculates the average value within the receptive field to send to the output array.
    [Figure: max pooling]
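
Both pooling types can be sketched with one small function. The name `pool2d`, the 2×2 window with stride 2, and the 4×4 input are hypothetical choices for illustration.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Apply max or average pooling over size x size windows."""
    aggregate = np.max if mode == "max" else np.mean
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = aggregate(x[r:r + size, c:c + size])  # no weights involved
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [3, 4, 6, 5]], dtype=float)   # hypothetical feature map

max_pooled = pool2d(x, mode="max")   # [[6, 4], [7, 9]]
avg_pooled = pool2d(x, mode="avg")   # [[3.75, 1.75], [4.0, 7.0]]
print(max_pooled)
print(avg_pooled)
```

Each 2×2 window of four values collapses to a single number, so the 4×4 input becomes a 2×2 output.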

2.3. Fully-Connected Layer

The name of the fully-connected layer aptly describes itself. As mentioned earlier, the pixel values of the input image are not directly connected to the output layer in partially connected layers. However, in the fully-connected layer, each node in the output layer connects directly to a node in the previous layer.

As an example, let’s assume that we’re trying to determine if an image contains a bicycle. You can think of the bicycle as a sum of parts. It consists of a frame, handlebars, wheels, pedals, et cetera. Each individual part of the bicycle makes up a lower-level pattern in the neural net, and the combination of its parts represents a higher-level pattern, creating a feature hierarchy within the CNN.

Convolution and pooling layers extract features from the image, so these layers do a kind of “preprocessing” of the data. Fully connected layers then perform classification based on these extracted features.
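
The classification step can be sketched as flattening the extracted features and multiplying them by a weight matrix. Everything here is an assumption for illustration: the 2×2×3 pooled features, the two classes, the random untrained weights, and the use of softmax to turn scores into probabilities (a common choice that the text above does not specify).

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
pooled = rng.random((2, 2, 3))      # hypothetical output of the last pooling layer
features = pooled.reshape(-1)       # flatten to a vector of 12 values

num_classes = 2                     # e.g. "bicycle" and "not a bicycle"
W = rng.normal(size=(num_classes, features.size))  # untrained, random weights
b = np.zeros(num_classes)

# Every output node is connected to every input feature.
probs = softmax(W @ features + b)
print(probs)
```

With trained weights instead of random ones, the larger probability would indicate the predicted class.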

The video below is a great way to deepen your understanding of CNNs, but read this post before watching it.

https://youtu.be/aircAruvnKk
