Convolutional Neural Networks: Simplified

If you are getting into Artificial Intelligence, chances are that you’ve heard of Convolutional Neural Networks (CNNs) and were overwhelmed by them, or maybe you just want to know what they are.

In this article, I will try to explain CNN in layman’s terms.

What is a CNN?

Let us take an analogy to understand Convolutional Neural Networks.

[Jigsaw puzzle animation from Giphy]

Imagine you have been given the pieces of a jigsaw puzzle and told to identify what it depicts, with no cover image to help you. Say you find a few pieces that, when put together, form eyes. By putting pieces together, you created a new piece that gave you some insight into what the image might be. This is convolution.

Convolution is a process of extracting features from the input by altering its shape.

But identifying just one entity (eyes) is not enough; the image could still be of anything from a dog to a lion. So you keep putting pieces together until you arrive at some conclusion. Similarly, one convolution is not enough; a series of convolutions is necessary to arrive at a conclusion.

A Convolutional Neural Network is a series of successive convolutions applied to the input to obtain a prediction.

As you keep solving the puzzle, the number of remaining pieces keeps shrinking until you get one whole picture. Likewise, in CNNs the size of the input shrinks at each convolution, until you get one final prediction. (More on this ahead.)

Where are CNNs used?

The most common use of CNNs is for image classification, like identifying whether an image contains a cat or a dog, classifying handwritten digits, and many more.

[Image: Applications of CNNs]

CNNs can also be used for other complex tasks like signal processing and image segmentation.

How exactly do CNNs work?

It’s a technical topic, but I’ll try to explain it in simple terms.

Kernels:

An image is a set of pixels, and to interpret what they depict as a whole, CNNs use something called kernels (also called filters or feature detectors).

Kernels can be thought of as the intelligence of CNNs.

Let’s relate reading a paragraph to kernels. Just as we read a line from left to right, extracting meaning from the text, a kernel moves left to right over an image, extracting features. Also, while reading, we interpret one word at a time, not individual letters; likewise, a kernel extracts features from a set of pixels at a time. The number of pixels considered at once is determined by the size of the kernel. (A square kernel of size 3*3 is most commonly used.)

[Image: Kernels in a CNN]

Each pixel in an image is associated with a value in the range 0 to 255, determining its intensity (Or 3 values per pixel for colored images, one for each of red, green, and blue channels). Similarly, kernels also have a value associated with each of its blocks.

Where do kernels get their values?

As I mentioned before, kernels are the intelligence that extracts features. Their values may be initialized randomly or by some predefined function, but they change as the network is trained. These values are the actual intelligence of our network.

Now you might be wondering, “Hey, that’s all good, but how does the kernel extract features?”

It is as simple as multiplication and addition!

Features are extracted by multiplying corresponding blocks, as shown, and adding them up. For example, the value at position [1, 1] in the image is multiplied by the value at position [1, 1] in the kernel. (Typically, this is followed by an activation layer to introduce non-linearity, but that’s a vast topic, so I won’t cover it in this article.)

In the following example, in the first step, the resultant feature is computed as (7*1) + (2*0) + (3*-1) + … + (2*-1) = 6

[Image: Feature extraction]
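The multiply-and-add step above can be sketched in NumPy. The 4*4 image and 3*3 kernel below are hypothetical values for illustration, not the ones from the article’s figure:

```python
import numpy as np

# Hypothetical 4x4 "image" and 3x3 kernel (illustrative values only).
image = np.array([
    [7, 2, 3, 1],
    [4, 5, 6, 2],
    [1, 0, 2, 3],
    [3, 1, 4, 0],
])
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
])

# Slide the kernel over the image: at each position, multiply the
# overlapping blocks element-wise and sum them up to get one feature.
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
features = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]
        features[i, j] = np.sum(patch * kernel)

print(features)  # features is [[1, 1], [-4, 1]]
```

Note how the 4*4 input shrinks to a 2*2 feature map, matching the puzzle intuition that the input gets smaller with each convolution.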

Strides:

Images contain thousands of pixels, and processing all of them is computationally expensive as well as time-consuming. So we reduce the number of features by altering the Stride parameter.

Stride defines the number of pixels by which the kernel moves in each direction.

In the illustrations you see below, the kernel is moving to the immediate next set of pixels, in this case, the stride is [1, 1]. This tells the kernel to move 1 pixel to the right while moving along the row and 1 pixel down while moving along the column.

[Image: Stride of [1, 1]]

Now, to reduce the size, if we change the stride to, say, [2, 2], the kernel will skip 1 pixel while moving in either direction.

[Image: Stride of [2, 2]]

Stride is an important parameter in CNNs because it helps reduce the size drastically, and thus the number of computations. In the above example, changing the stride parameter reduces the number of output features from 9 to just 4.
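The 9-to-4 reduction can be sketched with a small helper. The `convolve` function and the toy 5*5 image below are illustrative, not from the article:

```python
import numpy as np

# Sliding-window convolution with a configurable stride.
def convolve(image, kernel, stride):
    kh, kw = kernel.shape
    sh, sw = stride
    out_h = (image.shape[0] - kh) // sh + 1
    out_w = (image.shape[1] - kw) // sw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * sh:i * sh + kh, j * sw:j * sw + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25).reshape(5, 5)  # a 5x5 toy image
kernel = np.ones((3, 3))             # a 3x3 kernel of ones

print(convolve(image, kernel, (1, 1)).shape)  # (3, 3) -> 9 output features
print(convolve(image, kernel, (2, 2)).shape)  # (2, 2) -> 4 output features
```

Doubling the stride roughly halves each output dimension, which is where the drastic size reduction comes from.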

Pooling:

[Video by Kensuke Koike, from YouTube]

This video demonstrates that we don’t necessarily lose information when we reduce the size of an image: we can still identify that the picture depicts a dog even after it has been shrunk.

So if even a quarter of the actual image is sufficient to deduce what it depicts, we don’t need to process the complete image, even in CNNs. This is where pooling comes into the picture.

Pooling is used to reduce the size of the feature map.

Pooling directly reduces the size of the image (feature map) by moving a filter over the features and taking only one value from each block of features. That value depends on the type of pooling: it may be the maximum value in the block (max pooling), the sum of the values (sum pooling), or the average of all the values (mean pooling). Max pooling is the one used most commonly.

[Image: Types of pooling]
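The three pooling types can be sketched with NumPy on a hypothetical 4*4 feature map (values are made up for illustration):

```python
import numpy as np

# A hypothetical 4x4 feature map, pooled over non-overlapping 2x2 blocks.
feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [0, 2, 3, 1],
    [4, 1, 2, 5],
])

# Reshape into 2x2 blocks: after the swap, axes 0 and 1 index the block,
# and axes 2 and 3 index the position within each block.
blocks = feature_map.reshape(2, 2, 2, 2).swapaxes(1, 2)

max_pool = blocks.max(axis=(2, 3))    # keep the largest value per block
sum_pool = blocks.sum(axis=(2, 3))    # keep the sum of each block
mean_pool = blocks.mean(axis=(2, 3))  # keep the average of each block

print(max_pool)  # [[6 4] [4 5]]
```

Each variant halves both dimensions here; only the rule for picking the surviving value differs.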

Flattening and Prediction:

After a series of convolutions and pooling, we obtain the features, and now we need our network to predict something from them. It’s as if the network now sees eyes, a nose, and whiskers, and from these it has to predict that the image is of a cat. To do this, we first need to flatten the matrix.

[Image: Flattening]
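Flattening itself is just reshaping. A minimal sketch, assuming a hypothetical 2*2 pooled feature map:

```python
import numpy as np

# A hypothetical 2x2 feature map left over after the last pooling step.
pooled = np.array([
    [6, 4],
    [4, 5],
])

# Flattening turns the 2-D map into a 1-D vector that a dense
# (fully connected) ANN layer can consume.
flattened = pooled.flatten()
print(flattened)        # [6 4 4 5]
print(flattened.shape)  # (4,)
```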

After flattening, we predict our class from the features by using an Artificial Neural Network (ANN).

The main use of convolutions is to extract features, and they perform well at it. An ANN converts those features into further attributes, which in turn give better predictions; hence ANNs are used after the convolutions. I will explain ANNs in detail in my next article.

Thank you for reading; any feedback or suggestions are appreciated.

GitHub:

LinkedIn:

www.linkedin.com/in/rohan-hirekerur

Translated from: https://medium.com/ai-in-plain-english/convolutional-neural-networks-simplified-165148207e02
