
Convolutional Neural Network Structure

CNNs are a special type of ANN that accepts images as inputs. Below is the representation of a basic neuron of an ANN, which takes a vector X as input. The values in X are multiplied by the corresponding weights to form a linear combination. A non-linearity, or activation function, is then applied to this combination to produce the final output.

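As a minimal sketch of that computation (plain NumPy; the function and variable names here are illustrative, not from the original article):

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """A basic ANN neuron: linear combination of the inputs, then a non-linearity."""
    z = np.dot(w, x) + b      # linear combination of X with the weights
    return activation(z)      # the activation function gives the final output

x = np.array([0.5, -1.2, 3.0])   # input vector X
w = np.array([0.4, 0.1, -0.6])   # corresponding weights
print(neuron(x, w, b=0.2))
```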

Why CNN?

Talking about grayscale images, they have pixel values ranging from 0 to 255, i.e. 8-bit pixels. If the size of the image is NxM, then the size of the input vector will be N*M. For RGB images, it would be N*M*3. Consider an RGB image of size 30x30: this already requires 2700 input neurons. An RGB image of size 256x256 would require over 100,000 neurons. An ANN takes a vector of inputs and produces an output vector from a hidden layer that is fully connected to the input. The number of weights for a 224x224x3 input is very high: a single neuron fully connected to the input would have 224x224x3 weights coming into it. This requires more computation, memory, and data.

A CNN instead exploits the structure of images, leading to a sparse connection between input and output neurons. Each layer of a CNN performs a convolution: the input is taken as an image volume (for an RGB image, width x height x 3), and a kernel/filter is applied to it to get the output. A CNN also enables parameter sharing between the output neurons, which means that a feature detector (for example, a horizontal edge detector) that is useful in one part of the image is probably useful in another part of the image.

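Those counts are easy to verify; a quick back-of-the-envelope check (the 3x3 kernel at the end is an illustrative assumption, not a figure from the article):

```python
# Input-vector sizes quoted above
print(30 * 30 * 3)      # 2700   -> 30x30 RGB image
print(256 * 256 * 3)    # 196608 -> 256x256 RGB image, well over 100,000
print(224 * 224 * 3)    # 150528 -> weights into ONE fully connected neuron

# A convolutional layer shares one small kernel across the whole image instead:
print(3 * 3 * 3 + 1)    # 28 -> parameters of a single 3x3 kernel over RGB, plus bias
```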

Convolutions

Every output neuron is connected to a small neighborhood in the input through a weight matrix, also referred to as a kernel or a filter. We can define multiple kernels for every convolution layer, each giving rise to an output. Each filter is moved around the input image, giving rise to a 2D output. The outputs corresponding to the different filters are stacked, giving rise to an output volume.


[Figure: Convolution operation, image by indoml]

Here the input values are multiplied by the corresponding values of the kernel filter, and then a summation is performed to get the final output. The kernel filter slides over the input matrix to produce the output map. If the input matrix has dimensions Nx and Ny, and the kernel has dimensions Fx and Fy, then the final output will have dimensions (Nx - Fx + 1) and (Ny - Fy + 1). In CNNs, the weights to be learned are the kernel values, and K kernels will produce K feature maps.

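A naive implementation makes the sliding-window arithmetic concrete (an illustrative sketch; like most CNN libraries, it computes cross-correlation, i.e. it does not flip the kernel):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive convolution: output has shape (Nx - Fx + 1, Ny - Fy + 1)."""
    Nx, Ny = image.shape
    Fx, Fy = kernel.shape
    out = np.zeros((Nx - Fx + 1, Ny - Fy + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # elementwise product of the window with the kernel, then summation
            out[i, j] = np.sum(image[i:i + Fx, j:j + Fy] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0            # simple averaging filter
print(conv2d_valid(image, kernel).shape)  # (3, 3) = (5 - 3 + 1, 5 - 3 + 1)
```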

Padding

Padded convolution is used when preserving the dimensions of the input matrix is important to us, and it helps us keep more of the information at the border of an image. We have seen that convolution reduces the size of the feature map. To retain the dimensions of the feature map as those of the input map, we pad the rows and columns with zeros.


[Figure: Padding, image by author]

In the above figure, with a padding of 1, we were able to preserve the dimensions of a 3x3 input. The size of the output feature map is N - F + 2P + 1, where N is the size of the input map, F is the size of the kernel matrix, and P is the amount of padding. For preserving the dimensions, N - F + 2P + 1 should be equal to N. Therefore,


P = (F - 1) / 2

[Figure: Condition for retaining dimensions, image by author]
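As a minimal check with NumPy (illustrative values):

```python
import numpy as np

N, F = 3, 3
P = (F - 1) // 2                 # padding that preserves the dimensions
image = np.random.rand(N, N)
padded = np.pad(image, P)        # append P rows/columns of zeros on every side
print(padded.shape)              # (5, 5): a 3x3 kernel now yields a 3x3 output
print(N - F + 2 * P + 1)         # 3 -> N - F + 2P + 1 equals N, as required
```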

Stride

Stride refers to the number of pixels the kernel filter moves at each step. A stride of 2 means the kernel shifts by 2 pixels before performing the next convolution operation.


[Figure: Stride demonstration, image by indoml]

In the figure above, the kernel filter slides over the input matrix by moving one pixel at a time. With a stride of 2, the filter shifts by two pixels between convolutions, as in the image below.


[Figure: Stride demonstration with a stride of 2, image by indoml]

An observation to make here is that the output feature map shrinks (roughly 4 times in area) when the stride is increased from 1 to 2. The dimension of the output feature map is (N - F + 2P)/S + 1, where S is the stride.

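The formula is easy to tabulate (a small helper; the names are illustrative):

```python
def output_size(N, F, P=0, S=1):
    """Output feature-map size along one dimension: (N - F + 2P)/S + 1."""
    return (N - F + 2 * P) // S + 1

print(output_size(224, 3, P=1, S=1))  # 224
print(output_size(224, 3, P=1, S=2))  # 112 -> each dimension halves,
                                      #        so ~4x fewer output values
```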

Pooling

Pooling provides translational invariance by subsampling: it reduces the size of the feature maps. The two commonly used pooling techniques are max pooling and average pooling.


[Figure: Max pooling operation, image by indoml]

In the above operation, the pooling operation divides the 4x4 matrix into four 2x2 regions and, from each region, picks the greatest value (for max pooling) or the average of the four values (for average pooling). This reduces the size of the feature maps, which in turn reduces the number of parameters, without losing important information. One thing to note here is that the pooling operation reduces the Nx and Ny values of the input feature map but does not reduce Nc (the number of channels). Also, the hyperparameters involved in the pooling operation are the filter size, the stride, and the type of pooling (max or average). There are no parameters for gradient descent to learn.

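A minimal max-pooling sketch matching the figure (illustrative; swap max for mean to get average pooling):

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling: keeps the largest value in each window; nothing is learned."""
    H = (x.shape[0] - size) // stride + 1
    W = (x.shape[1] - size) // stride + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()   # window.mean() would give average pooling
    return out

x = np.array([[1., 3., 2., 1.],
              [4., 6., 5., 0.],
              [7., 2., 8., 3.],
              [1., 0., 4., 9.]])
print(max_pool2d(x))   # [[6. 5.] [7. 9.]] -- the 4x4 map reduces to 2x2
```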

Output Feature Map

The size of the output feature map or volume depends on:


  1. Size of the input feature map
  2. Kernel size (Kw, Kh)
  3. Zero padding
  4. Stride (Sw, Sh)

Naive Convolution

These are the building blocks of a convolutional neural network, and the size of the output feature map depends on the above parameters. The dimensions of the output feature map can be formulated as:


Wout = (W - Kw + 2P)/Sw + 1
Hout = (H - Kh + 2P)/Sh + 1

[Figure: The dimensions of the output feature map, image by author]
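In code, with width and height treated separately (a small helper mirroring the formula above):

```python
def feature_map_dims(W, H, Kw, Kh, P=0, Sw=1, Sh=1):
    """Output feature-map dimensions of a naive convolution."""
    Wout = (W - Kw + 2 * P) // Sw + 1
    Hout = (H - Kh + 2 * P) // Sh + 1
    return Wout, Hout

print(feature_map_dims(224, 224, 3, 3, P=1))  # (224, 224): dimensions preserved
```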

Dilated Convolution

This has an additional parameter known as the dilation rate. The technique is used to increase the receptive field of a convolution; such a convolution is also known as an atrous convolution. A 3x3 convolution with a dilation rate of 2 sees the same area as a naive 5x5 convolution, whilst having only 9 parameters. It can deliver a broader field of view at the same computational cost. Dilated convolutions should be used only if a wide field of view is needed and one cannot afford multiple convolutions or larger kernels. The image below depicts the receptive coverage of a dilated convolution.


[Figure: Dilated convolution, image by Paul-Louis Pröve]
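The "3x3 sees a 5x5 area" claim follows from the standard effective-kernel-size formula (stated here as an assumption; the article does not give it explicitly):

```python
def effective_kernel_size(F, dilation):
    """Receptive field of a dilated FxF kernel: F + (F - 1) * (dilation - 1)."""
    return F + (F - 1) * (dilation - 1)

print(effective_kernel_size(3, 2))  # 5 -> covers the same area as a 5x5 kernel
print(3 * 3)                        # 9 -> while still using only 9 weights
```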

Transposed Convolution

This is used with the aim of increasing the size of the output feature map. It is used in encoder-decoder networks to increase the spatial dimensions. The input image is appropriately padded before the convolution operation.


[Figure: Transposed convolution, image by Divyanshu Mishra]
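The usual output-size relation for a transposed convolution (the standard formula, supplied here as an assumption since the article does not state it):

```python
def transposed_output_size(N, F, S=1, P=0):
    """Output size of a transposed convolution: (N - 1) * S - 2P + F."""
    return (N - 1) * S - 2 * P + F

print(transposed_output_size(4, F=3, S=2, P=1))  # 7 -> the spatial size grows
```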

The End

Thank you and stay tuned for more blogs on AI.


Translated from: https://towardsdatascience.com/convolutional-neural-networks-f62dd896a856
