CNN (Convolutional Neural Network)



[Figure: left, network topology diagram; right, ConvNet model diagram]

Left: A regular 3-layer Neural Network. Right: A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

A ConvNet is made up of Layers. Every Layer has a simple API: it transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters.

A simple convolutional neural network (ConvNet) is a sequence of layers. Three main types of layers are typically used to build one: the Convolutional Layer, the Pooling Layer, and the Fully-Connected Layer.

  • Example Architecture: Overview (a CIFAR-10 classification network). We will go into more detail below, but a simple ConvNet for CIFAR-10 classification could have the architecture [INPUT - CONV - RELU - POOL - FC]. In more detail:
    • INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.

    • CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between its weights and the small region it is connected to in the input volume. This may result in a volume such as [32x32x12] if we decided to use 12 filters.

    • RELU layer will apply an elementwise activation function, such as the $\max(0, x)$ thresholding at zero. This leaves the size of the volume unchanged ([32x32x12]).

    • POOL layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as [16x16x12].

    • FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score, one for each of the 10 categories of CIFAR-10. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.
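The shapes listed above can be traced with a small NumPy forward pass. This is only a sketch under stated assumptions: the text does not give the CONV filter size, padding, or pooling window, so a 5x5 filter with zero-padding 2 and stride 1, and a 2x2 max pool with stride 2, are assumed here because they reproduce the [32x32x12] and [16x16x12] volumes. The function names (`conv_forward`, `max_pool_2x2`) and the random weights are illustrative only.

```python
import numpy as np

def conv_forward(x, filters, biases, pad=2, stride=1):
    """Naive convolution. x: (H, W, C); filters: (K, F, F, C); biases: (K,)."""
    K, F, _, _ = filters.shape
    # Zero-pad the two spatial dimensions only, not the depth.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h_out = (x.shape[0] + 2 * pad - F) // stride + 1
    w_out = (x.shape[1] + 2 * pad - F) // stride + 1
    out = np.zeros((h_out, w_out, K))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[i * stride:i * stride + F, j * stride:j * stride + F, :]
            # Dot product of every filter with the local input region.
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2])) + biases
    return out

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                        # INPUT [32x32x3]
filters = rng.standard_normal((12, 5, 5, 3)) * 0.01    # 12 filters (assumed 5x5)

conv_out = conv_forward(image, filters, np.zeros(12))  # shape (32, 32, 12)
relu_out = np.maximum(0, conv_out)                     # shape (32, 32, 12)
pool_out = max_pool_2x2(relu_out)                      # shape (16, 16, 12)

# FC: every output neuron sees all 16*16*12 numbers of the previous volume.
fc_w = rng.standard_normal((10, pool_out.size)) * 0.01
scores = (fc_w @ pool_out.reshape(-1) + np.zeros(10)).reshape(1, 1, 10)
```

The explicit loops keep the "dot product with a local region" idea visible; a real implementation would vectorize this or use a deep learning library.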

In summary

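  • A ConvNet is a sequence of layers, each of which transforms one volume into another.
  • The layer types in common use are CONV, POOL, RELU, and FC.
  • Each layer uses a different function to perform its volume-to-volume transformation.
  • Some layers have parameters (e.g. CONV, FC); others do not (e.g. RELU, POOL).
  • Some layers have hyperparameters (e.g. CONV, FC, POOL); others do not (e.g. RELU).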
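To make the parameter distinction concrete, the sketch below counts learnable parameters for the CIFAR-10 example architecture. It assumes a 5x5 CONV filter (the text does not fix the filter size for that example) and assumes each filter's weights are shared across all spatial positions, as is standard for convolutional layers. The helper names `conv_params` and `fc_params` are hypothetical.

```python
def conv_params(filter_size, in_depth, num_filters):
    """Weights of one filter plus one bias, times the number of filters."""
    return (filter_size * filter_size * in_depth + 1) * num_filters

def fc_params(num_inputs, num_outputs):
    """One weight per input plus one bias, for every output neuron."""
    return (num_inputs + 1) * num_outputs

params = {
    "CONV": conv_params(5, 3, 12),      # (5*5*3 + 1) * 12 = 912
    "RELU": 0,                          # fixed function, nothing to learn
    "POOL": 0,                          # fixed function, nothing to learn
    "FC": fc_params(16 * 16 * 12, 10),  # (3072 + 1) * 10 = 30730
}
total = sum(params.values())            # 31642
```

Under these assumptions almost all of the parameters sit in the FC layer, even though the CONV layer produces the larger volume.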


Convolutional Layer


filter on a layer

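  • filter: the convolution kernel, typically of a size such as $[5 \times 5 \times 3]$.

  • receptive field: the spatial extent of the filter (its width by its height), e.g. $[5 \times 5]$ in the example above.

  • Local Connectivity: each neuron is connected only to a local region of the input volume.

  • Two examples of counting the number of connections (weights):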

    Example 1. Suppose that the input volume has size [32x32x3] (e.g. an RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights (and +1 bias parameter). Notice that the extent of the connectivity along the depth axis must be 3, since this is the depth of the input volume.

    Example 2. Suppose an input volume had size [16x16x20]. Then using an example receptive field size of 3x3, every neuron in the Conv Layer would now have a total of 3*3*20 = 180 connections to the input volume. Notice that, again, the connectivity is local in space (e.g. 3x3), but full along the input depth (20).
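The arithmetic in the two examples can be checked with a one-line helper. The function name `weights_per_neuron` is hypothetical; it simply multiplies the receptive field area by the input depth, since connectivity is local in space but full along depth.

```python
def weights_per_neuron(receptive_field, input_depth):
    """Weights from one neuron to its local input region (bias not included)."""
    return receptive_field * receptive_field * input_depth

example_1 = weights_per_neuron(5, 3)   # 5*5*3  = 75  (input [32x32x3])
example_2 = weights_per_neuron(3, 20)  # 3*3*20 = 180 (input [16x16x20])
```

Note that the input width and height (32 or 16) do not appear in the calculation: only the filter size and the input depth matter.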

Spatial arrangement

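Three hyperparameters control the size of the output volume: the depth, the stride, and the zero-padding.

  • depth: the depth of the output volume
  • stride: the step with which the filter slides, usually 1 or 2
  • zero-padding: padding the border of the input with zeros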

Computing the size of the output volume

  • $W$: size of the input

  • $F$: size of the filter

  • $P$: size of the padding

  • $S$: stride

The size of the output volume is $(W + 2P - F)/S + 1$.
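The output-size formula translates directly into code. The sketch below is a minimal helper, with the hypothetical name `conv_output_size`; the check that the stride divides evenly is an addition made here for safety and is not something the text states.

```python
def conv_output_size(W, F, P, S):
    """Spatial output size (W + 2P - F) / S + 1 along one dimension."""
    span = W + 2 * P - F
    if span < 0 or span % S != 0:
        # Added guard: the filter does not tile the padded input evenly.
        raise ValueError(f"W={W}, F={F}, P={P}, S={S} do not fit")
    return span // S + 1

same_size = conv_output_size(32, 5, 2, 1)  # (32 + 4 - 5) / 1 + 1 = 32
no_padding = conv_output_size(32, 5, 0, 1) # (32 + 0 - 5) / 1 + 1 = 28
strided = conv_output_size(7, 3, 0, 2)     # (7 + 0 - 3) / 2 + 1 = 3
```

With W=7, F=3, P=0, a stride of 3 would give (7 - 3)/3 + 1, which is not an integer, so the helper raises `ValueError` for that combination.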

In general, setting the zero-padding to $P = (F - 1)/2$ when the stride is $S = 1$ ensures that the input volume and the output volume have the same spatial size.
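As a short worked check, substituting $P = (F - 1)/2$ and $S = 1$ into the output-size formula gives back the input size (assuming $F$ is odd, so that $P$ is an integer):

$$\frac{W + 2P - F}{S} + 1 = W + (F - 1) - F + 1 = W$$

For instance, with $W = 32$ and $F = 5$, the padding is $P = 2$ and the output size is $(32 + 4 - 5)/1 + 1 = 32$.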



