Paper Notes: Very Deep Convolutional Networks for Large-Scale Image Recognition

ReWz

Article link: https://blog.csdn.net/qq_43409114/article/details/106440459

Preface

Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition

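I. INTRODUCTION

In this paper the authors study how the depth of a convolutional network affects its accuracy. They fix the other architectural parameters and steadily increase the depth by adding convolutional layers, which is feasible because every layer uses very small (3x3) convolution filters.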

In this paper, we address another important aspect of ConvNet architecture design – its depth. To this end, we fix other parameters of the architecture, and steadily increase the depth of the network by adding more convolutional layers, which is feasible due to the use of very small (3 × 3) convolution filters in all layers.

The resulting models perform very well: they reach state-of-the-art accuracy on the ILSVRC classification task and also do very well on the localisation task.

As a result, we come up with significantly more accurate ConvNet architectures, which not only achieve the state-of-the-art accuracy on ILSVRC classification and localisation tasks, but are also applicable to other image recognition datasets, where they achieve excellent performance even when used as a part of a relatively simple pipelines

II. CONVNET CONFIGURATIONS

During training, the input to the model is fixed to a 224x224 three-channel (RGB) image.

During training, the input to our ConvNets is a fixed-size 224 × 224 RGB image

The authors use very small 3x3 convolution filters, which is one of the defining features of the model.

we use filters with a very small receptive field: 3 × 3

One configuration also uses 1x1 convolution filters, which can be viewed as a linear transformation of the input channels.

In one of the configurations we also utilise 1 × 1 convolution filters, which can be seen as a linear transformation of the input channels

For a convolutional layer with 3x3 filters, the padding is set to 1.

the padding is 1 pixel for 3 × 3 conv. layers

The pooling layers use 2x2 max-pooling with stride 2.

Max-pooling is performed over a 2 × 2 pixel window, with stride 2
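As a rough check of how these layer settings affect the feature-map size, the standard output-size formula can be applied. This is only a minimal sketch: the function name `out_size` is made up for illustration, and the count of five pooling stages is taken from the paper's architecture description rather than from the notes above.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


size = 224
# A 3x3 conv with stride 1 and padding 1 keeps the resolution unchanged.
same = out_size(size, kernel=3, stride=1, padding=1)
print(same)  # 224

# Each 2x2 max-pool with stride 2 halves the resolution.
# Assumption: five pooling stages, as in the paper's architecture.
sizes = [size]
for _ in range(5):
    size = out_size(size, kernel=2, stride=2)
    sizes.append(size)
print(sizes)  # [224, 112, 56, 28, 14, 7]
```

So the 3x3 convolutions never shrink the feature map; all spatial reduction comes from the pooling layers.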

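The convolutional layers are followed by three fully-connected layers: the first two have 4096 units each, and the third has 1000 units, one per ILSVRC class.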

A stack of convolutional layers (which has a different depth in different architectures) is followed by three Fully-Connected (FC) layers: the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class).
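To get a feel for how large this three-layer classifier head is, its parameters can be counted directly. The sketch below assumes that the last convolutional feature map is 7 x 7 x 512 (this comes from the paper's configuration table, not from the notes above) and that every layer has a bias term; `fc_params` is a hypothetical helper name.

```python
def fc_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return n_in * n_out + n_out


# Assumption: the flattened input to the first FC layer is 7 * 7 * 512.
flattened = 7 * 7 * 512
layers = [(flattened, 4096), (4096, 4096), (4096, 1000)]
counts = [fc_params(n_in, n_out) for n_in, n_out in layers]
print(counts)       # [102764544, 16781312, 4097000]
print(sum(counts))  # 123642856
```

Under these assumptions the fully-connected layers hold roughly 124 million parameters, most of them in the first layer.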

All hidden layers in the network use ReLU as the activation function.

All hidden layers are equipped with the rectification

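III. CONFIGURATIONS

The paper defines a family of configurations, labelled A to E. They share essentially the same design and differ only in depth, which grows from 11 weight layers in A (8 convolutional and 3 fully-connected) to 19 weight layers in E (16 convolutional and 3 fully-connected).
The distinguishing feature of the proposed models is the very small filter size, and it is precisely the small filters that make the greater depth practical. A stack of two 3x3 convolutional layers has the same effective receptive field as a single 5x5 layer, and a stack of three has the same receptive field as a single 7x7 layer.
Why use three 3x3 layers rather than one 7x7 layer? First, the stack contains three non-linearities instead of one, which makes the decision function more discriminative. Second, it has fewer parameters: with C input and output channels the stack has 27C^2 weights against 49C^2 for the 7x7 layer, a saving of about 45% (equivalently, the 7x7 layer needs 81% more).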

First, we incorporate three non-linear rectification layers instead of a single one, which makes the decision function more discriminative. Second, we decrease the number of parameters: assuming that both the input and the output of a three-layer 3 × 3 convolution stack has C channels, the stack is parametrised by 3(3^2 C^2) = 27C^2 weights; at the same time, a single 7 × 7 conv. layer would require 7^2 C^2 = 49C^2 parameters, i.e. 81% more
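The receptive-field and parameter arithmetic above can be checked in a few lines. This is a minimal sketch that assumes stride-1 convolutions, equal input and output channel counts, and no bias terms; the helper names and the value of `C` are chosen only for illustration.

```python
def stacked_receptive_field(kernel, n_layers):
    """Receptive field of n stacked stride-1 convs with the same kernel size."""
    return 1 + n_layers * (kernel - 1)


def conv_weights(kernel, channels, n_layers=1):
    """Weight count (no biases) for n layers with C input and C output channels."""
    return n_layers * kernel * kernel * channels * channels


C = 64  # arbitrary channel count, used only for illustration
stack = conv_weights(3, C, n_layers=3)  # 27 * C^2
single = conv_weights(7, C)             # 49 * C^2

print(stacked_receptive_field(3, 2), stacked_receptive_field(3, 3))  # 5 7
print(single / stack)      # about 1.81: the 7x7 layer needs ~81% more weights
print(1 - stack / single)  # about 0.45: the 3x3 stack saves ~45% of the weights
```

The two ratios describe the same gap from opposite directions, which is why "81% more" and "about 45% fewer" are both correct.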

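This can also be seen as a form of regularisation, since the 7x7 filter is forced to decompose into three 3x3 filters with non-linearities in between.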

This can be seen as imposing a regularisation on the 7 × 7 conv. filters, forcing them to have a decomposition through the 3 × 3 filters (with non-linearity injected in between).

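Small filters had been used before, but in networks that were not very deep. Goodfellow et al. (2014) used an 11-layer network for street number recognition and showed that increasing depth does improve performance.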

Goodfellow et al. (2014) applied deep ConvNets (11 weight layers) to the task of street number recognition, and showed that the increased depth led to better performance
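The depths quoted in this section for the configurations can be reproduced by writing each one as a list of layer widths. The sketch below is illustrative only: the channel widths are taken from the paper's configuration table (they are not listed in these notes), and the list encoding with "M" for a pooling layer is just one convenient convention.

```python
# Channel widths per conv layer; "M" marks a 2x2 max-pool.
# Configuration C is omitted: it has D's layout, with the third conv of each
# of the last three blocks being 1x1 instead of 3x3.
CONFIGS = {
    "A": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "B": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M",
          512, 512, "M"],
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512, "M",
          512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
NUM_FC = 3  # three fully-connected layers follow the conv stack


def weight_layers(cfg):
    """Count layers with learnable weights: conv layers plus the FC layers."""
    return sum(1 for v in cfg if v != "M") + NUM_FC


depths = {name: weight_layers(cfg) for name, cfg in CONFIGS.items()}
print(depths)  # {'A': 11, 'B': 13, 'D': 16, 'E': 19}
```

Pooling layers carry no weights, so they are not counted towards the depth.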

IV. CLASSIFICATION FRAMEWORK

During training the authors use a batch size of 256 and a momentum of 0.9.

The batch size was set to 256, momentum to 0.9.

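Training is regularised with L2 weight decay (multiplier 5·10^−4) and with dropout on the first two fully-connected layers, with a dropout ratio of 0.5.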

The training was regularised by weight decay (the L2 penalty multiplier set to 5·10^−4) and dropout regularisation for the first two fully-connected layers (dropout ratio set to 0.5)
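To make the roles of momentum and weight decay concrete, here is a minimal sketch of one update step for a single scalar weight. It uses a common textbook formulation of SGD with momentum and L2 weight decay; the paper does not spell out its exact update equation, so the form below and the function name `sgd_step` are assumptions.

```python
def sgd_step(w, grad, velocity, lr=1e-2, momentum=0.9, weight_decay=5e-4):
    """One SGD update with momentum and L2 weight decay for a scalar weight."""
    # The L2 penalty contributes weight_decay * w to the gradient.
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity


w, v = 1.0, 0.0
for _ in range(3):
    # A constant gradient of 0.5 stands in for a real mini-batch gradient.
    w, v = sgd_step(w, grad=0.5, velocity=v)
print(w, v)
```

With a zero data gradient the update only shrinks the weight slightly towards zero, which is the effect of the decay term on its own.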

The learning rate is initialised to 0.01 and divided by 10 whenever the validation accuracy stops improving.

The learning rate was initially set to 10^−2, and then decreased by a factor of 10 when the validation set accuracy stopped improving
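The schedule can be sketched as a small helper that watches the validation accuracy. The paper does not say how "stopped improving" was detected, so the `patience` parameter and the class name `StepOnPlateau` are hypothetical, as are the accuracy values in the usage lines.

```python
class StepOnPlateau:
    """Divide the learning rate by `factor` when accuracy stops improving."""

    def __init__(self, lr=1e-2, factor=10.0, patience=1):
        self.lr = lr
        self.factor = factor
        self.patience = patience  # hypothetical: epochs without improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def update(self, val_accuracy):
        """Record one epoch's validation accuracy and return the new rate."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= self.factor
                self.bad_epochs = 0
        return self.lr


sched = StepOnPlateau()
# Made-up accuracies: the rate drops after the third and fifth epochs.
history = [sched.update(acc) for acc in [0.50, 0.60, 0.60, 0.65, 0.64]]
print(history)
```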

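The authors observe that, despite being deeper and having more parameters than earlier networks, their nets need fewer epochs to converge. They attribute this to the implicit regularisation from greater depth and smaller filters, and to the pre-initialisation of certain layers.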

the nets required less epochs to converge due to (a) implicit regularisation imposed by greater depth and smaller conv. filter sizes; (b) pre-initialisation of certain layers

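Weight initialisation is very important: a bad initialisation can stall learning. To avoid this, the authors use pre-training. Configuration A is shallow enough to be trained from random initialisation, so it is trained first, and its parameters are then used to initialise layers of the deeper models such as E. The authors also note that the random initialisation procedure of Glorot & Bengio (2010), commonly called Xavier initialisation, makes this pre-training unnecessary.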
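For reference, the uniform variant of Glorot (Xavier) initialisation can be sketched in plain Python. This is an illustration of the general technique, not the authors' code: the helper name, the layer shape and the sample size are assumptions, and only a sample of weights is drawn rather than a full tensor.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, n, rng):
    """Draw n weights from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [rng.uniform(-limit, limit) for _ in range(n)], limit


# Example: a 3x3 conv layer with 64 input and 64 output channels.
k, c_in, c_out = 3, 64, 64
fan_in, fan_out = k * k * c_in, k * k * c_out
weights, limit = glorot_uniform(fan_in, fan_out, n=1000, rng=random.Random(0))
print(round(limit, 4))  # 0.0722
```

The bound shrinks as the layer gets wider, which keeps the variance of activations and gradients roughly constant from layer to layer.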

V. CLASSIFICATION EXPERIMENTS

[Image not preserved]

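The figure shows that using LRN makes little difference to performance.
Performance improves as depth increases. Comparing C with D shows that 1x1 filters do worse than 3x3 filters in the same positions, while comparing B with C shows that the additional 1x1 layers, and the non-linearity they add, still help.
Performance saturates once depth reaches 19 layers, possibly because of the dataset size. The authors suggest that even deeper models could help on larger datasets.
The authors also compare the 13-layer model B with an 8-layer variant B' in which each pair of 3x3 layers is replaced by a single 5x5 layer. The shallow variant's top-1 error is 7% higher than B's, which confirms that a deep net with small filters outperforms a shallow net with larger filters.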