Copyright notice: this is an original post by the blogger; do not repost without the blogger's permission.
Paper: https://arxiv.org/abs/1611.05431
Published at CVPR 2017.
Contributions:
Introduces the notion of "cardinality"; in the paper's words, "increasing cardinality is more effective than going deeper or wider".
Moves beyond the plain stacking of VGG/ResNet by borrowing the split-transform-merge strategy of the Inception family: the single convolutional path becomes multiple parallel branches, but all branches share the same topology, which reduces the number of hyper-parameters to design and makes the block easy to transplant.
Walkthrough of the paper:
The paper first discusses VGG (whose style ResNet inherits): "The VGG-nets [36] exhibit a simple yet effective strategy of constructing very deep networks: stacking building blocks of the same shape". It then notes that this style is unlikely to over-fit its hyper-parameters to one particular dataset, i.e. it transfers well; in the paper's words, it "may reduce the risk of over-adapting the hyperparameters to a specific dataset."
The paper then turns to the Inception family and its split-transform-merge strategy, but points out a problem: these networks must be carefully hand-designed (filter counts, filter sizes, and so on), so they do not extend easily. As the paper puts it: "Despite good accuracy, the realization of Inception models has been accompanied with a series of complicating factors — the filter numbers and sizes are tailored for each individual transformation".
The authors therefore propose the ResNeXt network, which combines the VGG/ResNet idea of stacking repeated layers with Inception's split-transform-merge idea: it "adopts VGG/ResNets' strategy of repeating layers, while exploiting the split-transform-merge strategy". The paper describes it as follows: "A module in our network performs a set of transformations, each on a low-dimensional embedding, whose outputs are aggregated by summation. We pursuit a simple realization of this idea — the transformations to be aggregated are all of the same topology".
As for results, the authors report improved accuracy at roughly the same or lower model complexity, and introduce the new term "cardinality", stating their main claim: "Experiments demonstrate that increasing cardinality is a more effective way of gaining accuracy than going deeper or wider, especially when depth and width starts to give diminishing returns for existing models." This is illustrated in Fig. 1: the right side uses cardinality = 32, and every aggregated topology is identical (fewer hyper-parameters to design, less burden).
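The "same complexity" claim is easy to check by hand: the paper compares a ResNet bottleneck (256→64→256) against a ResNeXt block with cardinality C = 32 and per-path width d = 4, and both come out to roughly 70k parameters. A quick arithmetic sketch (ignoring biases and batch norm, which the paper also omits in this count):

```python
# Rough parameter counts for a ResNet bottleneck vs. a ResNeXt block,
# following the paper's formula C * (256*d + 3*3*d*d + d*256).

def resnet_bottleneck_params(in_out=256, width=64):
    # 1x1 reduce + 3x3 conv + 1x1 expand
    return in_out * width + 3 * 3 * width * width + width * in_out

def resnext_block_params(in_out=256, cardinality=32, d=4):
    # C parallel paths, each with its own 1x1 -> 3x3 -> 1x1
    return cardinality * (in_out * d + 3 * 3 * d * d + d * in_out)

print(resnet_bottleneck_params())  # 69632  (~70k)
print(resnext_block_params())      # 70144  (~70k)
```

With depth and input/output width fixed, cardinality C and bottleneck width d are the only free knobs, which is why the paper can sweep C while holding complexity essentially constant.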
With the introduction done, the authors review related work:
1.Multi-branch convolutional networks.
2.Grouped convolutions.
3.Compressing convolutional networks.
4.Ensembling (this one I don't fully understand; if anyone does, please explain in the comments). The paper says: "Averaging a set of independently trained networks is an effective solution to improving accuracy [24], widely adopted in recognition competitions [33]. Veit et al. [40] interpret a single ResNet as an ensemble of shallower networks, which results from ResNet's additive behaviors [15]. Our method harnesses additions to aggregate a set of transformations. But we argue that it is imprecise to view our method as ensembling, because the members to be aggregated are trained jointly, not independently."
Architecture:
Two design rules:
(i)if producing spatial maps of the same size, the blocks share the same hyper-parameters (width and filter sizes),
(ii)each time when the spatial map is downsampled by a factor of 2, the width of the blocks is multiplied by a factor of 2.
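The second rule keeps per-block compute roughly constant: halving the spatial map quarters the number of positions, while doubling the width quadruples the per-position cost. A minimal sketch of the resulting stage plan (the base size/width values are the usual ImageNet-style ones, used here only for illustration):

```python
# Sketch of design rule (ii): every time the spatial map is downsampled
# by 2, the block width doubles. Base values (56x56 map, width 64) are
# illustrative assumptions, not quoted from the paper's table.

def stage_plan(base_width=64, base_size=56, num_stages=4):
    plan = []
    size, width = base_size, base_width
    for _ in range(num_stages):
        plan.append((size, width))
        size //= 2   # downsample spatial map by a factor of 2
        width *= 2   # rule (ii): multiply block width by 2
    return plan

print(stage_plan())  # [(56, 64), (28, 128), (14, 256), (7, 512)]
```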
ResNeXt block:
The authors start from the fully-connected layer and its inner product, which the paper reinterprets: "Inner product can be thought of as a form of aggregating transformation": $\sum_{i=1}^{D} w_i x_i$, where $x = [x_1, x_2, \ldots, x_D]$ is a D-channel input vector and $w_i$ is the filter weight for the i-th channel.
The authors then generalize the elementary transformation: "we consider replacing the elementary transformation ($w_i x_i$) with a more generic function, which in itself can also be a network." This yields the aggregated transformation $\mathcal{F}(x) = \sum_{i=1}^{C} \mathcal{T}_i(x)$, where C is the cardinality and all the $\mathcal{T}_i$ share the same topology.
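The split-transform-merge reading of the inner product, and its generic version, can be sketched in a few lines of numpy. The tiny linear-bottleneck T_i below is a placeholder with random weights, just to show the shape of the idea:

```python
import numpy as np

# The inner product sum_i w_i*x_i is itself split-transform-merge:
# split x into D scalars, transform each (w_i * x_i), merge by summation.
# Replacing w_i*x_i with generic T_i gives F(x) = sum_{i=1}^{C} T_i(x).

rng = np.random.default_rng(0)
D, C = 8, 4
x = rng.normal(size=D)
w = rng.normal(size=D)

# split-transform-merge view of the inner product
assert np.isclose(sum(w[i] * x[i] for i in range(D)), w @ x)

# generic version: C transformations of the same topology
# (here a tiny linear bottleneck; weights are random placeholders)
def make_T():
    W1 = rng.normal(size=(4, D))   # project to a low-dimensional embedding
    W2 = rng.normal(size=(D, 4))   # project back to D dimensions
    return lambda v: W2 @ (W1 @ v)

Ts = [make_T() for _ in range(C)]
F = sum(T(x) for T in Ts)          # aggregate by summation
print(F.shape)                     # (8,)
```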
Finally, the authors show three equivalent forms of the ResNeXt block. Fig. 3(a) is the original structure. Fig. 3(b) resembles Inception-ResNet, except that every branch has the same topology. Fig. 3(c) uses grouped convolutions, which go back at least to AlexNet and reduce computation; here there are 32 groups, each with 4 input and 4 output channels.
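Why fig. 3(c) is equivalent to the multi-branch forms is easy to demonstrate for the 1×1 case: a grouped convolution is a block-diagonal weight matrix, i.e. exactly G independent per-group transforms whose outputs are concatenated. A small numpy check, using the channel counts from the text (32 groups, 4 channels each):

```python
import numpy as np

# Sketch: a grouped 1x1 convolution with G groups equals G independent
# per-group transforms concatenated (fig 3.c vs the branched forms).

rng = np.random.default_rng(1)
G, c = 32, 4                       # groups, channels per group
C = G * c                          # 128 total channels
x = rng.normal(size=C)             # one spatial position, C channels
Ws = [rng.normal(size=(c, c)) for _ in range(G)]  # per-group 1x1 weights

# grouped conv: block-diagonal weight matrix over all channels
W_block = np.zeros((C, C))
for g, Wg in enumerate(Ws):
    W_block[g*c:(g+1)*c, g*c:(g+1)*c] = Wg
y_grouped = W_block @ x

# multi-branch view: each branch sees only its channel slice,
# and the branch outputs are concatenated
y_branches = np.concatenate([Wg @ x[g*c:(g+1)*c] for g, Wg in enumerate(Ws)])

assert np.allclose(y_grouped, y_branches)
print(G * c * c, C * C)            # 512 vs 16384 weights: the saving
```

The last line shows where the computation saving comes from: the grouped form stores G·c² weights instead of the full C² of a dense 1×1 convolution.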
Results:
The table below shows the effect of increasing cardinality at equal complexity.
The next table contrasts increasing cardinality against increasing depth or width.
The last table demonstrates the effectiveness of both the residual connection and the aggregated transformations.