DenseNet


Paper: Densely Connected Convolutional Networks

Link: https://arxiv.org/abs/1608.06993

Third-party code:


Best Paper of CVPR 2017.


Paper walkthrough:

As the CVPR 2017 Best Paper, DenseNet steps away from the usual recipes of improving performance by making the network deeper (ResNet) or wider (Inception). Instead it looks at the problem from the perspective of features: through feature reuse and bypass connections it achieves better results with fewer parameters.

First, the advantages of DenseNet. In the authors' words: "DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters."

1. Alleviates the vanishing-gradient problem.

2. Strengthens feature propagation.

3. Encourages feature reuse.

4. Reduces the number of parameters.

 

As CNNs kept getting deeper, gradient vanishing and model degradation became real problems, and many papers have proposed remedies: the widespread use of Batch Normalization alleviates gradient vanishing to some extent, and architectures such as ResNet, Highway Networks, Stochastic Depth and FractalNets tackle it as well. In the paper's words: "they create short paths from early layers to later layers."

Now for the key idea. The paper puts it this way:

to ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.

In other words, the authors connect every layer to all of its subsequent layers; the structure of a dense block is shown in the figure below. A traditional convolutional network with L layers has L connections, whereas a DenseNet block with L layers has L(L+1)/2 connections.
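As a quick sanity check on that count: in a dense block with L = 5 layers, the l-th layer receives l inputs (the block input plus the outputs of the l - 1 preceding layers), so the total number of direct connections is 1 + 2 + 3 + 4 + 5 = 5 × 6 / 2 = 15.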

 

The paper puts it this way: "A possibly counter-intuitive effect of this dense connectivity pattern is that it requires fewer parameters than traditional convolutional networks, as there is no need to relearn redundant feature-maps."

DenseNet layers are very narrow (e.g., 12 filters per layer), adding only a small set of feature-maps to the “collective knowledge” of the network and keep the remaining feature-maps unchanged—and the final classifier makes a decision based on all feature-maps in the network.

The way I understand it: each convolutional layer inside a dense block outputs only a small number of feature maps, but thanks to feature reuse the concatenated input it receives still has a fairly large number of channels.
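As a quick numeric illustration of this point, here is a minimal PyTorch snippet; the channel counts (24 for the block input, 12 per layer, 4 layers) are assumed for the example rather than taken from the paper's exact configuration:

```python
import torch

# Block input with 24 channels; each of 4 layers contributes only 12 channels,
# yet the concatenated input to the next layer is already fairly wide.
block_input = torch.randn(1, 24, 32, 32)
layer_outputs = [torch.randn(1, 12, 32, 32) for _ in range(4)]  # stand-ins for H_l outputs

concat = torch.cat([block_input] + layer_outputs, dim=1)  # concatenate along the channel dim
print(concat.shape[1])  # 24 + 4 * 12 = 72 channels
```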

Besides better parameter efficiency, one big advantage of DenseNets is their improved flow of information and gradients throughout the network, which makes them easy to train. Each layer has direct access to the gradients from the loss function and the original input signal, leading to an implicit deep supervision [20]. This helps training of deeper network architectures. Further, we also observe that dense connections have a regularizing effect, which reduces over-fitting on tasks with smaller training set sizes.

Gradient vanishing gets worse as the network gets deeper, because both the input signal and the gradients must pass through many layers. With dense connections, every layer is effectively wired directly to the input and to the loss, which alleviates gradient vanishing and makes very deep networks feasible. The authors also observe that the dense connections have a regularizing effect, so they help suppress over-fitting.

 

Now on to the details of the method. The network structure is shown in the figure below:

Dense connectivity. To further improve the information flow between layers we propose a different connectivity pattern: we introduce direct connections from any layer to all subsequent layers.

 

The l-th layer computes x_l = H_l([x_0, x_1, …, x_{l-1}]), where [x_0, x_1, …, x_{l-1}] denotes the concatenation of the feature maps produced by layers 0 through l-1, and H_l is a composite function of BN, ReLU and a 3×3 convolution (following the pre-activation design of ResNet V2).
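A minimal PyTorch sketch of this composite function H_l; the class name DenseLayer is my own, and this is the plain (non-bottleneck) BN-ReLU-3×3-conv version described above:

```python
import torch.nn as nn

class DenseLayer(nn.Module):
    """H_l: BN -> ReLU -> 3x3 conv, producing `growth_rate` new feature maps."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # x is the concatenation of all preceding feature maps in the block
        return self.conv(self.relu(self.norm(x)))
```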

Pooling layers. However, an essential part of convolutional networks is down-sampling layers that change the size of feature-maps. To facilitate down-sampling in our architecture we divide the network into multiple densely connected dense blocks; see Figure 2.

 

Because DenseNet concatenates feature maps coming from different layers, those feature maps must share the same spatial size. To still allow down-sampling, the authors split the network into several dense blocks: inside a dense block all feature maps have the same size, and transition layers placed between the dense blocks perform the down-sampling. Quoting the paper: "The transition layers used in our experiments consist of a batch normalization layer and an 1×1 convolutional layer followed by a 2×2 average pooling layer."
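A minimal PyTorch sketch of such a transition layer, following the quoted description; the class name Transition is mine, and note that common reference implementations also insert a ReLU after the BN:

```python
import torch.nn as nn

class Transition(nn.Module):
    """BN -> 1x1 conv -> 2x2 average pooling, placed between two dense blocks."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        # out_channels is where compression (DenseNet-BC, see below) is applied
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.norm(x)))
```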

Growth rate. The growth rate k is the number of feature maps each layer inside a dense block produces. The input to the i-th layer of a block therefore has k0 + (i-1)×k channels, where k0 is the number of channels output by the previous dense block. This points to one major difference between DenseNet and existing networks: DenseNet works well with very narrow layers, and the authors' experiments show that a small k already gives good results. "One explanation for this is that each layer has access to all the preceding feature-maps in its block and, therefore, to the network's 'collective knowledge'." In other words, later layers receive the outputs of all earlier layers, so the concatenated input is still fairly wide even though k is small. If we view the feature maps as the global state of a dense block, then the job of each layer is to decide, given the current global state, what update to add to it.
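To make the k0 + (i-1)×k bookkeeping concrete, here is a sketch of a dense block that stacks the DenseLayer modules from the sketch above; the concrete numbers (k0 = 24, k = 12, 6 layers) are assumptions for illustration, not the paper's configuration:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Stacks DenseLayer modules (defined in the earlier sketch) with dense connectivity."""
    def __init__(self, num_layers: int, in_channels: int, growth_rate: int):
        super().__init__()
        # layer i sees in_channels + i * growth_rate input channels
        self.layers = nn.ModuleList([
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # concat all preceding feature maps
            features.append(out)
        return torch.cat(features, dim=1)            # in_channels + num_layers * growth_rate channels

# With k0 = 24, k = 12 and 6 layers, the block output has 24 + 6 * 12 = 96 channels.
block = DenseBlock(num_layers=6, in_channels=24, growth_rate=12)
print(block(torch.randn(1, 24, 32, 32)).shape)  # torch.Size([1, 96, 32, 32])
```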

Bottleneck layers. A 1×1 convolution is inserted before each 3×3 convolution to reduce the number of input feature maps; this lowers the dimensionality and thus the computation, and it also fuses information across channels.
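A minimal PyTorch sketch of such a bottleneck layer; the paper calls this variant DenseNet-B and lets each 1×1 convolution produce 4k feature maps, while the class name here is my own:

```python
import torch.nn as nn

class BottleneckLayer(nn.Module):
    """BN -> ReLU -> 1x1 conv -> BN -> ReLU -> 3x3 conv (DenseNet-B style H_l)."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        inter_channels = 4 * growth_rate  # 1x1 conv outputs 4k feature maps, as in the paper
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        return self.body(x)
```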

Compression. To make the model even more compact, the number of feature maps can be reduced at the transition layers. If a dense block outputs m feature maps, the following transition layer generates ⌊θm⌋ output feature maps, where θ is the compression factor. When both the bottleneck layers and compression are used, the model is referred to as DenseNet-BC.
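For example, with the θ = 0.5 used for DenseNet-BC in the paper, a dense block whose concatenated output has m = 256 feature maps is followed by a transition layer whose 1×1 convolution produces ⌊0.5 × 256⌋ = 128 feature maps before the 2×2 average pooling.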


 

Experimental results

The left plot summarizes parameter counts and final accuracy for several DenseNet configurations: to reach the same test error, the original DenseNet typically needs 2-3 times as many parameters as DenseNet-BC. The middle plot compares DenseNet-BC with ResNet: at the same accuracy, DenseNet-BC needs only about a third of ResNet's parameters. The right plot compares a 1001-layer ResNet with more than 10M parameters against a 100-layer DenseNet-BC with only 0.8M parameters during training: although they converge after roughly the same number of epochs, the DenseNet-BC uses less than a tenth of the parameters.
 


Analysis of the method:

Model compactness. As a direct consequence of the input concatenation, the feature-maps learned by any of the DenseNet layers can be accessed by all subsequent layers. This encourages feature reuse throughout the network, and leads to more compact models.

The feature maps learned by any layer can be used directly by all later layers; this encourages feature reuse throughout the network and makes the model more compact.

Implicit Deep Supervision. One explanation for the improved accuracy of dense convolutional networks may be that individual layers receive additional supervision from the loss function through the shorter connections.

Feature Reuse. To examine how features are actually reused, the authors ran a dedicated experiment. I am not especially familiar with this part, so the following is roughly a direct rendering of the paper's observations:

From the figure we can draw the following conclusions:

a) Features extracted by early layers may still be used directly by much deeper layers.

b) Information flows from the first to the last layers of the DenseNet through relatively few indirections; later blocks also reuse features produced in earlier blocks.

c) Layers in the second and third dense blocks make little use of the outputs of the preceding transition layer, which means the transition layers produce many redundant features and motivates the compression used in DenseNet-BC.

d) The final classification layer uses information from many layers of the preceding dense block, but leans toward the last few feature maps, suggesting that some high-level features are only produced in the final layers of the network.
 


Conclusions:

  • Forward: every layer sees all of the preceding inputs. It has direct access to everything the network has already learned (the existing feature maps) as well as to the original input, and then contributes its own "knowledge" to this global pool. This encourages feature reuse, and feature reuse avoids unnecessary computation; it also gives the layers rich interaction, since each layer receives the outputs of all previous layers and therefore fuses features from multiple levels.
  • Backward: the skip connections put every layer close to the final loss, so the network is easy to train. Each layer receives near-direct supervision from the loss (implicit deep supervision), which counteracts gradient vanishing; the dense connections also act as a regularizer and mitigate over-fitting.

Because of the dense direct connections, the feature maps of every layer must be kept around, so it is easy to run out of GPU memory in practice. The dense connectivity also makes the backward pass for computing gradients more involved, so each training step is not necessarily faster.


 

References (Zhihu): https://zhuanlan.zhihu.com/p/32989555

https://www.zhihu.com/question/60109389

References (blogs): https://blog.csdn.net/u014380165/article/details/75142664

https://blog.csdn.net/SIGAI_CSDN/article/details/82115254
 
