Paper Reading | Densely Connected Convolutional Networks

Preface: DenseNet

Densely Connected Convolutional Networks

Introduction

The authors first summarize the key to the success of recent deep networks: they create short paths from early (shallow) layers to later (deep) layers.

Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.

Inspired by this, the authors design a simple connectivity pattern that guarantees information flow between every pair of layers: any two layers (with matching feature-map sizes) are linked by a direct, shortest possible path.

In this paper, we propose an architecture that distills this insight into a simple connectivity pattern: to ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other.

As illustrated in the figure below, an L-layer network connected this way has L(L+1)/2 direct connections (counting the connections from the input), instead of just L as in a traditional chain.
[Figure: the dense connectivity pattern — each layer takes the feature maps of all preceding layers as input]
Beyond that, DenseNet differs from ResNet in how features are combined: DenseNet concatenates feature maps along the channel dimension rather than adding them element-wise as ResNet does. The downside is that the last layer receives the feature maps of all preceding layers, so the channel count keeps growing; for this reason the network is designed to be very narrow (each layer contributes only a few feature maps).
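To make the difference concrete, a minimal PyTorch snippet of my own (not from the paper): element-wise addition keeps the channel count fixed, while channel-wise concatenation accumulates channels.

```python
import torch

# Two feature maps with identical shape (N, C, H, W).
x = torch.randn(1, 32, 8, 8)
y = torch.randn(1, 32, 8, 8)

# ResNet-style combination: element-wise addition, channel count unchanged.
print((x + y).shape)                    # torch.Size([1, 32, 8, 8])

# DenseNet-style combination: concatenation along channels, channels accumulate.
print(torch.cat([x, y], dim=1).shape)   # torch.Size([1, 64, 8, 8])
```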

Method

For ResNet, the layer-to-layer update is
x_l = H_l(x_{l-1}) + x_{l-1}
(here H_l denotes the composite non-linear transformation; in the paper it is BN → ReLU → 3×3 conv)
while for DenseNet the update becomes
x_l = H_l([x_0, x_1, ..., x_{l-1}])
Applied across the whole network, however, this runs into a problem: once pooling changes the feature-map size, channel-wise concatenation is no longer possible.
The dense connectivity is therefore confined to stacks of layers that share the same feature-map size, i.e., the dense blocks, inside which the non-linear transformations operate.
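Under these definitions, a dense block can be sketched in PyTorch roughly as below; the class names, channel sizes, and wiring are my own minimal rendering of the paper's description (with H = BN → ReLU → 3×3 conv), not the reference implementation.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """H_l: BN -> ReLU -> 3x3 conv, applied to the concatenation [x_0, ..., x_{l-1}]."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(torch.relu(self.norm(x)))

class DenseBlock(nn.Module):
    """Stack of dense layers; every layer sees the feature maps of all preceding layers."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

# Example: a 4-layer block with growth rate k = 12 on a 16-channel input.
block = DenseBlock(num_layers=4, in_channels=16, growth_rate=12)
out = block(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32]) -> 16 + 4 * 12 channels
```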
[Figure: a DenseNet with multiple dense blocks; the layers between adjacent blocks change feature-map sizes via convolution and pooling]
The convolution + pooling layers that connect adjacent dense blocks are called transition layers.

We refer to layers between blocks as transition layers, which do convolution and pooling. The transition layers used in our experiments consist of a batch normalization layer and an 1×1 convolutional layer followed by a 2×2 average pooling layer.
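Following that description, a transition layer can be sketched as below (again my own minimal version; the output channel count is passed in explicitly here, and the compression of channels is discussed further down).

```python
import torch
import torch.nn as nn

class TransitionLayer(nn.Module):
    """BN -> 1x1 conv -> 2x2 average pooling, placed between two dense blocks."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.norm(x)))

# Halves the spatial resolution so the next dense block works at a smaller scale.
trans = TransitionLayer(in_channels=64, out_channels=64)
print(trans(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 16, 16])
```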

Choosing the number of feature maps per layer: the growth rate

If each function H_l produces k feature-maps, it follows that the l-th layer has k0 + k × (l − 1) input feature-maps, where k0 is the number of channels in the input layer.

The authors keep the per-layer output (the growth rate k) quite narrow and find that this already yields good performance. Their explanation is that each layer has access to the feature maps of all preceding layers in its block, i.e., to a kind of collective knowledge.

One explanation for this is that each layer has access to all the preceding feature-maps in its block and, therefore, to the network’s “collective knowledge”. One can view the feature-maps as the global state of the network. Each layer adds k feature-maps of its own to this state.
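To make the channel arithmetic of the quoted formula concrete, a tiny example of my own (k0 = 16 and k = 12 are just illustrative values in the spirit of the paper's CIFAR settings):

```python
# Input channel count of the l-th layer inside one dense block: k0 + k * (l - 1).
k0, k = 16, 12          # channels entering the block, growth rate
for l in range(1, 7):   # first six layers
    print(f"layer {l}: {k0 + k * (l - 1)} input feature-maps")
# layer 1: 16, layer 2: 28, layer 3: 40, ... the global state grows by k per layer
```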

The authors also find that inserting a 1×1 convolution before each 3×3 convolution to reduce the number of input feature maps improves computational efficiency; these are the bottleneck layers, and the resulting model is called DenseNet-B.
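A hedged sketch of such a bottleneck layer: the paper lets the 1×1 convolution produce 4k feature maps, while the class name and the rest of the wiring below are my own.

```python
import torch
import torch.nn as nn

class BottleneckDenseLayer(nn.Module):
    """BN -> ReLU -> 1x1 conv (4k maps) -> BN -> ReLU -> 3x3 conv (k maps)."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        inter_channels = 4 * growth_rate  # the paper uses 4k for the bottleneck width
        self.norm1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False)
        self.norm2 = nn.BatchNorm2d(inter_channels)
        self.conv2 = nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.norm1(x)))
        return self.conv2(torch.relu(self.norm2(out)))

# Even with many accumulated input channels, the 3x3 conv only ever sees 4k channels.
layer = BottleneckDenseLayer(in_channels=256, growth_rate=12)
print(layer(torch.randn(1, 256, 16, 16)).shape)  # torch.Size([1, 12, 16, 16])
```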

At the same time, the transition layers can shed some channels, controlled by a compression factor θ; with both bottleneck layers and compression the model is called DenseNet-BC.

If a dense block contains m feature-maps, we let the following transition layer generate ⌊θm⌋ output feature-maps, where 0 < θ ≤ 1 is referred to as the compression factor.
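Concretely, a compressed transition layer emits ⌊θm⌋ channels; the paper sets θ = 0.5 in its experiments. A trivial helper of my own for illustration:

```python
import math

def transition_output_channels(m, theta=0.5):
    """Number of output feature-maps of a compressed transition layer."""
    return math.floor(theta * m)

print(transition_output_channels(256))       # 128
print(transition_output_channels(100, 0.5))  # 50
```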

Building the full network
[Figure: the overall DenseNet architecture]
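Rather than re-assembling the whole network here, note that torchvision ships DenseNet implementations (DenseNet-121/169/201/161) following this architecture; a quick smoke test with a randomly initialized DenseNet-121 (growth rate 32, block configuration 6/12/24/16):

```python
import torch
from torchvision.models import densenet121

model = densenet121()                 # randomly initialized DenseNet-121
x = torch.randn(1, 3, 224, 224)       # one ImageNet-sized input
print(model(x).shape)                 # torch.Size([1, 1000])
```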

Experiments

Classification results
[Figure: classification results]
There are also several other experiments.
A particularly interesting one is a visualization of how much each layer actually uses the information flowing in from earlier layers.
Each dense block here has L = 12 layers, i.e., L(L+1)/2 direct connections counting the block input, so the average weight of every connection can be visualized as a heat map. The authors draw four observations from it (a sketch of how such a heat map can be computed follows the list):
[Figure: heat maps of the average absolute weights of the connections between layers in a trained DenseNet]

  1. Looking down each target layer's column, the weight is spread over many source layers within the block. This indicates that features extracted by very early layers are indeed used directly by the current layer.

All layers spread their weights over many inputs within the same block. This indicates that features extracted by very early layers are, indeed, directly used by deep layers throughout the same dense block.

  2. The weights of the transition layers are likewise spread across all layers of the preceding dense block.

The weights of the transition layers also spread their weight across all layers within the preceding dense block, indicating information flow from the first to the last layers of the DenseNet through few indirections.

  3. Reading across the first row: within the second and third blocks, the layers consistently assign very small weights to the transition layer's output. This means the features coming out of the transition layer are barely used by the subsequent layers, i.e., the transition layer produces many redundant features.

The layers within the second and third dense block consistently assign the least weight to the outputs of the transition layer (the top row of the triangles), indicating that the transition layer outputs many redundant features (with low weight on average). This is in keeping with the strong results of DenseNet-BC where exactly these outputs are compressed.

  4. The final classification layer (the rightmost column) concentrates its weights on the last few feature maps, suggesting that some higher-level semantic features are produced late in the network.

Although the final classification layer, shown on the very right, also uses weights across the entire dense block, there seems to be a concentration towards final feature-maps, suggesting that there may be some more high-level features produced late in the network.
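Below is a rough, self-contained sketch of how such a heat map could be computed, based on my reading of the paper's description (average absolute 3×3 conv weights, grouped by the source layer they act on); the convolutions are randomly initialized stand-ins for a trained block, and none of this is the authors' code.

```python
import torch
import torch.nn as nn

# Stand-ins for the 3x3 convolutions of one (non-bottleneck) dense block; in the paper
# this analysis is done on a trained network, not on random weights.
k0, k, num_layers = 16, 12, 12
convs = [nn.Conv2d(k0 + i * k, k, kernel_size=3, padding=1, bias=False)
         for i in range(num_layers)]

# heat[s, l]: average absolute weight that target layer l assigns to source s
# (s = 0 is the block input, s = i is the output of layer i).
heat = torch.zeros(num_layers + 1, num_layers)
for l, conv in enumerate(convs):
    w = conv.weight.detach().abs()          # shape (k, k0 + l*k, 3, 3)
    splits = [k0] + [k] * l                 # input channels grouped by source layer
    for s, chunk in enumerate(w.split(splits, dim=1)):
        heat[s, l] = chunk.mean()

print(heat.shape)  # torch.Size([13, 12]); entries with s > l stay zero (no such connection)
```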

Summary

DenseNet counts as one of the classic neural-network designs. These are my reading notes on the paper; I found several of its observations and experiments quite novel.
