Paper: Densely Connected Convolutional Networks
Paper link: https://arxiv.org/pdf/1608.06993.pdf
The authors borrow ResNet's cross-layer connection idea (if you are not familiar with it, see my earlier post "ResNet解析"), but add an innovation of their own: taking the feature map as the entry point for improving network performance, they design the brand-new DenseNet model, which won the Best Paper award at CVPR 2017.
1. How It Works
Building on ResNet, the authors design the dense block, a structure that densely connects the feature maps of multiple layers across layers and thereby pushes feature reuse to the extreme. Each dense block contains several convolutional layers (see Figure 1 of the paper); unlike a traditional convolutional network, where each layer's input comes only from the previous layer's output, in a dense block each layer's input consists of the outputs of all preceding layers.
The dense block structure has the following advantages:
- it alleviates gradient vanishing and model degradation
- it strengthens feature propagation
- it makes more effective use of features through reuse
- it reduces the number of parameters to some extent
The two equations below help clarify how ResNet and DenseNet work, and where they differ.

Equation 1 describes ResNet: the output of layer $\ell$ is the element-wise sum of layer $\ell-1$'s output and a nonlinear transformation $H_\ell$ applied to it:

$$x_\ell = H_\ell(x_{\ell-1}) + x_{\ell-1} \qquad (1)$$

Equation 2 describes DenseNet: the input to layer $\ell$ is the concatenation of the outputs of all layers $0$ to $\ell-1$:

$$x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}]) \qquad (2)$$

The point to note is that ResNet adds the input and the output together, whereas DenseNet stacks the outputs of all preceding layers along the channel dimension; it is a concatenation, not an addition.
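To make the difference concrete, here is a minimal sketch (tensor shapes are arbitrary, chosen only for illustration) contrasting ResNet's element-wise addition with DenseNet's channel-wise concatenation:

import torch

# Two feature maps of identical shape: (batch, channels, height, width)
prev_out = torch.randn(1, 64, 56, 56)  # output of layer l-1
new_feat = torch.randn(1, 64, 56, 56)  # nonlinear transformation H_l of it

# ResNet: element-wise addition, channel count stays at 64
res_out = prev_out + new_feat                       # shape (1, 64, 56, 56)

# DenseNet: concatenation along the channel dimension, channel count grows to 128
dense_out = torch.cat([prev_out, new_feat], dim=1)  # shape (1, 128, 56, 56)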
2. Network Structure
2.1 DenseNet-B
Because the dense connectivity inside a dense block reuses feature maps so thoroughly, the number k of feature maps each convolutional layer outputs (i.e., its number of kernels) can be kept small. k is called the growth rate and can be understood as the rate at which the layer width grows. Although k is small, the channels accumulate layer after layer: the input to layer L has k0 + k(L−1) channels, which still becomes large. To reduce the number of parameters, a 1×1 convolution is inserted before each 3×3 convolution to reduce the input feature maps to 4k channels. The resulting composite layer, BN + ReLU + 1×1 Conv + BN + ReLU + 3×3 Conv, is called DenseNet-B.
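As a quick sanity check of the k0 + k(L−1) channel count (the numbers assume a growth rate k = 32 and a block input of k0 = 64 channels, purely for illustration):

k0, k = 64, 32
for L in (1, 6, 12, 24):
    # input channels seen by the L-th layer inside one dense block
    print(L, k0 + k * (L - 1))  # 64, 224, 416, 800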
2.2 Transition layer
To further reduce the model size, a transition layer, consisting of a 1×1 convolution followed by 2×2 average pooling, is inserted between adjacent dense blocks to shrink both the spatial size and the channel count of the feature maps. Channels are compressed by a factor θ, with 0 < θ < 1; for example, with θ = 0.5 a transition layer fed 512 channels outputs 256. A model whose transition layers compress channels in this way is called DenseNet-C, and a model that combines the DenseNet-B bottleneck with compressed transition layers is called DenseNet-BC.
2.3 Overall Structure
The complete network is an initial 7×7 convolution (stride 2) followed by 3×3 max pooling (stride 2), then a sequence of dense blocks joined by transition layers, and finally global average pooling feeding a fully connected classifier. Section 3 below implements this structure end to end.
3. DenseNet Implementation
3.1 DenseLayer
import torch
import torch.nn as nn


class DenseLayer(nn.Module):
    """DenseNet-B composite layer: BN + ReLU + 1x1 Conv + BN + ReLU + 3x3 Conv."""

    def __init__(self, in_channels, k, n=4):
        super(DenseLayer, self).__init__()
        self.layer = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            # 1x1 bottleneck reduces the input to n*k = 4k channels
            nn.Conv2d(in_channels, k * n, 1, bias=False),
            nn.BatchNorm2d(k * n),
            nn.ReLU(inplace=True),
            # 3x3 conv produces the k new feature maps (k = growth rate)
            nn.Conv2d(k * n, k, 3, padding=1, bias=False),
        )

    def forward(self, x):
        out = self.layer(x)
        # Dense connectivity: concatenate the input with the newly produced features
        out = torch.cat([x, out], 1)
        return out
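A quick shape check (sizes chosen arbitrarily) confirms that each DenseLayer appends exactly k channels to whatever it receives:

layer = DenseLayer(in_channels=64, k=32)
x = torch.randn(1, 64, 56, 56)
print(layer(x).shape)  # torch.Size([1, 96, 56, 56]): 64 input + 32 new channels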
3.2 Transition
class Transition(nn.Module):
    """Transition layer: 1x1 conv compressing channels by theta, then 2x2 average pooling."""

    def __init__(self, in_channels, theta=0.5):
        super().__init__()
        # theta is the compression factor (0 < theta <= 1); nn.Conv2d needs an int
        out_channels = int(in_channels * theta)
        self.layer = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            # 2x2 average pooling halves the spatial resolution
            nn.AvgPool2d(kernel_size=2, stride=2)
        )

    def forward(self, x):
        return self.layer(x)
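A similar check for Transition (again with arbitrary sizes): with theta = 0.5 it halves both the channel count and the spatial resolution:

trans = Transition(in_channels=96, theta=0.5)
x = torch.randn(1, 96, 56, 56)
print(trans(x).shape)  # torch.Size([1, 48, 28, 28])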
3.3 DenseBlock
class DenseBlock(nn.Module):
    """A stack of DenseLayers followed by one Transition layer."""

    def __init__(self, in_channels, denselayer_num, k, theta):
        super().__init__()
        layers = []
        for i in range(denselayer_num):
            layers.append(DenseLayer(in_channels, k))
            in_channels += k  # each DenseLayer adds k channels to the running total
        layers.append(Transition(in_channels, theta))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)
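One design choice worth flagging: this implementation appends a Transition to every dense block, including the last one, whereas the paper places transitions only between blocks and follows the final block directly with global pooling. With the adaptive global pooling used in the DenseNet class below, both layouts produce a valid classifier; the extra transition simply compresses the final features once more.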
3.4 DenseNet
class DenseNet(nn.Module):
    # classes_num defaults to 1000 (ImageNet); the original code read it from settings.CLASSES_NUM
    def __init__(self, block_layer_num, k, theta, classes_num=1000):
        super().__init__()
        self.in_channels = k
        # Stem: 7x7 conv with stride 2, then 3x3 max pooling with stride 2
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, k, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(k),
            nn.ReLU(inplace=True)
        )
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        layers = []
        for layers_num in block_layer_num:
            layers.append(DenseBlock(self.in_channels, layers_num, k, theta))
            # Each block adds layers_num * k channels, then its Transition compresses by theta
            self.in_channels = int((self.in_channels + layers_num * k) * theta)
        self.densenet_bc = nn.Sequential(*layers)
        # Global average pooling keeps the head independent of the input resolution
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(self.in_channels, classes_num)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(x)
        x = self.densenet_bc(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # (N, C, 1, 1) -> (N, C)
        x = self.fc(x)
        return x
Instantiation functions:
def densenet121():
    # Block configuration [6, 12, 24, 16]; the paper uses growth rate k = 32
    model = DenseNet([6, 12, 24, 16], 32, 0.5)
    return model

def densenet169():
    model = DenseNet([6, 12, 32, 32], 32, 0.5)
    return model

def densenet201():
    model = DenseNet([6, 12, 48, 32], 32, 0.5)
    return model

def densenet264():
    model = DenseNet([6, 12, 64, 48], 32, 0.5)
    return model
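Finally, a minimal smoke test of the assembled model (assuming a standard 224×224 RGB input and the default 1000 classes):

model = densenet121()
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 1000])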