2017_DenseNet_Facebook:
Figure:
Supplementary diagram of the transition layer
Network description:
DenseNet makes the input of every layer the concatenation (concat) of the outputs of all preceding layers, and then passes that layer's own feature maps on to every subsequent layer. The transition layer, placed between two Dense Blocks, exists because the number of output channels at the end of each Dense Block is large, so a 1x1 conv is needed to reduce the dimensionality.
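A minimal PyTorch sketch of this connectivity pattern (the 24-channel input and k = 12 below are illustrative assumptions, not the paper's exact configuration): each new layer reads the concatenation of every earlier feature map and contributes only k new channels, so the channel count grows linearly.

import torch
from torch import nn

k = 12                                    # growth rate, illustrative value
features = [torch.randn(1, 24, 32, 32)]   # assumed stem output: 24 channels
for l in range(3):
    in_ch = sum(f.shape[1] for f in features)
    conv = nn.Conv2d(in_ch, k, kernel_size=3, padding=1)
    features.append(conv(torch.cat(features, dim=1)))  # input = all earlier outputs
    print('layer', l, 'input channels:', in_ch)        # 24, 36, 48: grows by k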
Comparison between DenseNet and other networks:
DenseNet differs from both the Inception family and ResNet: Inception improves representational power mainly by widening the network, ResNet mainly by deepening it, while DenseNet explores the network's potential through feature-map reuse. In ResNet, an earlier layer's output is added element-wise to a later layer's input, which to some extent impedes the flow of information through the network; in DenseNet the features are not summed but joined by concatenation (concatenate). At the same time, DenseNet has far fewer parameters than comparable networks and alleviates the vanishing-gradient problem: in a plain network the gradient must pass through every layer in turn and gradually vanishes, whereas in DenseNet every layer is connected directly to all later layers, which largely avoids this. ==Moreover, DenseNet is very narrow: each layer adds only 12 filters (the growth rate).== The paper points out that ResNet improves model performance through depth and GoogLeNet through width, whereas DenseNet does so by making better use of features.
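The difference can be seen in a two-line sketch (shapes made up for illustration): addition forces both branches to the same channel count and blends their information, while concatenation keeps every earlier feature map intact.

import torch

x  = torch.randn(1, 64, 8, 8)             # identity branch
fx = torch.randn(1, 64, 8, 8)             # stand-in for a conv branch output
print((x + fx).shape)                     # ResNet-style sum: still 64 channels
print(torch.cat([x, fx], dim=1).shape)    # DenseNet-style concat: 128 channels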
To handle mismatched feature-map counts or sizes, ResNet uses zero-padding or a 1x1 conv to expand the number of feature maps, whereas DenseNet inserts a BN + 1x1 Conv + 2x2 AvgPool transition layer between two Dense Blocks to match the feature-map sizes. This makes full use of the learned feature maps: no extraneous noise is introduced by zero-padding, and no learned features are discarded by a 1x1 conv with stride 2 (a stride of 2 throws away part of what has been learned).
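As a quick shape check, here is a minimal sketch of such a transition layer (the 256 input channels and 0.5 compression factor are assumed example values; full implementations follow in the code section below):

import torch
from torch import nn

trans = nn.Sequential(
    nn.BatchNorm2d(256),
    nn.ReLU(),
    nn.Conv2d(256, 128, kernel_size=1),   # 1x1 conv compresses the channels
    nn.AvgPool2d(2, 2),                   # pooling halves H and W; no strided conv discards features
)
print(trans(torch.randn(1, 256, 32, 32)).shape)   # torch.Size([1, 128, 16, 16])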
Doesn't dense connectivity introduce redundancy? No. The phrase "dense connection" gives the first impression of a huge increase in parameters and computation, but DenseNet is in fact more efficient than other networks; the key lies in the reduced per-layer computation and the reuse of features. Because every layer already receives the outputs of all preceding layers, it only needs to produce a small number of new feature maps (an important difference from other architectures: by tuning the growth rate k, DenseNet can be made very narrow). This is why DenseNet has far fewer parameters than other models and is described as a lightweight network. The dense connections also effectively wire every layer directly to both the input and the loss, which alleviates vanishing gradients and makes much deeper networks unproblematic.
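A back-of-the-envelope count (all numbers assumed for illustration) makes the argument concrete: with a small growth rate k, both the channel count and the per-layer parameter count grow only linearly.

k0, k, L = 24, 12, 12   # assumed stem channels, growth rate, layers per block
print(k0 + L * k)       # 168 channels after the block: k0 + L*k, linear in L
# each 3x3 conv has about in_channels * k * 9 weights, small because k is small
print(sum((k0 + l * k) * k * 9 for l in range(L)))   # 116640 weights for the whole block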
Features and advantages:
(1) Alleviates vanishing gradients by shortening the connections between early and late layers as much as possible (see the sketch after this list)
(2) Strengthens feature propagation
(3) Encourages feature reuse
(4) Substantially reduces the number of parameters
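A tiny autograd sketch (toy tensors, not from the paper) shows point (1) in action: because x is concatenated straight into the output, gradient reaches it through a direct path as well as through the transformed branch.

import torch

x = torch.randn(1, 4, requires_grad=True)
h = torch.tanh(x @ torch.randn(4, 4))        # some transformed branch
out = torch.cat([x, h], dim=1).sum()         # dense-style shortcut: x reaches the output directly
out.backward()
print(x.grad)                                # gradient arrives via both paths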
Code:
Keras implementation:
# model
# NOTE: assumes train_X, train_Y, test_X, test_Y are pre-loaded CIFAR-10-style
# arrays (32x32x3 images, one-hot labels with 10 classes).
from keras.layers import (Input, Conv2D, BatchNormalization, LeakyReLU, Dropout,
                          concatenate, MaxPooling2D, AveragePooling2D,
                          GlobalAveragePooling2D, Dense)
from keras.models import Model, save_model, load_model

def DenseLayer(x, nb_filter, bn_size=4, alpha=0.0, drop_rate=0.2):
    # Bottleneck layers: BN -> LeakyReLU -> 1x1 conv
    x = BatchNormalization(axis=3)(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = Conv2D(bn_size * nb_filter, (1, 1), strides=(1, 1), padding='same')(x)
    # Composite function: BN -> LeakyReLU -> 3x3 conv
    x = BatchNormalization(axis=3)(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = Conv2D(nb_filter, (3, 3), strides=(1, 1), padding='same')(x)
    if drop_rate:
        x = Dropout(drop_rate)(x)
    return x

def DenseBlock(x, nb_layers, growth_rate, drop_rate=0.2):
    # each layer's output is concatenated onto the running feature map
    for ii in range(nb_layers):
        conv = DenseLayer(x, nb_filter=growth_rate, drop_rate=drop_rate)
        x = concatenate([x, conv], axis=3)
    return x

def TransitionLayer(x, compression=0.5, alpha=0.0, is_max=0):
    # 1x1 conv compresses the channels, 2x2 pooling halves the spatial size
    nb_filter = int(x.shape.as_list()[-1] * compression)
    x = BatchNormalization(axis=3)(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = Conv2D(nb_filter, (1, 1), strides=(1, 1), padding='same')(x)
    if is_max != 0:
        x = MaxPooling2D(pool_size=(2, 2), strides=2)(x)
    else:
        x = AveragePooling2D(pool_size=(2, 2), strides=2)(x)
    return x

growth_rate = 12

inpt = Input(shape=(32, 32, 3))
x = Conv2D(growth_rate * 2, (3, 3), strides=1, padding='same')(inpt)
x = BatchNormalization(axis=3)(x)
x = LeakyReLU(alpha=0.1)(x)
x = DenseBlock(x, 12, growth_rate, drop_rate=0.2)
x = TransitionLayer(x)
x = DenseBlock(x, 12, growth_rate, drop_rate=0.2)
x = TransitionLayer(x)
x = DenseBlock(x, 12, growth_rate, drop_rate=0.2)
x = BatchNormalization(axis=3)(x)
x = GlobalAveragePooling2D()(x)
x = Dense(10, activation='softmax')(x)
model = Model(inpt, x)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

for ii in range(10):
    print("Epoch:", ii + 1)
    model.fit(train_X, train_Y, batch_size=100, epochs=1, verbose=1)
    score = model.evaluate(test_X, test_Y, verbose=1)
    print('Test loss =', score[0])
    print('Test accuracy =', score[1])

save_model(model, 'DenseNet.h5')
model = load_model('DenseNet.h5')
pred_Y = model.predict(test_X)
score = model.evaluate(test_X, test_Y, verbose=0)
print('Test loss =', score[0])
print('Test accuracy =', score[1])
PyTorch implementation:
import torch
from torch import nn

def conv_block(in_channel, out_channel):
    # pre-activation composite function: BN -> ReLU -> 3x3 conv
    return nn.Sequential(
        nn.BatchNorm2d(in_channel),
        nn.ReLU(),
        nn.Conv2d(in_channel, out_channel, kernel_size=3, padding=1, bias=False)
    )

class dense_block(nn.Module):
    def __init__(self, in_channel, growth_rate, num_layers):
        super(dense_block, self).__init__()
        block = []
        channel = in_channel
        for i in range(num_layers):
            block.append(conv_block(channel, growth_rate))
            channel += growth_rate  # each layer adds growth_rate channels
        self.net = nn.Sequential(*block)

    def forward(self, x):
        for layer in self.net:
            out = layer(x)
            x = torch.cat((out, x), dim=1)  # concatenate along the channel dim
        return x

def transition(in_channel, out_channel):
    # 1x1 conv compresses channels, 2x2 average pooling halves H and W
    return nn.Sequential(
        nn.BatchNorm2d(in_channel),
        nn.ReLU(),
        nn.Conv2d(in_channel, out_channel, 1),
        nn.AvgPool2d(2, 2)
    )

class densenet(nn.Module):
    def __init__(self, in_channel, num_classes, growth_rate=32, block_layers=(6, 12, 24, 16)):
        super(densenet, self).__init__()
        # stem: 7x7 conv (stride 2) + 3x3 max pool, as in DenseNet-121
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channel, 64, 7, 2, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2, padding=1)
        )
        # channel bookkeeping: 64 + 6*32 = 256 -> 128, 128 + 12*32 = 512 -> 256, ...
        self.DB1 = self._make_dense_block(64, growth_rate, num=block_layers[0])
        self.TL1 = self._make_transition_layer(256)
        self.DB2 = self._make_dense_block(128, growth_rate, num=block_layers[1])
        self.TL2 = self._make_transition_layer(512)
        self.DB3 = self._make_dense_block(256, growth_rate, num=block_layers[2])
        self.TL3 = self._make_transition_layer(1024)
        self.DB4 = self._make_dense_block(512, growth_rate, num=block_layers[3])
        self.global_average = nn.Sequential(
            nn.BatchNorm2d(1024),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        self.classifier = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.block1(x)
        x = self.DB1(x)
        x = self.TL1(x)
        x = self.DB2(x)
        x = self.TL2(x)
        x = self.DB3(x)
        x = self.TL3(x)
        x = self.DB4(x)
        x = self.global_average(x)
        x = x.view(x.shape[0], -1)
        x = self.classifier(x)
        return x

    def _make_dense_block(self, channels, growth_rate, num):
        return nn.Sequential(dense_block(channels, growth_rate, num))

    def _make_transition_layer(self, channels):
        # transitions halve the channel count (compression = 0.5)
        return nn.Sequential(transition(channels, channels // 2))

net = densenet(3, 10)
x = torch.rand(1, 3, 224, 224)
for name, layer in net.named_children():
    if name != "classifier":
        x = layer(x)
        print(name, 'output shape:', x.shape)
    else:
        x = x.view(x.size(0), -1)
        x = layer(x)
        print(name, 'output shape:', x.shape)
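With a 224x224 input, the trace above should report the familiar DenseNet-121 channel progression: block1 (1, 64, 56, 56), DB1 (1, 256, 56, 56), TL1 (1, 128, 28, 28), DB2 (1, 512, 28, 28), TL2 (1, 256, 14, 14), DB3 (1, 1024, 14, 14), TL3 (1, 512, 7, 7), DB4 (1, 1024, 7, 7), then global_average (1, 1024, 1, 1) and classifier (1, 10).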