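- 🍨 This post is a study log from the 🔗365-Day Deep Learning Training Camp
- 🍖 Original author: K同学啊 | tutoring and custom projects available
I. Project Background and Development Environment
📌Week J5: DenseNet + SE-Net in Practice📌
- Language: Python 3, PyTorch
- 📌This week's tasks:📌
– 1. Insert the SE-Net channel attention mechanism into the DenseNet family of models and use it for monkeypox recognition
– 2. Consider whether this improvement can be transferred to other architectures
– 3. Reach 89% accuracy on the test set (stretch goal, optional)
🔊Note: Over the past few weeks the training camp has become progressively harder; in particular, source code is no longer provided directly. The tasks instead offer ideas and directions for improving the algorithms, and you are encouraged to explore them actively. (This exploration matters, and you will learn more from it.)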
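II. Introduction
SE-Net is the winning model of the ImageNet 2017 classification challenge (the final ImageNet competition) and was released by the WMW team. It has low complexity, few additional parameters, and a small computational cost. The idea is simple and extends easily to existing architectures such as Inception and ResNet. Much prior work, Inception included, improves network performance along the spatial dimension, whereas SE-Net focuses on the relationships between feature channels. Its strategy is to learn the importance of each feature channel automatically, then use that importance to amplify useful features and suppress features that contribute little to the current task. This is known as "feature recalibration". The SE block is shown in the figure below:
Given an input $x$ with $c_1$ feature channels, a series of general transformations such as convolutions, $F_{tr}$, produces a feature map with $c_2$ channels. Unlike a conventional convolutional neural network, we then recalibrate these features with the following three operations.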
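- First comes the Squeeze operation. We compress the features along the spatial dimensions so that each two-dimensional channel becomes a single value, and the output has as many channels as the input feature map. For example, a feature map of shape (1, 32, 32, 10) is compressed to (1, 1, 1, 10). This operation is usually implemented with global average pooling.
- With this global descriptor in hand, the Excitation operation captures the relationships between feature channels. It is a gating mechanism similar to the gates in recurrent neural networks:
$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\mathrm{ReLU}(W_1 z))$$
where $z$ is the output of Squeeze and $\sigma$ is the sigmoid function. It uses a bottleneck of two fully connected layers, narrow in the middle and wide at both ends: the first fully connected layer reduces the dimension and is followed by a ReLU activation, and the second restores the original dimension. The goal of Excitation is to generate a weight for each feature channel, that is, a learned per-channel activation (after the sigmoid, each value lies between 0 and 1).
- Last comes the Scale operation. We treat the weights output by Excitation as the importance of each feature channel after feature selection, and multiply them channel by channel onto the earlier features. This recalibrates the original features along the channel dimension and makes the model more discriminative across channels, much like an attention mechanism.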
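The three operations can be traced on the (1, 32, 32, 10) example in a few lines of NumPy. This is a minimal sketch rather than the SE-Net authors' code: the function name `se_recalibrate`, the reduction ratio `r = 2`, and the random weights `w1` and `w2` are assumptions made for illustration, and biases are omitted.

```python
import numpy as np

def se_recalibrate(u, w1, w2):
    """Apply Squeeze, Excitation and Scale to a channels-last feature map.

    u  : (N, H, W, C) feature map
    w1 : (C // r, C) weights of the dimension-reducing FC layer (hypothetical values)
    w2 : (C, C // r) weights of the dimension-restoring FC layer (hypothetical values)
    """
    # Squeeze: global average pooling over the spatial axes, (N, H, W, C) -> (N, C)
    z = u.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving one weight per channel
    hidden = np.maximum(z @ w1.T, 0.0)
    s = 1.0 / (1.0 + np.exp(-(hidden @ w2.T)))
    # Scale: broadcast the (N, C) weights over H and W and multiply channel by channel
    return u * s[:, None, None, :], s

rng = np.random.default_rng(0)
u = rng.normal(size=(1, 32, 32, 10))   # the shape used in the text
r = 2                                  # illustrative reduction ratio
w1 = rng.normal(size=(10 // r, 10))
w2 = rng.normal(size=(10, 10 // r))

out, s = se_recalibrate(u, w1, w2)
print(out.shape, s.shape)
```

The output keeps the input shape, while `s` holds one weight per channel, which is the (1, 1, 1, 10) descriptor from the text with its singleton spatial axes dropped.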
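III. Applying the SE Block
The SE block is flexible because it can be dropped directly into existing architectures. Taking Inception and ResNet as examples, we only need to add an SE block after the Inception module or the Residual module (for ResNet, on the residual branch before it is added to the identity path), as shown in the figure below:
The figure shows the SE block embedded in the Inception structure and in ResNet. The dimensions beside each box are that layer's output shape, and $r$ is the reduction ratio used in the Excitation operation.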
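To get a feel for what the reduction ratio $r$ costs, the number of weights added by one SE block can be counted directly. The sketch below assumes the two-fully-connected-layer form of Excitation; the helper name `se_extra_params` and the width of 1024 channels are illustrative choices, not values taken from the figure.

```python
def se_extra_params(channels, r=16, bias=True):
    """Count the weights added by one SE block built from two fully connected layers."""
    hidden = channels // r
    # First FC layer: channels -> channels // r
    reduce_fc = channels * hidden + (hidden if bias else 0)
    # Second FC layer: channels // r -> channels
    restore_fc = hidden * channels + (channels if bias else 0)
    return reduce_fc + restore_fc

# Compare several reduction ratios for a 1024-channel feature map (illustrative width)
for r in (4, 8, 16, 32):
    print(r, se_extra_params(1024, r))
```

Ignoring biases, the count is roughly $2C^2/r$, so doubling $r$ halves the added parameters at the price of a narrower bottleneck.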
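IV. Effect of the SE Block
The SE block is easy to embed in other networks. To verify its effect, it was added to other popular networks such as ResNet and Inception and evaluated on ImageNet, as shown in the table below:
First consider how network depth interacts with SE. The table reports results for ResNet-50, ResNet-101, ResNet-152, ResNeXt-50, ResNeXt-101, VGG-16, BN-Inception, and Inception-ResNet-v2 with the SE block embedded. The first column, Original, gives the results reported by the original authors of each network. For a fair comparison, the networks were re-implemented and retrained, giving the re-implementation column. The last column, SE-module, gives the results with the SE block embedded, trained with the same settings as the re-implementation column. The red numbers in parentheses are the accuracy gains over the re-implementation.
The table shows that SE-ResNets outperform their counterparts without SE at every depth, which indicates that the SE block brings a performance gain regardless of network depth. Notably, SE-ResNet-50 reaches accuracy close to that of ResNet-101, and SE-ResNet-101 surpasses the deeper ResNet-152.
V. SE Block Code Implementation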
tensorflow
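from tensorflow.keras import Input, Model, layers
from tensorflow.keras.layers import (Activation, AveragePooling2D, BatchNormalization,
                                     Conv2D, Dense, Dropout, GlobalAveragePooling2D,
                                     MaxPooling2D, Reshape, ZeroPadding2D)

class SqueezeExcitationLayer(layers.Layer):
    def __init__(self, filter_sq, **kwargs):
        # filter_sq: number of units in the first (dimension-reducing) fully connected layer of Excitation
        super().__init__(**kwargs)
        self.avgpool = GlobalAveragePooling2D()
        self.dense = Dense(filter_sq)
        self.relu = Activation('relu')
        self.sigmoid = Activation('sigmoid')

    def build(self, input_shape):
        # Create the dimension-restoring layer once, so its weights are reused on every call
        channels = input_shape[-1]
        self.dense_out = Dense(channels)
        self.reshape = Reshape((1, 1, channels))

    def call(self, inputs):
        x = self.avgpool(inputs)   # Squeeze: (N, H, W, C) -> (N, C)
        x = self.dense(x)
        x = self.relu(x)
        x = self.dense_out(x)
        x = self.sigmoid(x)        # Excitation: per-channel weights in (0, 1)
        x = self.reshape(x)        # (N, C) -> (N, 1, 1, C)
        scale = inputs * x         # Scale: reweight each channel
        return scale

SE = SqueezeExcitationLayer(16)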
pytorch
import torch
import torch.nn as nn

''' Squeeze Excitation Module '''
class SEModule(nn.Module):
    def __init__(self, in_channel, filter_sq=16):
        # filter_sq: reduction ratio r; the hidden layer has in_channel // filter_sq units
        super(SEModule, self).__init__()
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(in_channel, in_channel//filter_sq),
            nn.ReLU(True),
            nn.Linear(in_channel//filter_sq, in_channel),
            nn.Sigmoid()
        )
        # Alternative using 1x1 convolutions instead of fully connected layers:
        #self.se = nn.Sequential(
        #    nn.AdaptiveAvgPool2d((1,1)),
        #    nn.Conv2d(in_channel, in_channel//filter_sq, kernel_size=1),
        #    nn.ReLU(),
        #    nn.Conv2d(in_channel//filter_sq, in_channel, kernel_size=1),
        #    nn.Sigmoid()
        #)

    def forward(self, inputs):
        x = self.se(inputs)                    # per-channel weights, shape (N, C)
        s1, s2 = x.size(0), x.size(1)
        x = torch.reshape(x, (s1, s2, 1, 1))   # reshape so the weights broadcast over H and W
        x = inputs * x
        return x
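The commented-out variant in the PyTorch module swaps the two fully connected layers for 1x1 convolutions. On a tensor that has already been pooled to one spatial position the two forms compute the same thing, which the NumPy sketch below checks for a single layer. The sizes `n`, `c`, `hidden` and the random weights are assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, hidden = 2, 8, 4                    # illustrative batch size, channels, bottleneck width
pooled = rng.normal(size=(n, c, 1, 1))    # result of global average pooling, (N, C, 1, 1)
weight = rng.normal(size=(hidden, c))     # one set of weights shared by both forms
bias = rng.normal(size=hidden)

# Fully connected layer applied to the flattened (N, C) tensor
fc_out = pooled.reshape(n, c) @ weight.T + bias

# 1x1 convolution: the same weights applied at the single spatial position
conv_out = np.einsum('nchw,oc->nohw', pooled, weight) + bias[None, :, None, None]

print(fc_out.shape, conv_out.shape)
print(np.allclose(fc_out, conv_out[:, :, 0, 0]))
```

The only practical difference is the output layout: the convolutional form already yields (N, C, 1, 1), so the reshape in `forward` becomes a no-op for that variant.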
VI. Inserting the SE Block into DenseNet: Code Implementation
tensorflow
''' Basic unit of DenseBlock (using bottleneck layer) '''
def DenseLayer(x, bn_size, growth_rate, drop_rate, name=None):
    f = BatchNormalization(name=name+'_1_bn')(x)
    f = Activation('relu', name=name+'_1_relu')(f)
    f = Conv2D(bn_size*growth_rate, 1, strides=1, use_bias=False, name=name+'_1_conv')(f)
    f = BatchNormalization(name=name+'_2_bn')(f)
    f = Activation('relu', name=name+'_2_relu')(f)
    # Keras takes padding='same' (not an integer) to keep the spatial size
    f = Conv2D(growth_rate, 3, strides=1, padding='same', use_bias=False, name=name+'_2_conv')(f)
    if drop_rate>0:
        f = Dropout(drop_rate)(f)
    # concatenate the new features onto the input along the channel axis
    x = layers.Concatenate(axis=-1)([x, f])
    return x

''' DenseBlock '''
def DenseBlock(x, num_layers, bn_size, growth_rate, drop_rate, name=None):
    for i in range(num_layers):
        x = DenseLayer(x, bn_size, growth_rate, drop_rate, name=name+'_denselayer'+str(i+1))
    return x
''' Transition layer between two adjacent DenseBlock '''
def Transition(x, out_channel, name=None):
    x = BatchNormalization(name=name+'_bn')(x)
    x = Activation('relu', name=name+'_relu')(x)
    x = Conv2D(out_channel, 1, strides=1, use_bias=False, name=name+'_conv')(x)
    # each Transition needs its own pooling-layer name, since Keras layer names must be unique
    x = AveragePooling2D(2, 2, name=name+'_pool')(x)
    return x

''' DenseNet-BC model '''
def DenseNet(input_tensor=None,  # optional Keras tensor to use as the image input (unused in this implementation)
             input_shape=None,
             init_channel=64,
             growth_rate=32,
             block_config=(6,12,24,16),
             bn_size=4,
             compression_rate=0.5,
             drop_rate=0,
             classes=1000):  # number of classes to classify images into
    img_input = Input(shape=input_shape)
    # first Conv2d
    x = ZeroPadding2D(padding=((3, 3), (3, 3)), name='conv1_pad')(img_input)
    x = Conv2D(init_channel, 7, strides=2, use_bias=False, name='conv1_conv')(x)
    x = BatchNormalization(name='conv1_bn')(x)
    x = Activation('relu', name='conv1_relu')(x)
    x = MaxPooling2D(3, strides=2, padding='same', name='conv1_pool')(x)
    # DenseBlock
    num_features = init_channel
    for i, num_layers in enumerate(block_config):
        x = DenseBlock(x, num_layers, bn_size, growth_rate, drop_rate, name='denseblock'+str(i+1))
        num_features += num_layers*growth_rate
        if i!=len(block_config)-1:
            x = Transition(x, int(num_features*compression_rate), name='transition'+str(i+1))
            num_features = int(num_features*compression_rate)
    # add the SE attention module
    x = SqueezeExcitationLayer(16)(x)
    # final bn+ReLU
    x = BatchNormalization(name='final_bn')(x)
    x = Activation('relu', name='final_relu')(x)
    x = GlobalAveragePooling2D(name='final_pool')(x)
    x = Dense(classes, activation='softmax', name='predictions')(x)
    model = Model(img_input, x, name='DenseNet')
    return model
''' DenseNet121 '''
def DenseNet121(n_classes=1000, **kwargs):
    model = DenseNet(init_channel=64, growth_rate=32, block_config=(6,12,24,16),
                     classes=n_classes, **kwargs)
    return model

''' DenseNet169 '''
def DenseNet169(n_classes=1000, **kwargs):
    model = DenseNet(init_channel=64, growth_rate=32, block_config=(6,12,32,32),
                     classes=n_classes, **kwargs)
    return model

''' DenseNet201 '''
def DenseNet201(n_classes=1000, **kwargs):
    model = DenseNet(init_channel=64, growth_rate=32, block_config=(6,12,48,32),
                     classes=n_classes, **kwargs)
    return model
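The `num_features` bookkeeping in the DenseNet function above determines how many channels the SE block receives after the last DenseBlock. The sketch below replays that arithmetic on its own for the three block configurations. The helper name `densenet_channels` is hypothetical, and the defaults mirror the arguments used above.

```python
def densenet_channels(block_config, init_channel=64, growth_rate=32, compression_rate=0.5):
    """Return (channels after DenseBlock, channels after Transition) for each stage."""
    num_features = init_channel
    trace = []
    for i, num_layers in enumerate(block_config):
        # every DenseLayer appends growth_rate new channels
        num_features += num_layers * growth_rate
        after_block = num_features
        # every block except the last is followed by a compressing Transition
        if i != len(block_config) - 1:
            num_features = int(num_features * compression_rate)
        trace.append((after_block, num_features))
    return trace

configs = {
    'DenseNet121': (6, 12, 24, 16),
    'DenseNet169': (6, 12, 32, 32),
    'DenseNet201': (6, 12, 48, 32),
}
for name, cfg in configs.items():
    print(name, densenet_channels(cfg))
```

The last entry of each trace is the width seen by the SE block: 1024 channels for DenseNet121, 1664 for DenseNet169, and 1920 for DenseNet201.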
pytorch
import torch.nn.functional as F
from collections import OrderedDict

''' Basic unit of DenseBlock (using bottleneck layer) '''
class DenseLayer(nn.Sequential):
    def __init__(self, in_channel, growth_rate, bn_size, drop_rate):
        super(DenseLayer, self).__init__()
        self.add_module('norm1', nn.BatchNorm2d(in_channel))
        self.add_module('relu1', nn.ReLU(inplace=True))
        self.add_module('conv1', nn.Conv2d(in_channel, bn_size*growth_rate,
                                           kernel_size=1, stride=1, bias=False))
        self.add_module('norm2', nn.BatchNorm2d(bn_size*growth_rate))
        self.add_module('relu2', nn.ReLU(inplace=True))
        self.add_module('conv2', nn.Conv2d(bn_size*growth_rate, growth_rate,
                                           kernel_size=3, stride=1, padding=1, bias=False))
        self.drop_rate = drop_rate

    def forward(self, x):
        new_feature = super(DenseLayer, self).forward(x)
        if self.drop_rate>0:
            new_feature = F.dropout(new_feature, p=self.drop_rate, training=self.training)
        # concatenate the new features onto the input along the channel axis
        return torch.cat([x, new_feature], 1)

''' DenseBlock '''
class DenseBlock(nn.Sequential):
    def __init__(self, num_layers, in_channel, bn_size, growth_rate, drop_rate):
        super(DenseBlock, self).__init__()
        for i in range(num_layers):
            # layer i sees the block input plus i*growth_rate channels from earlier layers
            layer = DenseLayer(in_channel+i*growth_rate, growth_rate, bn_size, drop_rate)
            self.add_module('denselayer%d'%(i+1,), layer)

''' Transition layer between two adjacent DenseBlock '''
class Transition(nn.Sequential):
    def __init__(self, in_channel, out_channel):
        super(Transition, self).__init__()
        self.add_module('norm', nn.BatchNorm2d(in_channel))
        self.add_module('relu', nn.ReLU(inplace=True))
        self.add_module('conv', nn.Conv2d(in_channel, out_channel,
                                          kernel_size=1, stride=1, bias=False))
        self.add_module('pool', nn.AvgPool2d(2, stride=2))
''' DenseNet-BC model '''
class DenseNet(nn.Module):
    def __init__(self, growth_rate=32, block_config=(6,12,24,16), init_channel=64,
                 bn_size=4, compression_rate=0.5, drop_rate=0, num_classes=1000):
        '''
        :param growth_rate: (int) number of filters used in DenseLayer, `k` in the paper
        :param block_config: (list of 4 ints) number of layers in each DenseBlock
        :param init_channel: (int) number of filters in the first Conv2d
        :param bn_size: (int) the factor used in the bottleneck layer
        :param compression_rate: (float) the compression rate used in Transition Layer
        :param drop_rate: (float) the drop rate after each DenseLayer
        :param num_classes: (int) number of classes for classification
        '''
        super(DenseNet, self).__init__()
        # first Conv2d
        self.features = nn.Sequential(OrderedDict([
            ('conv0', nn.Conv2d(3, init_channel, kernel_size=7, stride=2, padding=3, bias=False)),
            ('norm0', nn.BatchNorm2d(init_channel)),
            ('relu0', nn.ReLU(inplace=True)),
            ('pool0', nn.MaxPool2d(3, stride=2, padding=1))
        ]))
        # DenseBlock
        num_features = init_channel
        for i, num_layers in enumerate(block_config):
            block = DenseBlock(num_layers, num_features, bn_size, growth_rate, drop_rate)
            self.features.add_module('denseblock%d'%(i+1), block)
            num_features += num_layers*growth_rate
            if i!=len(block_config)-1:
                transition = Transition(num_features, int(num_features*compression_rate))
                self.features.add_module('transition%d'%(i+1), transition)
                num_features = int(num_features*compression_rate)
        # SE Module, inserted once after the last DenseBlock
        self.features.add_module('se_module', SEModule(num_features))
        # final BN+ReLU
        self.features.add_module('norm5', nn.BatchNorm2d(num_features))
        self.features.add_module('relu5', nn.ReLU(inplace=True))
        # classification layer
        self.classifier = nn.Linear(num_features, num_classes)
        # params initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.bias, 0)
                nn.init.constant_(m.weight, 1)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.features(x)
        # the 7x7 pooling window assumes a 224x224 input, which gives a 7x7 final feature map
        x = F.avg_pool2d(x, 7, stride=1).view(x.size(0), -1)
        x = self.classifier(x)
        return x
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
MaxPool2d-4 [