Classic CNNs
LeNet
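"Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, 1998
LeNet-5 is one of the simplest architectures and one of the first CNNs trained with backpropagation to be used in a practical application. It consists of two 5×5 convolutional layers, two 2×2 pooling layers, and three fully connected layers. It was originally used for handwritten digit recognition.
- Innovations: stacks convolutional and pooling layers, and ends the network with one or more fully connected layers.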
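import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        # Two 5x5 convolutions, each followed by 2x2 pooling. ReLU and max pooling
        # are a modern substitution; the original LeNet-5 took 1-channel input.
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 16, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # For a 32x32 input, the feature map here is 16 x 5 x 5
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(inplace=True),
            nn.Linear(120, 84),
            nn.ReLU(inplace=True),
            nn.Linear(84, 10),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # (N, 16, 5, 5) -> (N, 400)
        x = self.classifier(x)
        return x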
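To see where the `16 * 5 * 5` input size of the first fully connected layer comes from, the sketch below traces the spatial size through two 5×5 convolutions and two 2×2 pooling layers. It assumes a 32×32 input and no padding; the helper names are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_feature_size(input_size=32):
    """Spatial size after conv5 -> pool2 -> conv5 -> pool2 (assumed layout)."""
    size = conv_out(input_size, kernel=5)       # first 5x5 convolution
    size = conv_out(size, kernel=2, stride=2)   # first 2x2 pooling
    size = conv_out(size, kernel=5)             # second 5x5 convolution
    size = conv_out(size, kernel=2, stride=2)   # second 2x2 pooling
    return size


side = lenet_feature_size(32)
flat_features = 16 * side * side  # 16 output channels of the second convolution
print(side, flat_features)
```

With a different input resolution the flattened size changes, so the first `nn.Linear` would have to be adjusted accordingly.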
AlexNet
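"ImageNet Classification with Deep Convolutional Neural Networks", NeurIPS 2012
AlexNet was one of the first CNN models implemented on GPUs. It is a deeper and more complex CNN, made up of 5 convolutional layers, 3 pooling layers, and 3 fully connected layers. It uses convolution kernels of several sizes and far more channels than LeNet. It also replaced Sigmoid and Tanh activations with ReLU, which makes the model easier to train.
- Innovations: ReLU as the activation function; overlapping pooling.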
class AlexNet(nn.Module):
    def __init__(self, num_classes):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            # Overlapping pooling: the 3x3 window is larger than the stride of 2
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        if self.training:
            return self.classifier(x)
        # In eval mode, return the 4096-d output of the first FC layer as the feature
        x = self.classifier[:2](x)
        return x
VGG
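"Very Deep Convolutional Networks for Large-Scale Image Recognition", ICLR 2015
VGG-16 has 13 convolutional layers, 5 pooling layers, and 3 fully connected layers. Structurally it stacks more layers on top of the AlexNet design, but it introduced the idea of replacing a large convolution kernel (such as 11×11) with several stacked 3×3 kernels.
- Innovations: a deeper network (about twice the depth of AlexNet); several stacked small kernels cover the same receptive field as one larger kernel while using fewer parameters.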
class VGG(nn.Module):
    def __init__(self, num_classes, layer_nums=(2, 2, 3, 3, 3), planes=(64, 128, 256, 512, 512)):
        super(VGG, self).__init__()
        self.features = self._make_layers(layer_nums, planes)
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    @staticmethod
    def _make_layers(layer_nums, planes):
        # Each stage is `num` 3x3 convolutions followed by one 2x2 max pooling
        layers = []
        in_channels = 3
        for (num, plane) in zip(layer_nums, planes):
            for i in range(num):
                layers += [nn.Conv2d(in_channels, plane, kernel_size=3, padding=1),
                           nn.ReLU(inplace=True)]
                in_channels = plane
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        if self.training:
            return self.classifier(x)
        # In eval mode, drop the final classification layer and return the feature
        x = self.classifier[:-1](x)
        return x
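The parameter saving from stacking small kernels can be checked with a little arithmetic. The sketch below compares three stacked 3×3 convolutions with a single 7×7 convolution, assuming stride 1, no bias, and equal input and output channel counts; the channel count of 256 is just an example value.

```python
def conv_params(kernel, channels):
    """Weights of one kernel x kernel convolution with `channels` in and out (no bias)."""
    return kernel * kernel * channels * channels


def stacked_receptive_field(kernel, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


channels = 256  # example value, not taken from the text
three_3x3 = 3 * conv_params(3, channels)
one_7x7 = conv_params(7, channels)

print(stacked_receptive_field(3, 3))  # same receptive field as one 7x7 kernel
print(three_3x3, one_7x7, three_3x3 / one_7x7)
```

The ratio is 27/49 regardless of the channel count, and the stacked version also inserts two extra non-linearities.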
ResNet
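"Deep Residual Learning for Image Recognition", CVPR 2016
Simply stacking more CNN layers to build a deeper model runs into vanishing gradients: as depth grows, the gradients reaching the earlier layers become very small, which makes deep models hard to train. ResNet introduces residual (skip) connections, which give the gradient an alternative path that bypasses intermediate layers and reaches the early layers directly. This makes it possible to train very deep models that still perform well.
- Innovations: skip connections and residual units to address the degradation problem; a large increase in network depth.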
def conv3x3(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)

def conv1x1(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)

class Bottleneck(nn.Module):
    # Class attribute, because ResNet._make_layer reads block.expansion on the class
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += identity  # skip connection
        out = self.relu(out)
        return out

class ResNet(nn.Module):
    def __init__(self, num_classes, last_stride=2, layers=(3, 4, 6, 3)):
        super(ResNet, self).__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(Bottleneck, 64, layers[0])
        self.layer2 = self._make_layer(Bottleneck, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(Bottleneck, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(Bottleneck, 512, layers[3], stride=last_stride)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(2048, num_classes)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion))
        layers = list()
        # Only the first block of a stage may need to change the tensor shape
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        if self.training:
            return self.classifier(x)
        # In eval mode, return the 2048-d pooled feature
        return x
Inception-v4
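"Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning", AAAI 2017
InceptionNet is deeper and has more parameters. To make such a deep model trainable, it attaches several auxiliary classifiers to intermediate layers and adds their losses, with a small weight, to the final classification loss, which counters vanishing gradients. Kernels of several sizes are applied in parallel, which widens the model and lets it respond to more scales. 1×1 convolutions are used for dimensionality reduction.
- Inception-v1 places 1×1, 3×3, and 5×5 convolutions and a 3×3 pooling layer in parallel, and replaces the final fully connected layers with global average pooling.
- Inception-v2 adds Batch Normalization and replaces each 5×5 convolution with two 3×3 convolutions in series, which increases depth and removes many parameters. It also introduces asymmetric convolutions and uses Inception-like Reduction modules to shrink the feature map while increasing the number of channels.
- Inception-v3 factorizes the 3×3 and 7×7 convolutions into 3×1 and 1×3, and 7×1 and 1×7 convolutions respectively, and uses Label Smoothing and the RMSProp optimizer.
- Inception-v4 revises the stem module at the front of the model and the Inception-C module.
- Innovations: several kernel sizes in parallel to make the network wider.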
from collections import OrderedDict

class Conv(nn.Sequential):
    """Conv2d -> BatchNorm2d -> ReLU."""
    def __init__(self, inplanes, outplanes, kernel_size, stride, padding=(0, 0)):
        super(Conv, self).__init__()
        self.add_module('conv', nn.Conv2d(inplanes, outplanes, kernel_size, stride, padding, bias=False))
        self.add_module('bn', nn.BatchNorm2d(outplanes))
        self.add_module('relu', nn.ReLU(inplace=True))

def conv1x1(in_planes, out_planes):
    return Conv(in_planes, out_planes, 1, 1)

def conv3x3(in_planes, out_planes, stride=1, padding=0):
    return Conv(in_planes, out_planes, 3, stride, padding)

def conv7x7(plane1, plane2, plane3, reverse=False):
    # A 7x7 convolution factorized into a 1x7 and a 7x1 convolution (order swapped if reverse)
    if reverse:
        return Conv(plane1, plane2, kernel_size=(7, 1), stride=1, padding=(3, 0)), \
               Conv(plane2, plane3, kernel_size=(1, 7), stride=1, padding=(0, 3))
    return Conv(plane1, plane2, kernel_size=(1, 7), stride=1, padding=(0, 3)), \
           Conv(plane2, plane3, kernel_size=(7, 1), stride=1, padding=(3, 0))

class Concat(nn.Module):
    """Runs every child module on the same input and concatenates the outputs along channels."""
    def forward(self, x):
        return torch.cat([module(x) for module in self._modules.values()], 1)

class StemPart1(Concat):
    def __init__(self):
        super(StemPart1, self).__init__()
        self.maxpool = nn.MaxPool2d(3, stride=2)
        self.conv = conv3x3(64, 96, stride=2)

class StemPart2(Concat):
    def __init__(self):
        super(StemPart2, self).__init__()
        self.branch0 = nn.Sequential(
            conv1x1(160, 64),
            conv3x3(64, 96))
        self.branch1 = nn.Sequential(
            conv1x1(160, 64),
            *conv7x7(64, 64, 64),
            Conv(64, 96, kernel_size=(3, 3), stride=1))

class StemPart3(Concat):
    def __init__(self):
        super(StemPart3, self).__init__()
        self.conv = conv3x3(192, 192, stride=2)
        self.maxpool = nn.MaxPool2d(3, stride=2)

class InceptionA(Concat):
    def __init__(self):
        super(InceptionA, self).__init__()
        self.branch0 = conv1x1(384, 96)
        self.branch1 = nn.Sequential(
            conv1x1(384, 64),
            conv3x3(64, 96, padding=1))
        self.branch2 = nn.Sequential(
            conv1x1(384, 64),
            conv3x3(64, 96, padding=1),
            conv3x3(96, 96, padding=1))
        self.branch3 = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1, count_include_pad=False),
            conv1x1(384, 96))

class ReductionA(Concat):
    def __init__(self):
        super(ReductionA, self).__init__()
        self.branch0 = conv3x3(384, 384, stride=2)
        self.branch1 = nn.Sequential(
            conv1x1(384, 192),
            conv3x3(192, 224, padding=1),
            conv3x3(224, 256, stride=2))
        self.branch2 = nn.MaxPool2d(3, stride=2)

class InceptionB(Concat):
    def __init__(self):
        super(InceptionB, self).__init__()
        self.branch0 = conv1x1(1024, 384)
        self.branch1 = nn.Sequential(
            conv1x1(1024, 192),
            *conv7x7(192, 224, 256))
        self.branch2 = nn.Sequential(
            conv1x1(1024, 192),
            *conv7x7(192, 192, 224, reverse=True),
            *conv7x7(224, 224, 256, reverse=True))
        self.branch3 = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1, count_include_pad=False),
            conv1x1(1024, 128)
        )

class ReductionB(Concat):
    def __init__(self):
        super(ReductionB, self).__init__()
        self.branch0 = nn.Sequential(
            conv1x1(1024, 192),
            conv3x3(192, 192, stride=2))
        self.branch1 = nn.Sequential(
            conv1x1(1024, 256),
            *conv7x7(256, 256, 320),
            conv3x3(320, 320, stride=2))
        self.branch2 = nn.MaxPool2d(3, stride=2)

class InceptionC(nn.Module):
    def __init__(self):
        super(InceptionC, self).__init__()
        self.branch0 = conv1x1(1536, 256)
        self.branch1_0 = conv1x1(1536, 384)
        self.branch1_1a = Conv(384, 256, kernel_size=(1, 3), stride=1, padding=(0, 1))
        self.branch1_1b = Conv(384, 256, kernel_size=(3, 1), stride=1, padding=(1, 0))
        self.branch2_0 = nn.Sequential(
            conv1x1(1536, 384),
            Conv(384, 448, kernel_size=(3, 1), stride=1, padding=(1, 0)),
            Conv(448, 512, kernel_size=(1, 3), stride=1, padding=(0, 1)))
        self.branch2_1a = Conv(512, 256, kernel_size=(1, 3), stride=1, padding=(0, 1))
        self.branch2_1b = Conv(512, 256, kernel_size=(3, 1), stride=1, padding=(1, 0))
        self.branch3 = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1, count_include_pad=False),
            conv1x1(1536, 256))

    def forward(self, x):
        x0 = self.branch0(x)
        x1_0 = self.branch1_0(x)
        x1_1a = self.branch1_1a(x1_0)
        x1_1b = self.branch1_1b(x1_0)
        x2_0 = self.branch2_0(x)
        x2_1a = self.branch2_1a(x2_0)
        x2_1b = self.branch2_1b(x2_0)
        x3 = self.branch3(x)
        out = torch.cat((x0, x1_1a, x1_1b, x2_1a, x2_1b, x3), 1)
        return out

class InceptionV4(nn.Module):
    def __init__(self, num_classes):
        super(InceptionV4, self).__init__()
        features = [conv3x3(3, 32, stride=2), conv3x3(32, 32), conv3x3(32, 64, padding=1),
                    StemPart1(), StemPart2(), StemPart3()]
        modules = {'IA': InceptionA, 'RA': ReductionA, 'IB': InceptionB,
                   'RB': ReductionB, 'IC': InceptionC}
        # Number of repetitions of each module, in network order
        num_modules = OrderedDict(IA=4, RA=1, IB=7, RB=1, IC=3)
        for k, v in num_modules.items():
            features.extend([modules[k]() for _ in range(v)])
        self.features = nn.Sequential(*features)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(1536, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        if self.training:
            return self.classifier(x)
        return x
Xception
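"Xception: Deep Learning with Depthwise Separable Convolutions", CVPR 2017
Xception is derived from the Inception architecture by adding depthwise separable convolutions and residual connections. A regular convolution extracts spatial and channel information in a single step with one kernel, mixing the spatial dimensions with the channel dimension. Xception performs the two steps separately, fully decoupling spatial correlations from cross-channel correlations, and implements this with depthwise separable convolutions. A depthwise separable convolution is a depthwise convolution followed by a pointwise convolution.
- Innovations: depthwise separable convolutions; residual connections.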
import torch
import torch.nn as nn

class SeparableConv2d(nn.Sequential):
    """Depthwise convolution (groups=in_channels) followed by a 1x1 pointwise convolution."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False):
        super(SeparableConv2d, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, in_channels, kernel_size, stride,
                               padding, groups=in_channels, bias=bias)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=bias)

class Block(nn.Module):
    def __init__(self, in_filters, out_filters, num_layers, strides, relu_first=True, grow_first=True):
        super(Block, self).__init__()
        self.skip = None
        if out_filters != in_filters or strides != 1:
            self.skip = nn.Conv2d(in_filters, out_filters, kernel_size=1, stride=strides, bias=False)
            self.skipbn = nn.BatchNorm2d(out_filters)
        layers = []
        filters = in_filters
        if grow_first:
            # Change the channel count in the first separable convolution
            layers.append(nn.ReLU(inplace=True))
            layers.append(SeparableConv2d(in_filters, out_filters))
            layers.append(nn.BatchNorm2d(out_filters))
            filters = out_filters
        for i in range(num_layers - 1):
            layers.append(nn.ReLU(inplace=True))
            layers.append(SeparableConv2d(filters, filters))
            layers.append(nn.BatchNorm2d(filters))
        if not grow_first:
            # Change the channel count in the last separable convolution
            layers.append(nn.ReLU(inplace=True))
            layers.append(SeparableConv2d(in_filters, out_filters))
            layers.append(nn.BatchNorm2d(out_filters))
        if not relu_first:
            layers = layers[1:]
        else:
            # Not in-place, so the block input stays intact for the skip path
            layers[0] = nn.ReLU()
        if strides != 1:
            layers.append(nn.MaxPool2d(3, strides, 1))
        self.rep = nn.Sequential(*layers)

    def forward(self, inp):
        skip = inp
        if self.skip is not None:
            skip = self.skip(skip)
            skip = self.skipbn(skip)
        x = self.rep(inp)
        x += skip
        return x

class Xception(nn.Module):
    def __init__(self, num_classes):
        super(Xception, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=2, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, bias=False)
        self.bn2 = nn.BatchNorm2d(64)
        self.relu2 = nn.ReLU(inplace=True)
        self.block1 = Block(64, 128, 2, 2, relu_first=False)
        self.block2 = Block(128, 256, 2, 2)
        self.block3 = Block(256, 728, 2, 2)
        for i in range(4, 12):
            self.add_module('block%d' % i, Block(728, 728, 3, 1))
        self.block12 = Block(728, 1024, 2, 2, grow_first=False)
        self.conv3 = SeparableConv2d(1024, 1536)
        self.bn3 = nn.BatchNorm2d(1536)
        self.relu3 = nn.ReLU(inplace=True)
        self.conv4 = SeparableConv2d(1536, 2048)
        self.bn4 = nn.BatchNorm2d(2048)
        self.relu4 = nn.ReLU(inplace=True)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(2048, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu2(x)  # block1 is built with relu_first=False, so the ReLU is applied here
        for i in range(1, 13):
            x = self._modules['block%d' % i](x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu3(x)
        x = self.conv4(x)
        x = self.bn4(x)
        x = self.relu4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        if self.training:
            return self.classifier(x)
        return x
ResNeXt
"Aggregated Residual Transformations for Deep Neural Networks", CVPR 2017
- Innovations: replaces the 3×3 convolution in the ResNet Bottleneck block with a grouped convolution.
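As a rough illustration of that change, here is a minimal sketch of a bottleneck block whose 3×3 convolution is grouped. The class name `ResNeXtBottleneck` and the defaults `groups=32`, `base_width=4` (the 32x4d setting) are assumptions made for this example, and the width formula follows the convention commonly used in PyTorch implementations rather than anything stated above.

```python
import torch
import torch.nn as nn


class ResNeXtBottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, groups=32, base_width=4, downsample=None):
        super().__init__()
        # Inner width: `groups` parallel paths, each `base_width` channels wide at planes=64
        width = int(planes * base_width / 64) * groups
        self.conv1 = nn.Conv2d(inplanes, width, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        # The only structural change from the ResNet bottleneck: groups > 1 here
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, stride=stride,
                               padding=1, groups=groups, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)


block = ResNeXtBottleneck(256, 64)
y = block(torch.randn(2, 256, 8, 8))
print(y.shape, tuple(block.conv2.weight.shape))
```

Each output channel of the grouped 3×3 convolution only sees `width / groups` input channels, so its weight tensor is `groups` times smaller than that of a dense convolution of the same width.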
DenseNet
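"Densely Connected Convolutional Networks", CVPR 2017
DenseNet is a convolutional network with dense connections. Dense connectivity effectively links every layer directly to both the input and the loss, which alleviates vanishing gradients and allows deeper networks. Inside a Dense Block, the input of each layer is the output of every preceding Dense Layer, so any two layers are directly connected. All feature maps within a Dense Block share the same spatial size, so concatenation never runs into size mismatches. To keep the network from becoming too wide, the out_channels of the convolutions are kept small (for example 32, 64, or 96). The layers between Dense Blocks are called Transition Layers and reduce both the spatial size and the channel count of the feature maps.
- Innovations: dense connections, which alleviate vanishing gradients, strengthen feature propagation, encourage feature reuse, and greatly reduce the number of parameters.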
from collections import OrderedDict

def conv3x3(in_planes, out_planes):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1, bias=False)

def conv1x1(in_planes, out_planes):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, bias=False)

class _DenseLayer(nn.Sequential):
    def __init__(self, inplanes, growth_rate, num_filter):
        super(_DenseLayer, self).__init__()
        # 1x1 bottleneck to num_filter * growth_rate channels, then 3x3 down to growth_rate
        self.add_module('norm1', nn.BatchNorm2d(inplanes))
        self.add_module('relu1', nn.ReLU(inplace=True))
        self.add_module('conv1', conv1x1(inplanes, num_filter * growth_rate))
        self.add_module('norm2', nn.BatchNorm2d(num_filter * growth_rate))
        self.add_module('relu2', nn.ReLU(inplace=True))
        self.add_module('conv2', conv3x3(num_filter * growth_rate, growth_rate))

    def forward(self, *prev_features):
        # Concatenate the outputs of all preceding layers along the channel axis
        concated_features = torch.cat(prev_features, 1)
        bottleneck_output = self.conv1(self.relu1(self.norm1(concated_features)))
        new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
        return new_features

class _DenseBlock(nn.Module):
    def __init__(self, layer_num, inplanes, growth_rate):
        super(_DenseBlock, self).__init__()
        num_filter = 4
        for i in range(layer_num):
            layer = _DenseLayer(inplanes + i * growth_rate, growth_rate, num_filter)
            self.add_module('denselayer%d' % (i + 1), layer)

    def forward(self, init_features):
        features = [init_features]
        for name, layer in self.named_children():
            new_features = layer(*features)
            features.append(new_features)
        return torch.cat(features, 1)

class _Transition(nn.Sequential):
    def __init__(self, inplanes, outplanes):
        super(_Transition, self).__init__()
        self.add_module('norm', nn.BatchNorm2d(inplanes))
        self.add_module('relu', nn.ReLU(inplace=True))
        self.add_module('conv', conv1x1(inplanes, outplanes))
        self.add_module('pool', nn.AvgPool2d(kernel_size=2, stride=2))

class DenseNet(nn.Module):
    def __init__(self, num_classes, num_layers=(6, 12, 24, 16)):
        super(DenseNet, self).__init__()
        inplanes = 64
        growth_rate = 32
        # Initial convolution
        self.features = nn.Sequential(OrderedDict([
            ('conv0', nn.Conv2d(3, inplanes, kernel_size=7, stride=2, padding=3, bias=False)),
            ('norm0', nn.BatchNorm2d(inplanes)),
            ('relu0', nn.ReLU(inplace=True)),
            ('pool0', nn.MaxPool2d(kernel_size=3, stride=2, padding=1)),
        ]))
        # Add dense blocks
        for i, layer_num in enumerate(num_layers):
            block = _DenseBlock(layer_num, inplanes, growth_rate)
            self.features.add_module('denseblock%d' % (i + 1), block)
            inplanes = inplanes + layer_num * growth_rate
            if i != len(num_layers) - 1:
                # A transition layer halves both the channel count and the resolution
                trans = _Transition(inplanes, inplanes // 2)
                self.features.add_module('transition%d' % (i + 1), trans)
                inplanes //= 2
        self.features.add_module('norm5', nn.BatchNorm2d(inplanes))
        self.relu = nn.ReLU(inplace=True)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(inplanes, num_classes)

    def forward(self, x):
        features = self.features(x)
        f = self.relu(features)
        f = self.avgpool(f)
        f = torch.flatten(f, 1)
        if self.training:
            return self.classifier(f)
        return f
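The channel bookkeeping in the constructor is easy to lose track of, so here is a small stand-alone sketch that replays it. It assumes the same defaults as the code above (64 initial channels, growth rate 32, transitions that halve the channels); the function name is made up for this illustration.

```python
def densenet_channels(num_layers=(6, 12, 24, 16), init_channels=64, growth_rate=32):
    """Channel count at the output of each dense block."""
    channels = init_channels
    trace = []
    for i, layer_num in enumerate(num_layers):
        # Every dense layer appends `growth_rate` new channels to the running concatenation
        channels += layer_num * growth_rate
        trace.append(channels)
        if i != len(num_layers) - 1:
            # Transition layer halves the channel count between blocks
            channels //= 2
    return trace


print(densenet_channels())
```

The last entry is the feature dimension seen by the final `nn.Linear` classifier.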
SENet
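"Squeeze-and-Excitation Networks", CVPR 2018
SENet stands for Squeeze-and-Excitation Networks and has two main parts. The squeeze part uses global average pooling to compress each feature map into a single value, so that value has a global view over the whole H×W map and a wider receptive field. The excitation part predicts the importance of each channel and then applies those importance weights to the corresponding channels of the original features. The SE module is flexible because it can be dropped into existing architectures, but adding it also increases the parameter count and the computation.
- Innovations: the SE module assigns a weight to each channel and so recalibrates features dynamically; it can be applied to other models.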
from collections import OrderedDict

def conv3x3(in_planes, out_planes, stride=1, groups=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3,
                     stride=stride, padding=1, groups=groups, bias=False)

def conv1x1(in_planes, out_planes, stride=1, bias=False):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=bias)

class SEModule(nn.Module):
    def __init__(self, channels, reduction):
        super(SEModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = conv1x1(channels, channels // reduction, bias=True)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = conv1x1(channels // reduction, channels, bias=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        module_input = x
        x = self.avg_pool(x)   # squeeze: (N, C, H, W) -> (N, C, 1, 1)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.sigmoid(x)    # excitation: per-channel weights in (0, 1)
        return module_input * x

class SEBottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, groups=64, reduction=16, downsample=None):
        super(SEBottleneck, self).__init__()
        self.conv1 = conv1x1(inplanes, planes * 2)
        self.bn1 = nn.BatchNorm2d(planes * 2)
        self.conv2 = conv3x3(planes * 2, planes * 4, stride=stride, groups=groups)
        self.bn2 = nn.BatchNorm2d(planes * 4)
        self.conv3 = conv1x1(planes * 4, planes * 4)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.se_module = SEModule(planes * 4, reduction=reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        if self.downsample is not None:
            residual = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        # Recalibrate the channels, then add the skip connection
        out = self.se_module(out) + residual
        return self.relu(out)

class SENet(nn.Module):
    def __init__(self, num_classes, num_blocks=(3, 8, 36, 3), dropout_p=0.2):
        super(SENet, self).__init__()
        self.inplanes = 128
        self.block = SEBottleneck
        self.layer0 = nn.Sequential(OrderedDict([
            ('conv1', conv3x3(3, 64, stride=2)),
            ('bn1', nn.BatchNorm2d(64)),
            ('relu1', nn.ReLU(inplace=True)),
            ('conv2', conv3x3(64, 64)),
            ('bn2', nn.BatchNorm2d(64)),
            ('relu2', nn.ReLU(inplace=True)),
            # Stride 1: the third positional argument of conv3x3 is the stride, not the kernel size
            ('conv3', conv3x3(64, self.inplanes)),
            ('bn3', nn.BatchNorm2d(self.inplanes)),
            ('relu3', nn.ReLU(inplace=True)),
            ('pool', nn.MaxPool2d(3, stride=2, ceil_mode=True))]))
        self.layer1 = self._make_layer(64, num_blocks[0], downsample_config=(1, 0))
        self.layer2 = self._make_layer(128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(512, num_blocks[3], stride=2)
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(dropout_p) if dropout_p is not None else None
        self.classifier = nn.Linear(512 * self.block.expansion, num_classes)

    def _make_layer(self, planes, num_blocks, stride=1, downsample_config=(3, 1)):
        block = self.block
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            kernel_size, padding = downsample_config
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size, stride, padding, bias=False),
                nn.BatchNorm2d(planes * block.expansion))
        layers = [block(self.inplanes, planes, stride=stride, downsample=downsample)]
        self.inplanes = planes * block.expansion
        for i in range(1, num_blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.layer0(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avg_pool(x)
        if self.dropout is not None:
            x = self.dropout(x)
        x = torch.flatten(x, 1)
        if self.training:
            return self.classifier(x)
        return x
MobileNet
"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications", arXiv 2017
"MobileNetV2: Inverted Residuals and Linear Bottlenecks", CVPR 2018
"Searching for MobileNetV3", ICCV 2019
MobileNet v1
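- Replaces 3x3 convolutions with depthwise separable convolutions, which greatly reduces the number of parameters.
- Uses ReLU6, which is more robust under low-precision arithmetic.
- A width multiplier can shrink the input and output channel counts to obtain a smaller model, and a resolution multiplier controls the input resolution to reduce computation.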
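The parameter saving and the effect of the width multiplier can be checked with simple counting. The sketch below ignores biases and BatchNorm parameters, and the layer sizes (128 input and 256 output channels) are example values only.

```python
def standard_conv_params(c_in, c_out, kernel=3):
    """Weights of a regular kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(c_in, c_out, kernel=3):
    """Weights of a depthwise kernel x kernel conv plus a 1x1 pointwise conv (no bias)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def scale_channels(channels, width_multiplier):
    """Apply the width multiplier to a channel count."""
    return int(channels * width_multiplier)


c_in, c_out = 128, 256  # example values
standard = standard_conv_params(c_in, c_out)
separable = separable_conv_params(c_in, c_out)
print(standard, separable, separable / standard)

# Width multiplier 0.5 thins both the input and the output of the layer
thin = separable_conv_params(scale_channels(c_in, 0.5), scale_channels(c_out, 0.5))
print(thin)
```

The ratio works out to 1/c_out + 1/kernel², so for 3x3 kernels the separable version needs roughly 8 to 9 times fewer weights.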
MobileNet v2
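- ReLU loses a lot of information when applied in a low-dimensional space, so the ReLU after the 1x1 projection convolution is replaced with a linear activation.
- A residual block reduces the channel count first and restores it afterwards to save computation. The inverted residual does the opposite: it expands the channels first, applies the convolution, and finally projects back to the original channel count, so the intermediate layers can learn more features.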
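A minimal sketch of such an inverted residual block is shown below, assuming an expansion ratio of 6 and ReLU6 after the expansion and depthwise convolutions. The class name and argument names are chosen for this example; it is not the reference implementation.

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    def __init__(self, inplanes, outplanes, stride=1, expand_ratio=6):
        super().__init__()
        midplanes = inplanes * expand_ratio
        # The skip connection is only valid when the shape is unchanged
        self.use_residual = stride == 1 and inplanes == outplanes
        self.block = nn.Sequential(
            # 1x1 expansion to a wider representation
            nn.Conv2d(inplanes, midplanes, 1, bias=False),
            nn.BatchNorm2d(midplanes),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution
            nn.Conv2d(midplanes, midplanes, 3, stride, 1, groups=midplanes, bias=False),
            nn.BatchNorm2d(midplanes),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection: no activation after this convolution
            nn.Conv2d(midplanes, outplanes, 1, bias=False),
            nn.BatchNorm2d(outplanes),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out


x = torch.randn(2, 32, 16, 16)
same = InvertedResidual(32, 32)(x)            # residual path is used
down = InvertedResidual(32, 64, stride=2)(x)  # no residual, resolution halved
print(same.shape, down.shape)
```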
MobileNet v3
- Adds SE modules.
- Uses the new h-swish activation function.
- Trims the computation-heavy layers at the beginning and end of the network.
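For reference, h-swish replaces the sigmoid in swish with a piecewise-linear ReLU6 term: h-swish(x) = x · ReLU6(x + 3) / 6. The sketch below writes both activations as plain functions so they can be compared; the function names are made up for this example.

```python
import torch
import torch.nn.functional as F


def swish(x):
    """x * sigmoid(x)."""
    return x * torch.sigmoid(x)


def h_swish(x):
    """Piecewise-linear approximation of swish: x * ReLU6(x + 3) / 6."""
    return x * F.relu6(x + 3.0) / 6.0


x = torch.tensor([-4.0, -3.0, 0.0, 3.0, 4.0])
print(h_swish(x))
print(swish(x))
```

Outside the interval [-3, 3] h-swish is exactly 0 or exactly x, and it avoids computing an exponential, which is the reason it is attractive on mobile hardware.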
OSNet
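"Omni-Scale Feature Learning for Person Re-Identification", ICCV 2019
OSNet uses residual blocks made of several convolutional feature streams, each of which detects features at a particular scale. Within a residual block, the streams are weighted according to the input so that multi-scale features are fused dynamically. It uses depthwise separable convolutions, built from a pointwise convolution followed by a depthwise convolution, to reduce the number of parameters.
- Innovations: Inception-like parallel convolutions give the streams different receptive fields; a unified aggregation gate fuses the features and helps learn dynamic combinations of multi-scale features; separable convolutions keep the network lightweight.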
class Conv1x1(nn.Sequential):
    def __init__(self, in_channels, out_channels, act=True):
        super(Conv1x1, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True) if act else nn.Identity()

class LightConv3x3(nn.Sequential):
    """1x1 pointwise convolution followed by a 3x3 depthwise convolution."""
    def __init__(self, in_channels, out_channels):
        super(LightConv3x3, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False, groups=out_channels)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

class ChannelGate(nn.Module):
    def __init__(self, in_channels, reduction=16):
        super(ChannelGate, self).__init__()
        self.global_avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc1 = nn.Conv2d(in_channels, in_channels // reduction, 1)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Conv2d(in_channels // reduction, in_channels, 1)
        self.gate_activation = nn.Sigmoid()

    def forward(self, x):
        identity = x
        x = self.global_avgpool(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.gate_activation(x)
        return identity * x

class OSBlock(nn.Module):
    def __init__(self, in_channels, out_channels, bottleneck_reduction=4):
        super(OSBlock, self).__init__()
        mid_channels = out_channels // bottleneck_reduction
        self.conv1 = Conv1x1(in_channels, mid_channels)
        # Four streams with 1 to 4 stacked light 3x3 convolutions, i.e. growing receptive fields
        self.conv2a = LightConv3x3(mid_channels, mid_channels)
        self.conv2b = nn.Sequential(
            LightConv3x3(mid_channels, mid_channels),
            LightConv3x3(mid_channels, mid_channels))
        self.conv2c = nn.Sequential(
            LightConv3x3(mid_channels, mid_channels),
            LightConv3x3(mid_channels, mid_channels),
            LightConv3x3(mid_channels, mid_channels))
        self.conv2d = nn.Sequential(
            LightConv3x3(mid_channels, mid_channels),
            LightConv3x3(mid_channels, mid_channels),
            LightConv3x3(mid_channels, mid_channels),
            LightConv3x3(mid_channels, mid_channels))
        # One gate shared by all streams (the unified aggregation gate)
        self.gate = ChannelGate(mid_channels)
        self.conv3 = Conv1x1(mid_channels, out_channels, act=False)
        self.downsample = None
        if in_channels != out_channels:
            self.downsample = Conv1x1(in_channels, out_channels, act=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(identity)
        x1 = self.conv1(x)
        x2a = self.conv2a(x1)
        x2b = self.conv2b(x1)
        x2c = self.conv2c(x1)
        x2d = self.conv2d(x1)
        x2 = self.gate(x2a) + self.gate(x2b) + self.gate(x2c) + self.gate(x2d)
        x3 = self.conv3(x2)
        out = self.relu(x3 + identity)
        return out

class OSNet(nn.Module):
    def __init__(self, num_classes, layers=(2, 2, 2)):
        super(OSNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1))
        self.layer1 = self._make_layer(layers[0], 64, 256)
        self.layer2 = self._make_layer(layers[1], 256, 384)
        self.layer3 = self._make_layer(layers[2], 384, 512, downsample=False)
        self.conv1 = Conv1x1(512, 512)
        self.global_avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.refactor = nn.Sequential(
            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True)
        )
        self.classifier = nn.Linear(512, num_classes)

    @staticmethod
    def _make_layer(num_layers, in_channels, out_channels, downsample=True):
        layers = [OSBlock(in_channels, out_channels)]
        layers.extend([OSBlock(out_channels, out_channels) for _ in range(num_layers-1)])
        if downsample:
            layers.append(nn.Sequential(
                Conv1x1(out_channels, out_channels),
                nn.AvgPool2d(2, stride=2)))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.conv1(x)
        v = self.global_avgpool(x)
        v = torch.flatten(v, 1)
        f = self.refactor(v)
        if self.training:
            return self.classifier(f)
        return f
HRNet
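"High-Resolution Representations for Labeling Pixels and Regions", arXiv 2019
- The network has several parallel branches, each containing 4 residual units. A high-resolution representation is maintained throughout, high-to-low resolution subnetworks are added step by step, and the multi-resolution subnetworks are connected in parallel.
- Information is exchanged repeatedly between the parallel multi-resolution subnetworks for multi-scale fusion, so the high-resolution and low-resolution features reinforce each other.
- For classification, the 4 feature maps of different resolutions each pass through a bottleneck layer that increases the channel count. Starting from the highest resolution, each map goes through a strided convolution and is added element-wise to the next lower-resolution map. A 1×1 convolution then doubles the channels (1024 -> 2048), and the result is globally average-pooled and fed to the classifier.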
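The information exchange between resolutions can be sketched with just two branches. This is a simplified illustration, not the HRNet implementation: the class name `TwoBranchFusion`, the channel counts, and the use of nearest-neighbour upsampling are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoBranchFusion(nn.Module):
    def __init__(self, high_channels=32, low_channels=64):
        super().__init__()
        # low -> high: 1x1 convolution to match channels (upsampling happens in forward)
        self.low_to_high = nn.Sequential(
            nn.Conv2d(low_channels, high_channels, 1, bias=False),
            nn.BatchNorm2d(high_channels))
        # high -> low: strided 3x3 convolution halves the resolution
        self.high_to_low = nn.Sequential(
            nn.Conv2d(high_channels, low_channels, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(low_channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, high, low):
        up = F.interpolate(self.low_to_high(low), size=high.shape[-2:], mode='nearest')
        down = self.high_to_low(high)
        # Each branch keeps its own resolution and receives the other branch's features
        return self.relu(high + up), self.relu(low + down)


fusion = TwoBranchFusion()
high = torch.randn(2, 32, 16, 8)   # high-resolution branch
low = torch.randn(2, 64, 8, 4)     # low-resolution branch
new_high, new_low = fusion(high, low)
print(new_high.shape, new_low.shape)
```

With four branches the same pattern is applied between every pair of resolutions, using repeated strided convolutions for larger downsampling factors.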
EfficientNet
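"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks", ICML 2019
A CNN can usually be scaled up to obtain better accuracy, and the common options are to increase its depth, width, or input resolution. Earlier approaches scaled only one of these three dimensions at a time. EfficientNet scales depth, width, and resolution together using a fixed set of scaling coefficients. The basic structure stays the same from B0 to B7; only the number of MBConv modules and the input and output channel counts of the convolutions change.
- Innovations: an optimized architecture with few parameters that ranked near the top on ImageNet; fixed scaling coefficients give the model room to be scaled up further.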
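As a rough illustration of compound scaling, the sketch below raises three base coefficients to a shared exponent φ. The values α = 1.2, β = 1.1, γ = 1.15 are the ones reported in the paper for the B0 baseline, chosen so that α·β²·γ² ≈ 2; the function name is made up here, and the released B1 to B7 configurations do not necessarily follow these exact powers.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases from the paper


def compound_scale(phi):
    """Multipliers for depth, width and resolution, plus the approximate FLOPs factor."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs grow linearly with depth and quadratically with width and resolution
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Because α·β²·γ² is close to 2, each increment of φ roughly doubles the computation.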
import math

class SwishImplementation(torch.autograd.Function):
    # Custom autograd function: only the input is saved, and sigmoid is recomputed in backward
    @staticmethod
    def forward(ctx, i):
        result = i * torch.sigmoid(i)
        ctx.save_for_backward(i)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        i = ctx.saved_tensors[0]
        sigmoid_i = torch.sigmoid(i)
        return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i)))

class MemoryEfficientSwish(nn.Module):
    def forward(self, x):
        return SwishImplementation.apply(x)

def depthconv(in_planes, out_planes, kernel_size, stride, padding, groups):
    return nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size,
                     stride=stride, padding=padding, groups=groups, bias=False)

def conv3x3(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3,
                     stride=stride, padding=1, bias=False)

def conv1x1(in_planes, out_planes, stride=1, bias=False):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=bias)

class MBConvBlock(nn.Module):
    def __init__(self, inplanes, outplanes, kernel_size, stride, droprate, expand_ratio=1, se_ratio=0.25):
        super(MBConvBlock, self).__init__()
        self.expand_ratio = expand_ratio
        self.has_se = 0 < se_ratio <= 1
        self.residual = (stride == 1 and inplanes == outplanes)
        midplanes = int(inplanes * expand_ratio)
        if expand_ratio != 1:
            # 1x1 expansion
            self._expand_conv = conv1x1(inplanes, midplanes)
            self._bn0 = nn.BatchNorm2d(midplanes)
        # Depthwise convolution
        self._depthwise_conv = depthconv(midplanes, midplanes, kernel_size, stride, groups=midplanes,
                                         padding=self.cal_pad(kernel_size, stride))
        self._bn1 = nn.BatchNorm2d(midplanes)
        if self.has_se:
            # Squeeze-and-excitation, sized relative to the block input channels
            self._se_reduce = conv1x1(midplanes, int(inplanes * se_ratio), bias=True)
            self._se_expand = conv1x1(int(inplanes * se_ratio), midplanes, bias=True)
            self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        # 1x1 linear projection
        self._project_conv = conv1x1(midplanes, outplanes)
        self._bn2 = nn.BatchNorm2d(outplanes)
        self._swish = MemoryEfficientSwish()
        self.dropout = nn.Dropout(droprate)

    @staticmethod
    def cal_pad(kernel_size, stride):
        return math.ceil((kernel_size - stride) / 2)

    def forward(self, x, droprate=None):
        identity = x
        if self.expand_ratio != 1:
            x = self._expand_conv(x)
            x = self._bn0(x)
            x = self._swish(x)
        x = self._depthwise_conv(x)
        x = self._bn1(x)
        x = self._swish(x)
        if self.has_se:
            x_squeezed = self.avg_pool(x)
            x_squeezed = self._se_reduce(x_squeezed)
            x_squeezed = self._swish(x_squeezed)
            x_squeezed = self._se_expand(x_squeezed)
            x = torch.sigmoid(x_squeezed) * x
        x = self._project_conv(x)
        x = self._bn2(x)
        if self.residual:
            x = self.dropout(x)
            x = x + identity
        return x

class EfficientNet(nn.Module):
    def __init__(self, num_classes, num_blocks, planes, feat_dim, droprate=0.4):
        super(EfficientNet, self).__init__()
        # Kernel size and stride of each of the 7 stages
        kernels = (3, 3, 5, 3, 5, 5, 3)
        strides = (1, 2, 2, 2, 1, 2, 1)
        self.block_count = 0
        self.residual_droprate = 0.2 / sum(num_blocks)
        self._conv_stem = conv3x3(3, planes[0], stride=2)
        self._bn0 = nn.BatchNorm2d(planes[0])
        self._blocks = nn.ModuleList([])
        for i in range(len(num_blocks)):
            expand_ratio = 1 if i == 0 else 6
            self._make_layer(planes[i], planes[i+1], num_blocks[i], kernels[i], strides[i], expand_ratio)
        self._conv_head = conv1x1(planes[-1], feat_dim)
        self._bn1 = nn.BatchNorm2d(feat_dim)
        self._avg_pooling = nn.AdaptiveAvgPool2d(1)
        self._dropout = nn.Dropout(droprate)
        self.classifier = nn.Linear(feat_dim, num_classes)
        self._swish = MemoryEfficientSwish()

    def _get_dropout(self):
        # The residual drop rate grows linearly with the block index
        droprate = self.residual_droprate * float(self.block_count)
        self.block_count += 1
        return droprate

    def _make_layer(self, inplanes, outplanes, num_blocks, kernel_size, stride, expand_ratio=6):
        self._blocks.append(MBConvBlock(inplanes, outplanes, kernel_size, stride, self._get_dropout(), expand_ratio))
        for _ in range(num_blocks - 1):
            self._blocks.append(MBConvBlock(outplanes, outplanes, kernel_size, 1, self._get_dropout(), expand_ratio))

    def forward(self, x):
        x = self._swish(self._bn0(self._conv_stem(x)))
        for block in self._blocks:
            x = block(x)
        x = self._swish(self._bn1(self._conv_head(x)))
        x = self._avg_pooling(x)
        x = x.flatten(start_dim=1)
        x = self._dropout(x)
        if self.training:
            return self.classifier(x)
        return x
Performance comparison
Dataset: Market1501
- batchsize=64; loss function=CrossEntropyLoss()
- transform: Resize((256, 128)); RandomHorizontalFlip(); RandomCrop((256, 128)); ToTensor(); Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- pretrain classifier: optimizer=Adam; lr=5e-4; weight_decay=5e-4; epoch=10
- train: optimizer=Adam; lr=5e-4; weight_decay=5e-4; epoch=150; StepLR=50; checkpoints=10
- Evaluation protocol: Rank-1, Rank-5, Rank-10, mAP
Model | Rank-1 / Rank-5 / Rank-10 / mAP | Params (M) | MACs (G) |
---|---|---|---|
AlexNet | 67.69% 84.04% 88.82% 44.52% | 57.00 | 0.47 |
VGG16 | 70.90% 85.52% 90.07% 48.40% | 134.26 | 10.15 |
Inceptionv3 | 79.67% 92.21% 94.77% 60.57% | 21.79 | 1.68 |
Inceptionv4 | 81.57% 92.39% 95.10% 62.86% | 41.14 | 3.62 |
Inception-Resnetv2 | 80.17% 92.09% 94.89% 62.35% | 54.31 | 3.78 |
Xception | 84.42% 93.61% 95.84% 66.15% | 20.81 | 2.98 |
ResNet50 | 86.65% 94.74% 96.61% 70.32% | 23.51 | 2.68 |
ResNeXt50-32x4d | 86.89% 94.47% 96.02% 71.35% | 22.98 | 2.78 |
Wide-ResNet50 | 85.52% 93.85% 96.05% 69.17% | 66.83 | 7.46 |
DenseNet169 | 85.08% 93.58% 95.57% 68.05% | 12.48 | 2.22 |
SEResNet50 | 87.10% 94.86% 96.61% 71.42% | 26.04 | 2.69 |
SENet154 | 82.25% 92.69% 95.51% 63.67% | 113.04 | 2.18 |
Mobilenetv2-100 | 78.30% 91.68% 94.32% 61.37% | 2.22 | 0.53 |
Mobilenetv3-small | 79.58% 91.32% 94.29% 59.52% | 1.52 | 0.039 |
Efficientnet-B4 | 80.44% 92.12% 95.04% 60.67% | 17.55 | 1.01 |
Efficientnetv2-S | 86.83% 94.77% 96.70% 70.15% | 20.18 | 1.88 |
Regnetx-032 | 86.53% 94.08% 95.96% 70.58% | 14.29 | 2.09 |
Regnety-032 | 85.14% 93.70% 95.99% 69.49% | 17.92 | 2.09 |
HRNet-w32 | 86.15% 93.79% 96.08% 70.84% | 39.18 | 5.85 |
OSNet | 91.23% 96.49% 97.77% 79.50% | 2.17 | 0.997 |
ConvNeXt-Tiny | 79.19% 91.41% 94.50% 57.72% | 27.80 | 2.91 |
Model | Rank-1 / Rank-5 / Rank-10 / mAP | Params (M) | MACs (G) |
---|---|---|---|
ViT-B16(256,128) | 78.12% 91.57% 95.04% 56.27% | 58.04 | 29.52 |
DeiT-T16(256,128) | 84.44% 94.21% 96.53% 64.64% | 5.48 | 0.704 |
DeiT-S16(256,128) | 85.24% 94.33% 96.64% 66.99% | 21.57 | 2.78 |
DeiT-B16(256,128) | 79.31% 91.03% 94.63% 56.96% | 85.61 | 11.03 |
DeiT-T16-Distilled-Hard(256,128) | 85.48% 94.42% 96.56% 66.40% | 5.48 | 0.704 |
DeiT-S16-Distilled-Hard(256,128) | 86.10% 95.34% 96.94% 67.71% | 21.57 | 2.78 |
DeiT-B16-Distilled-Hard(256,128) | 81.00% 91.98% 94.51% 58.76% | 85.61 | 11.03 |
DeiT-T16-Distilled-Soft(256,128) | 83.58% 93.76% 96.29% 63.16% | 5.48 | 0.704 |
DeiT-S16-Distilled-Soft(256,128) | 80.79% 92.73% 95.31% 60.96% | 21.57 | 2.78 |
DeiT-B16-Distilled-Soft(256,128) | 76.25% 89.99% 93.71% 53.13% | 85.61 | 11.03 |
VT-ResNet50-1F1R(256,128) | 79.04% 91.30% 94.48% 57.61% | 23.24 | 4.10 |
VT-ResNet50-1F2R(256,128) | 82.87% 93.56% 95.78% 63.35% | 32.70 | 4.68 |
TNT-S(256,128) | 79.72% 92.13% 95.16% 58.21% | 23.27 | 3.16 |
The models were not tuned individually; they all used the same hyperparameters, so these numbers are for reference only.
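For readers unfamiliar with the Rank-k and mAP metrics reported in the tables, here is a minimal sketch of how they can be computed for a single query. It assumes the gallery has already been sorted by similarity to the query and that the entries the Market1501 protocol ignores (for example same-identity images from the same camera) have been removed; the function names and the example ranking are made up for illustration.

```python
def rank_k_hit(ranked_match_flags, k):
    """1 if any of the top-k gallery items shares the query identity, else 0."""
    return int(any(ranked_match_flags[:k]))


def average_precision(ranked_match_flags):
    """Mean of the precision values taken at the position of every correct match."""
    hits, precisions = 0, []
    for position, is_match in enumerate(ranked_match_flags, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical ranking for one query: True marks a gallery image of the same identity
flags = [False, True, False, True, False]
print(rank_k_hit(flags, 1), rank_k_hit(flags, 5), average_precision(flags))
```

Rank-k accuracy is the mean of `rank_k_hit` over all queries, and mAP is the mean of `average_precision` over all queries.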