Original MobileNet-v2 paper: "MobileNetV2: Inverted Residuals and Linear Bottlenecks"
GitHub:tonylins/pytorch-mobilenet-v2
GitHub:d-li14/mobilenetv2.pytorch
Compared with MobileNet-v1, v2 achieves higher accuracy with a smaller model.
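The hallmark of MobileNet v1 is the depthwise separable convolution, but researchers found that many of the depthwise kernels end up entirely zero, i.e., many kernels do not take part in the actual computation. What causes this? The v2 authors traced it to the ReLU activation: ReLU loses a lot of information when it operates in a low-dimensional space, while in a high-dimensional space more of the useful information is preserved.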
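Given that, v2's fix is simple: replace the ReLU6 activation with a linear activation. Not every activation is replaced, only the one on the last layer; concretely, in v2 the ReLU6 after the final pointwise (1×1) convolution of each block becomes a linear function. v2 calls this a linear bottleneck, and it is the first key idea of the v2 network.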
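A hand-built toy in plain Python can make the information-loss argument concrete. It is only an illustration under simplifying assumptions (a single scalar feature, and a hand-picked 2-channel embedding `[x, -x]` standing in for an expansion layer), not the experiment from the paper: ReLU on one channel merges all negative values, while the same value spread over two channels survives ReLU and can be recovered exactly.

```python
def relu(v):
    """Element-wise ReLU on a plain Python list."""
    return [max(a, 0.0) for a in v]

samples = (-3.0, -0.5, 2.0)

# Low-dimensional case: a single channel. ReLU sends every negative value to 0,
# so -3.0 and -0.5 become indistinguishable.
low_dim = [relu([x]) for x in samples]

# Toy "expanded" case (hand-picked, not learned): embed x into 2 channels as [x, -x].
def expand(x):
    return [x, -x]

def recover(r):
    # relu(x) - relu(-x) == x for every real x, so nothing is lost here
    return r[0] - r[1]

high_dim = [recover(relu(expand(x))) for x in samples]

print(low_dim)   # [[0.0], [0.0], [2.0]]
print(high_dim)  # [-3.0, -0.5, 2.0]
```

In a real network the expansion is a learned 1×1 convolution rather than `[x, -x]`, so the recovery is not exact; the toy only shows why more channels give ReLU room to keep information.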
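A depthwise convolution by itself does not change the number of channels: in the earlier depthwise separable convolution example, the depthwise half takes 3 input channels and still outputs 3 channels. To let the depthwise convolution work in a high-dimensional space, v2 adds a channel-expanding convolution in front of it. Which operation can raise the channel dimension? A 1×1 convolution, of course.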
This channel-expanding step before the depthwise convolution is called the expansion layer in v2, and it is the second key idea of the v2 network.
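The claim that a 1×1 convolution changes the channel count can be checked with a naive pure-Python sketch. `conv1x1` is a hypothetical helper written for illustration (no bias, nested lists instead of tensors): every output channel is a weighted sum of the input channels at the same pixel, so a weight matrix with `t × C_in` rows expands `C_in` channels to `t × C_in` while leaving height and width untouched.

```python
def conv1x1(x, w):
    """Naive 1x1 convolution without bias.

    x: input feature map as nested lists, shape C_in x H x W
    w: weights as nested lists, shape C_out x C_in
    Returns a C_out x H x W map: each output pixel is a weighted sum
    of the C_in values at the same spatial position.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(w[o][i] * x[i][r][c] for i in range(c_in)) for c in range(wd)]
         for r in range(h)]
        for o in range(len(w))
    ]

# 3 input channels of a 2x2 map, expanded by a factor t = 6 to 18 channels
t = 6
x = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(3)]
w = [[1.0] * 3 for _ in range(3 * t)]  # all-ones weights, just for illustration
y = conv1x1(x, w)
print(len(y), len(y[0]), len(y[0][0]))  # 18 2 2
print(y[0])  # [[3.0, 6.0], [9.0, 12.0]]
```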
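Although MobileNet v1 adopts depthwise separable convolutions, its backbone is still a plain, VGG-style stack. The third key idea of v2 is therefore to borrow ResNet's residual structure and add skip connections to the v1 design. Relative to ResNet's residual block, v2 names its structure the inverted residual block. Let's look closely at what is "inverted" compared with ResNet.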
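As the figure shows, ResNet first reduces the channels to 0.25×, applies a standard 3×3 convolution, and then expands back, whereas MobileNet v2 first expands the channels by 6×, applies a 3×3 depthwise convolution, and finally reduces the dimension again with a 1×1 convolution. More schematically, we can draw it like this: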
The order in which MobileNet v2 expands and reduces dimensions is exactly the reverse of ResNet's, hence the name inverted residual.
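The "inverted" ordering can be summarized by the channel count after each layer of the two blocks. The helpers below are hypothetical and assume the block's input and output widths are equal (the case where the shortcut applies); 256 → 64 → 256 is the ResNet-50 bottleneck width of its first stage, and 24 with t = 6 comes from the MobileNetV2 table.

```python
def resnet_bottleneck_channels(c, reduction=0.25):
    """ResNet bottleneck: 1x1 reduce -> 3x3 conv -> 1x1 expand back."""
    mid = int(c * reduction)
    return [c, mid, mid, c]

def inverted_residual_channels(c, t=6):
    """MobileNetV2 block: 1x1 expand -> 3x3 depthwise -> 1x1 project back."""
    hidden = c * t
    return [c, hidden, hidden, c]

print(resnet_bottleneck_channels(256))  # [256, 64, 64, 256]: narrow in the middle
print(inverted_residual_channels(24))   # [24, 144, 144, 24]: wide in the middle
```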
Combining the three key ideas above (linear bottlenecks, the expansion layer, and inverted residuals) gives the MobileNet v2 block, shown in the figure below.
The overall MobileNet v2 architecture is shown in the figure below.
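As shown, the input first passes through a regular convolution, followed by 7 groups of bottleneck blocks (the 7 bottleneck rows of the table, 1+2+3+4+3+3+1 = 17 blocks in total), and then by two 1×1 convolutions with a 7×7 average pooling between them.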
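A quick stride count explains the 7×7 pooling size. This sketch assumes the stem convolution has stride 2 (as in `conv_bn(3, input_channel, 2)` in the code) and takes the first-block stride s of each of the 7 table rows; their product is the network's total downsampling factor, which is also why the first listing asserts `input_size % 32 == 0`.

```python
import math

# Stride of the stem conv, then the first-block stride s of each table row
stem_stride = 2
row_strides = [1, 2, 2, 2, 1, 2, 1]
total_stride = stem_stride * math.prod(row_strides)

input_size = 224
final_size = input_size // total_stride
print(total_stride, final_size)  # 32 7
```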
I. Classic Residual Structure
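Input and output dimensions of each convolutional layer
- Input: a 3-D volume of size $w_{in}\times h_{in}\times d_{in}$ (width × height × depth)
- Parameters of each convolutional layer:
  - receptive field (kernel) size $f$
  - number of filters $k$ (determines the depth of the output)
  - stride $s$
  - amount of zero-padding $p$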
- Output: a 3-D volume of size $w_{out}\times h_{out}\times d_{out}$ (width × height × depth), where:
  - $w_{out}=\left\lfloor\cfrac{w_{in}-f+2p}{s}\right\rfloor+1$
  - $h_{out}=\left\lfloor\cfrac{h_{in}-f+2p}{s}\right\rfloor+1$
  - $d_{out}=k$

The floor only matters when $w_{in}-f+2p$ (or $h_{in}-f+2p$) is not a multiple of $s$; this is also how PyTorch's `nn.Conv2d` computes its output size.
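The formulas translate directly into a small helper. `conv_output_size` and `conv_output_shape` are hypothetical names used only here; the example plugs in the MobileNetV2 stem (3×3 kernel, stride 2, padding 1, 32 filters, as in `conv_bn`) and a stride-1 3×3 layer with padding 1.

```python
def conv_output_size(size_in, f, s=1, p=0):
    """Spatial output size of a convolution: floor((size_in - f + 2p) / s) + 1."""
    return (size_in - f + 2 * p) // s + 1

def conv_output_shape(w_in, h_in, f, k, s=1, p=0):
    """(w_out, h_out, d_out) for a w_in x h_in x d_in input and k filters.

    d_in does not appear: the output depth is just the number of filters k.
    """
    return (conv_output_size(w_in, f, s, p), conv_output_size(h_in, f, s, p), k)

# MobileNetV2 stem: 224x224x3 input, 32 filters of size 3x3, stride 2, padding 1
print(conv_output_shape(224, 224, f=3, k=32, s=2, p=1))  # (112, 112, 32)
# A 3x3 conv with stride 1 and padding 1 keeps the spatial size
print(conv_output_size(56, f=3, s=1, p=1))  # 56
```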
II. MobileNet-v2 Architecture
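- $t$: expansion factor;
- $c$: number of channels of the output feature map (the $k'$ in the figure above);
- $n$: number of times the bottleneck (meaning the inverted bottleneck) is repeated;
- $s$: stride of the first layer of each sequence; all other layers in that sequence use stride 1;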
- Each line describes a sequence of 1 or more identical (modulo stride) layers, repeated n times;
- All layers in the same sequence have the same number c of output channels;
- The first layer of each sequence has a stride $s$ and all others use stride 1;
- All spatial convolutions use 3 × 3 kernels;
- The expansion factor $t$ is always applied to the input size;
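Applying these rules row by row unrolls the table into individual blocks. The sketch below assumes `width_mult = 1` (no channel rounding) and a 224×224 input, so the stem convolution hands the first block a 112×112 map with 32 channels; `CFGS` and `unroll` are illustrative names, and the shortcut rule (stride 1 and equal input/output channels) mirrors `use_res_connect` in the code.

```python
# (t, c, n, s) rows of the MobileNetV2 bottleneck table
CFGS = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def unroll(cfgs, in_channels=32, resolution=112):
    """Expand each (t, c, n, s) row into its n individual bottleneck blocks."""
    blocks = []
    for t, c, n, s in cfgs:
        for i in range(n):
            stride = s if i == 0 else 1          # only the first block strides
            out_res = resolution // stride
            blocks.append({
                "in": in_channels,
                "hidden": in_channels * t,       # t multiplies the *input* channels
                "out": c,
                "stride": stride,
                "res": out_res,
                "residual": stride == 1 and in_channels == c,
            })
            in_channels, resolution = c, out_res
    return blocks


blocks = unroll(CFGS)
for b in blocks:
    print(b)
print(len(blocks), "blocks,", sum(b["residual"] for b in blocks), "with a shortcut")
```

The last line prints `17 blocks, 10 with a shortcut`, and the final block leaves a 7×7 map with 320 channels.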
1. Inverted Residuals [wide in the middle, narrow at both ends] [activation: ReLU6]
The inverted residual in MobileNetV2 is exactly the opposite of ResNet's bottleneck residual block: it is wide in the middle and narrow at both ends. That is:
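- A 1×1 convolution is added before the depthwise convolution to expand the channels (the expansion layer): a 1×1 convolution increases the channel count, the 3×3 depthwise convolution keeps it unchanged, and a final 1×1 convolution reduces it;
- The block's input and output are joined by a residual connection, as shown in the figure below (inverted residuals: wide in the middle, narrow at both ends); in the code, the shortcut is only used when the stride is 1 and the input and output channel counts are equal;
- The activation function is ReLU6: $y=\mathrm{ReLU6}(x)=\min(\max(x,0),6)$.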
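The ReLU6 formula and the layer list above can be turned into a small calculation. `relu6` and `inverted_residual_params` are hypothetical helpers; the parameter count covers convolution weights only (BN and biases ignored) and skips the expansion layer when t = 1, mirroring the `expand_ratio == 1` branch of `InvertedResidual`.

```python
def relu6(x):
    """ReLU6(x) = min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


def inverted_residual_params(c_in, c_out, t):
    """Conv weight count of one inverted residual block (BN and biases ignored)."""
    hidden = c_in * t
    expand = c_in * hidden if t != 1 else 0  # 1x1 expansion, omitted when t == 1
    depthwise = 3 * 3 * hidden               # one 3x3 filter per channel
    project = hidden * c_out                 # 1x1 projection (linear bottleneck)
    return expand + depthwise + project


print([relu6(v) for v in (-3.0, 0.5, 4.0, 9.0)])  # [0.0, 0.5, 4.0, 6.0]
# A 24 -> 24 block with t = 6 works internally at 144 channels ...
print(inverted_residual_params(24, 24, 6))  # 8208
# ... while a standard 3x3 conv at that width would need 9 * 144 * 144 weights
print(9 * 144 * 144)  # 186624
```

So the block gets to operate on 144 channels for 8,208 weights, far fewer than the 186,624 a full 3×3 convolution at that width would need.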
2. Linear Bottleneck Structure
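Because both the DW and PW convolutions use ReLU as their activation, and the PW convolution reduces the dimensionality, applying ReLU to the resulting low-dimensional features loses a lot of information. Therefore:
- When mapping from a high-dimensional to a low-dimensional space, a ReLU activation may lose or corrupt information, so no nonlinearity is used there: in the PW part (the last 1×1 convolution of the inverted residual block), ReLU is replaced with a linear activation, as shown in the figure below.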
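To see where the linear bottleneck sits, here is a toy forward pass of one inverted residual block at a single pixel. Everything here is a simplifying assumption: BN is omitted, the feature map is 1×1 with stride 1 (so a padded 3×3 depthwise convolution reduces to one weight per channel, its center tap), and the weights are hand-picked. The flag `linear_bottleneck` (a name chosen here) switches a ReLU6 on after the projection for comparison.

```python
def relu6(v):
    return [min(max(a, 0.0), 6.0) for a in v]


def matvec(w, v):
    """A 1x1 convolution at one pixel is a matrix-vector product (w: C_out x C_in)."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def inverted_residual_1px(x, w_expand, dw_center, w_project, linear_bottleneck=True):
    """One inverted residual block on a 1x1 feature map (BN omitted, stride 1)."""
    h = relu6(matvec(w_expand, x))                    # 1x1 expansion + ReLU6
    h = relu6([k * a for k, a in zip(dw_center, h)])  # depthwise (center tap) + ReLU6
    y = matvec(w_project, h)                          # 1x1 projection
    if not linear_bottleneck:
        y = relu6(y)                                  # non-linear variant, for comparison
    return [a + b for a, b in zip(x, y)]              # residual connection


x = [1.0, -2.0]
w_expand = [[1, 0], [0, 1], [-1, 0], [0, -1]]  # 2 -> 4 channels (t = 2)
dw_center = [2, 2, 2, 2]
w_project = [[1, 0, -1, 0], [0, 1, 0, -1]]     # 4 -> 2 channels

print(inverted_residual_1px(x, w_expand, dw_center, w_project))         # [3.0, -6.0]
print(inverted_residual_1px(x, w_expand, dw_center, w_project, False))  # [3.0, -2.0]
```

With the linear projection, the negative component -4.0 produced by the projection reaches the output; with ReLU6 after the projection it is clipped to 0, which is the kind of information loss the linear bottleneck is meant to avoid.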
III. Performance Comparison
1. Classification
2. Object Detection
IV. MobileNet-v2 Code
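# Implementation 1: adapted from tonylins/pytorch-mobilenet-v2
import math

import torch.nn as nn


def conv_bn(inp, oup, stride):
    # 3x3 conv -> BN -> ReLU6 (the stem layer)
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    # 1x1 conv -> BN -> ReLU6 (the final 1280-channel layer)
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def make_divisible(x, divisible_by=8):
    # Round x up to the nearest multiple of divisible_by (math.ceil instead of numpy)
    return int(math.ceil(x * 1. / divisible_by) * divisible_by)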
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(inp * expand_ratio)
        # Shortcut only when spatial size and channel count are unchanged
        self.use_res_connect = self.stride == 1 and inp == oup

        if expand_ratio == 1:
            # t == 1: no 1x1 expansion layer (hidden_dim == inp)
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear: no activation after the projection (linear bottleneck)
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw: 1x1 expansion layer
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear: no activation after the projection (linear bottleneck)
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)
class MobileNetV2(nn.Module):
    def __init__(self, n_class=1000, input_size=224, width_mult=1.):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280
        interverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        # building first layer
        assert input_size % 32 == 0
        # input_channel = make_divisible(input_channel * width_mult)  # first channel is always 32!
        self.last_channel = make_divisible(last_channel * width_mult) if width_mult > 1.0 else last_channel
        self.features = [conv_bn(3, input_channel, 2)]
        # building inverted residual blocks
        for t, c, n, s in interverted_residual_setting:
            output_channel = make_divisible(c * width_mult) if t > 1 else c
            for i in range(n):
                if i == 0:
                    self.features.append(block(input_channel, output_channel, s, expand_ratio=t))
                else:
                    self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        self.features.append(conv_1x1_bn(input_channel, self.last_channel))
        # make it nn.Sequential
        self.features = nn.Sequential(*self.features)

        # building classifier
        self.classifier = nn.Linear(self.last_channel, n_class)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = x.mean(3).mean(2)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()
def mobilenet_v2(pretrained=True):
    model = MobileNetV2(width_mult=1)

    if pretrained:
        try:
            from torch.hub import load_state_dict_from_url
        except ImportError:
            from torch.utils.model_zoo import load_url as load_state_dict_from_url
        state_dict = load_state_dict_from_url(
            'https://www.dropbox.com/s/47tyzpofuuyyv1b/mobilenetv2_1.0-f2a8633.pth.tar?dl=1', progress=True)
        model.load_state_dict(state_dict)
    return model


if __name__ == '__main__':
    net = mobilenet_v2(True)
"""
Creates a MobileNetV2 Model as defined in:
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. (2018).
MobileNetV2: Inverted Residuals and Linear Bottlenecks
arXiv preprint arXiv:1801.04381.
import from https://github.com/tonylins/pytorch-mobilenet-v2
"""
import torch.nn as nn
import math
__all__ = ['mobilenetv2']
def _make_divisible(v, divisor, min_value=None):
"""
This function is taken from the original tf repo.
It ensures that all layers have a channel number that is divisible by 8
It can be seen here:
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
:param v:
:param divisor:
:param min_value:
:return:
"""
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_v < 0.9 * v:
new_v += divisor
return new_v
def conv_3x3_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        assert stride in [1, 2]

        hidden_dim = round(inp * expand_ratio)
        self.identity = stride == 1 and inp == oup

        if expand_ratio == 1:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.identity:
            return x + self.conv(x)
        else:
            return self.conv(x)
class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.):
        super(MobileNetV2, self).__init__()
        # setting of inverted residual blocks
        self.cfgs = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        # building first layer
        input_channel = _make_divisible(32 * width_mult, 4 if width_mult == 0.1 else 8)
        layers = [conv_3x3_bn(3, input_channel, 2)]
        # building inverted residual blocks
        block = InvertedResidual
        for t, c, n, s in self.cfgs:
            output_channel = _make_divisible(c * width_mult, 4 if width_mult == 0.1 else 8)
            for i in range(n):
                layers.append(block(input_channel, output_channel, s if i == 0 else 1, t))
                input_channel = output_channel
        self.features = nn.Sequential(*layers)
        # building last several layers
        output_channel = _make_divisible(1280 * width_mult, 4 if width_mult == 0.1 else 8) if width_mult > 1.0 else 1280
        self.conv = conv_1x1_bn(input_channel, output_channel)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(output_channel, num_classes)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.conv(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


def mobilenetv2(**kwargs):
    """
    Constructs a MobileNet V2 model
    """
    return MobileNetV2(**kwargs)
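The two listings round channel counts differently when `width_mult` is not 1: `make_divisible` in the first always rounds up to a multiple of 8, while `_make_divisible` in the second rounds to the nearest multiple and only bumps up when rounding down would lose more than 10%. The copies below are renamed (`make_divisible_ceil` and `make_divisible_nearest` are names chosen here) so both can be compared side by side without PyTorch; 33.6 corresponds to 96 × 0.35.

```python
import math


def make_divisible_ceil(x, divisible_by=8):
    # Same rounding as make_divisible in the first listing: always round up
    return int(math.ceil(x * 1. / divisible_by) * divisible_by)


def make_divisible_nearest(v, divisor=8, min_value=None):
    # Same rounding as _make_divisible in the second listing:
    # round to the nearest multiple, but never drop more than 10% below v
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


for v in (33.6, 18.0, 1792.0):
    print(v, make_divisible_ceil(v), make_divisible_nearest(v))
```

For 33.6 the first rule gives 40 channels and the second gives 32; for 18.0 both give 24, because rounding down to 16 would lose more than 10%.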