Mobile/Embedded CV Models, 2018: MobileNet-v2 [Inverted Residuals (thick in the middle, thin at both ends); Linear Bottlenecks (the last convolution of each inverted residual block uses a linear activation)]

Original MobileNet-v2 paper: "MobileNetV2: Inverted Residuals and Linear Bottlenecks" (Sandler et al., 2018, arXiv:1801.04381)

GitHub:tonylins/pytorch-mobilenet-v2

GitHub:d-li14/mobilenetv2.pytorch

Compared with MobileNet-v1: higher accuracy and a smaller model.

The hallmark of MobileNet v1 is the depthwise separable convolution, but researchers found that many of its depthwise convolution kernels end up being all zeros, i.e., a large number of kernels take no part in the actual computation. What causes this? The v2 authors traced it to the ReLU activation function: ReLU loses a lot of information when it operates in a low-dimensional space, whereas in a high-dimensional space it preserves much more useful information.
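The paper illustrates this point with a toy experiment: low-dimensional points are embedded into an n-dimensional space by a random matrix, ReLU is applied there, and one checks how much of the input can still be recovered. Below is a minimal sketch of that idea (not the authors' exact code; the dimensions and point count are chosen arbitrarily for illustration):

import torch

torch.manual_seed(0)
x = torch.randn(2, 1000)                 # 1000 points living in a 2-D input space

for n in (2, 3, 30):
    T = torch.randn(n, 2)                # random embedding into n dimensions
    y = torch.relu(T @ x)                # ReLU applied in the n-dimensional space
    recovered = 0
    for j in range(x.shape[1]):
        active = y[:, j] > 0             # units not zeroed out by ReLU for this point
        if active.sum() >= 2:            # >= 2 active units: the restricted linear map
            recovered += 1               # is (generically) invertible, so x is preserved
    print(f"n={n:>2}: {recovered / x.shape[1]:.0%} of points fully recoverable after ReLU")

With n = 2 only roughly a quarter of the points survive ReLU intact, while with n = 30 essentially all of them do, which matches the intuition that ReLU is far less destructive in high-dimensional spaces.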

Given that, the v2 fix is simple: replace the ReLU6 activation with a linear activation, though not everywhere, only in the last layer. Concretely, in the v2 network the ReLU6 after the final point-wise convolution is replaced with a linear function. v2 names this operation the linear bottleneck, and it is the first key point of the v2 network.
[Figure]
A depthwise convolution (Depth-Wise) by itself cannot change the number of channels: in the depthwise separable convolution example earlier in this article, the depthwise half takes 3 channels in and produces 3 channels out. So, to let the depthwise convolution work in a high-dimensional space, v2 inserts a channel-expanding convolution in front of it. What operation can raise the channel dimension? A 1×1 convolution, of course.
[Figure]
This channel-expanding operation placed before the depthwise convolution is called the expansion layer in v2, and it is the second key point of the v2 network.
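A minimal sketch of such an expansion layer, assuming PyTorch (it mirrors the "pw" stage of the implementations in Section IV; the default 6× expansion factor is the value MobileNet v2 typically uses):

import torch
import torch.nn as nn

def expansion_layer(in_channels, expand_ratio=6):
    hidden_dim = in_channels * expand_ratio
    # 1x1 point-wise convolution whose only job is to raise the channel dimension
    return nn.Sequential(
        nn.Conv2d(in_channels, hidden_dim, kernel_size=1, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(hidden_dim),
        nn.ReLU6(inplace=True),
    )

x = torch.randn(1, 24, 56, 56)        # a 24-channel feature map
print(expansion_layer(24)(x).shape)   # torch.Size([1, 144, 56, 56])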

Although MobileNet v1 introduced the depthwise separable convolution, its backbone is still a VGG-style plain (straight-through) structure. The third key point of v2 is therefore to borrow the residual structure from ResNet and add skip connections on top of the v1 architecture. By analogy with ResNet's residual block, v2 names this structure the inverted residual block. Compared with ResNet, let us look closely at what exactly is "inverted".
[Figure]
As the figure shows, ResNet first reduces the channel dimension to 0.25×, then applies a standard 3×3 convolution, then expands it back, whereas MobileNet v2 first expands the dimension by 6×, then applies a depthwise separable convolution, and finally reduces it again. More vividly, we can draw it like this:
[Figure]

The order in which MobileNet v2 expands and reduces the dimension is exactly the opposite of ResNet's, hence the name inverted residual.
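A minimal sketch of the contrast, assuming PyTorch (batch norm and activations are omitted for brevity; the channel counts 256 → 64 and 24 → 144 are illustrative examples of the 0.25× reduction versus the 6× expansion):

import torch
import torch.nn as nn

# ResNet bottleneck: 1x1 reduce (x0.25) -> 3x3 conv -> 1x1 expand
resnet_block = nn.Sequential(
    nn.Conv2d(256, 64, 1, bias=False),                          # 256 -> 64
    nn.Conv2d(64, 64, 3, padding=1, bias=False),                # 64  -> 64
    nn.Conv2d(64, 256, 1, bias=False),                          # 64  -> 256
)

# MobileNet v2 inverted residual: 1x1 expand (x6) -> 3x3 depthwise -> 1x1 project
mbv2_block = nn.Sequential(
    nn.Conv2d(24, 144, 1, bias=False),                          # 24  -> 144
    nn.Conv2d(144, 144, 3, padding=1, groups=144, bias=False),  # depthwise, 144 -> 144
    nn.Conv2d(144, 24, 1, bias=False),                          # 144 -> 24
)

print(resnet_block(torch.randn(1, 256, 14, 14)).shape)  # torch.Size([1, 256, 14, 14])
print(mbv2_block(torch.randn(1, 24, 14, 14)).shape)     # torch.Size([1, 24, 14, 14])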

Combining the three key points above (linear bottlenecks, the expansion layer, and the inverted residual) gives the MobileNet v2 block, shown in the figure below.
[Figure]
The overall MobileNet v2 network structure is shown in the figure below.

[Figure]
As can be seen, after the input passes through a regular convolution, the v2 network stacks 7 groups of bottleneck blocks, followed by a combination of two 1×1 convolutions and a 7×7 average pooling.

I. Classic Residual Structure

Input and output dimensions of each convolutional layer

  • Input: a 3-D tensor of size (width $w_{in}$ × height $h_{in}$ × depth $d_{in}$)
  • Parameters of each convolutional layer:
    • receptive field (kernel) size $f$
    • number of filters $k$ (determines the depth of the output)
    • stride $s$
    • amount of zero-padding $p$
  • Output: a 3-D tensor of size (width $w_{out}$ × height $h_{out}$ × depth $d_{out}$), where (a quick numeric check follows below):
    • $w_{out} = \dfrac{w_{in} - f + 2p}{s} + 1$
    • $h_{out} = \dfrac{h_{in} - f + 2p}{s} + 1$
    • $d_{out} = k$
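As a quick numeric check of the formulas above, here is a minimal sketch (the helper name conv_output_size is only for illustration):

def conv_output_size(w_in, f, s, p):
    # w_out = (w_in - f + 2p) / s + 1, using floor division as PyTorch does
    return (w_in - f + 2 * p) // s + 1

# Example: the first MobileNet v2 layer is a 3x3 convolution with stride 2 and
# padding 1 on a 224x224 input, so each spatial dimension becomes 112.
print(conv_output_size(224, f=3, s=2, p=1))  # 112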

[Figure]

II. MobileNet-v2 Architecture

[Figure]

[Figure]

  • $t$: expansion factor;
  • $c$: number of output channels of the feature map (the $k'$ in the figure above);
  • $n$: number of times the bottleneck (i.e., the inverted residual block) is repeated;
  • $s$: stride used by the first layer of each sequence; all other layers of that sequence use stride 1;
  • Each line describes a sequence of 1 or more identical (modulo stride) layers, repeated $n$ times;
  • All layers in the same sequence have the same number $c$ of output channels;
  • The first layer of each sequence has a stride $s$ and all others use stride 1;
  • All spatial convolutions use 3 × 3 kernels;
  • The expansion factor $t$ is always applied to the input size (a small sketch that expands this configuration follows below).
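To make the meaning of $n$ and $s$ concrete, here is a minimal sketch (not part of the original implementations, but it mirrors the loop they use in Section IV) that expands the (t, c, n, s) table into one entry per inverted residual block:

cfgs = [
    # t, c, n, s  (expansion factor, output channels, repeats, first-layer stride)
    [1,  16, 1, 1],
    [6,  24, 2, 2],
    [6,  32, 3, 2],
    [6,  64, 4, 2],
    [6,  96, 3, 1],
    [6, 160, 3, 2],
    [6, 320, 1, 1],
]

input_channel = 32  # channels produced by the stem 3x3 convolution
for t, c, n, s in cfgs:
    for i in range(n):
        stride = s if i == 0 else 1  # only the first block of a sequence may downsample
        print(f"block: {input_channel:>3} -> {c:>3} channels, expand x{t}, stride {stride}")
        input_channel = c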

1. Inverted Residuals (inverted residual structure) [thick in the middle, thin at both ends] [activation: ReLU6]

[Figure]

The inverted residual in MobileNetV2 is exactly the reverse of ResNet's bottleneck residual block: its shape is thick in the middle and narrow at both ends. That is:

  1. A 1×1 convolution is added before the depthwise separable convolution to expand the channel dimension (the expansion layer): a 1×1 convolution increases the number of channels, the 3×3 depthwise convolution keeps it unchanged, and a final 1×1 convolution reduces it back down;
  2. The input and the output are connected by a residual connection, as shown in the figures below (inverted residual: large in the middle, small at both ends);
  3. The activation function is ReLU6: $y = \mathrm{ReLU6}(x) = \min(\max(x, 0), 6)$ (a small sketch of this clamping follows the figures below).

[Figure]
[Figure]
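A minimal sketch of the ReLU6 clamping used above, assuming PyTorch:

import torch
import torch.nn as nn

x = torch.tensor([-3.0, 0.5, 4.0, 9.0])
print(nn.ReLU6()(x))                     # tensor([0.0000, 0.5000, 4.0000, 6.0000])
print(torch.clamp(x, min=0.0, max=6.0))  # identical: min(max(x, 0), 6)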

2. Linear Bottleneck Structure

Since both the DW and PW convolutions use ReLU as their activation, and the PW convolution reduces the dimension, applying ReLU to the resulting low-dimensional features loses a lot of information. Therefore:

  • When mapping from a high-dimensional to a low-dimensional space, the ReLU activation can lose or destroy information, so no nonlinear activation is used there. Concretely, in the PW part (the last 1×1 convolution of the inverted residual block) we use a linear activation instead of ReLU, as shown in the figure below (a minimal code sketch follows after the figure).

[Figure]
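A minimal sketch of such a linear bottleneck projection, assuming PyTorch (it mirrors the "pw-linear" stage of the implementations in Section IV: a 1×1 convolution followed by batch normalization, with no ReLU6 afterwards):

import torch.nn as nn

def pw_linear(in_channels, out_channels):
    # 1x1 point-wise projection back to a low-dimensional space;
    # deliberately no ReLU6 here, so the low-dimensional features are not destroyed
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(out_channels),
    )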

III. Performance Comparison

1. Classification

[Figure]

2. Object Detection

[Figure]

IV. MobileNet-v2 Code

import torch.nn as nn
import math


def conv_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def make_divisible(x, divisible_by=8):
    import numpy as np
    return int(np.ceil(x * 1. / divisible_by) * divisible_by)


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(inp * expand_ratio)
        self.use_res_connect = self.stride == 1 and inp == oup

        if expand_ratio == 1:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, n_class=1000, input_size=224, width_mult=1.):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280
        interverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        # building first layer
        assert input_size % 32 == 0
        # input_channel = make_divisible(input_channel * width_mult)  # first channel is always 32!
        self.last_channel = make_divisible(last_channel * width_mult) if width_mult > 1.0 else last_channel
        self.features = [conv_bn(3, input_channel, 2)]
        # building inverted residual blocks
        for t, c, n, s in interverted_residual_setting:
            output_channel = make_divisible(c * width_mult) if t > 1 else c
            for i in range(n):
                if i == 0:
                    self.features.append(block(input_channel, output_channel, s, expand_ratio=t))
                else:
                    self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        self.features.append(conv_1x1_bn(input_channel, self.last_channel))
        # make it nn.Sequential
        self.features = nn.Sequential(*self.features)

        # building classifier
        self.classifier = nn.Linear(self.last_channel, n_class)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = x.mean(3).mean(2)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


def mobilenet_v2(pretrained=True):
    model = MobileNetV2(width_mult=1)

    if pretrained:
        try:
            from torch.hub import load_state_dict_from_url
        except ImportError:
            from torch.utils.model_zoo import load_url as load_state_dict_from_url
        state_dict = load_state_dict_from_url(
            'https://www.dropbox.com/s/47tyzpofuuyyv1b/mobilenetv2_1.0-f2a8633.pth.tar?dl=1', progress=True)
        model.load_state_dict(state_dict)
    return model


if __name__ == '__main__':
    net = mobilenet_v2(True)

"""
Creates a MobileNetV2 Model as defined in:
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. (2018). 
MobileNetV2: Inverted Residuals and Linear Bottlenecks
arXiv preprint arXiv:1801.04381.
import from https://github.com/tonylins/pytorch-mobilenet-v2
"""

import torch.nn as nn
import math

__all__ = ['mobilenetv2']


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


def conv_3x3_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        assert stride in [1, 2]

        hidden_dim = round(inp * expand_ratio)
        self.identity = stride == 1 and inp == oup

        if expand_ratio == 1:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.identity:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.):
        super(MobileNetV2, self).__init__()
        # setting of inverted residual blocks
        self.cfgs = [
            # t, c, n, s
            [1,  16, 1, 1],
            [6,  24, 2, 2],
            [6,  32, 3, 2],
            [6,  64, 4, 2],
            [6,  96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        # building first layer
        input_channel = _make_divisible(32 * width_mult, 4 if width_mult == 0.1 else 8)
        layers = [conv_3x3_bn(3, input_channel, 2)]
        # building inverted residual blocks
        block = InvertedResidual
        for t, c, n, s in self.cfgs:
            output_channel = _make_divisible(c * width_mult, 4 if width_mult == 0.1 else 8)
            for i in range(n):
                layers.append(block(input_channel, output_channel, s if i == 0 else 1, t))
                input_channel = output_channel
        self.features = nn.Sequential(*layers)
        # building last several layers
        output_channel = _make_divisible(1280 * width_mult, 4 if width_mult == 0.1 else 8) if width_mult > 1.0 else 1280
        self.conv = conv_1x1_bn(input_channel, output_channel)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(output_channel, num_classes)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.conv(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()

def mobilenetv2(**kwargs):
    """
    Constructs a MobileNet V2 model
    """
    return MobileNetV2(**kwargs)
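A minimal usage sketch for the implementation above (randomly initialized weights; the 1000-dimensional output matches the default num_classes):

if __name__ == '__main__':
    import torch
    model = mobilenetv2(width_mult=1.)
    out = model(torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 1000])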

