移动端/嵌入式-CV模型-2018：MobelNets-v2【Inverted Residuals（中间胖两头瘦）、Linear Bottlenecks（每个倒残差的最后一个卷积层使用线性激活函数）】

本文链接：https://blog.csdn.net/u013250861/article/details/126220575

相较MobelNets-v1，准确率更高，模型更小；

MobileNet v1的特色就是深度可分离卷积，但研究人员发现深度可分离卷积中有大量卷积核为0，即有很多卷积核没有参与实际计算。是什么原因造成的呢？v2的作者发现是ReLU激活函数的问题，认为 ReLU这个激活函数，在低维空间运算中会损失很多信息，而在高维空间中会保留较多有用信息 。

既然如此，v2的解决方案也很简单，就是直接将ReLU6激活换成线性激活函数，当然也不是全都换，只是将最后一层的ReLU换成线性函数。具体到v2网络中就是将最后的Point-Wise卷积的ReLU6都换成线性函数。v2给这个操作命名为linear bottleneck，这也是v2网络的第一个关键点。
在这里插入图片描述
深度卷积（Depth-Wise）本身没有改变通道的作用，比如本文前例中的深度可分离卷积的例子，在前一半的深度卷积操作中，输入是3个通道，输出还是3个通道。所以为了能让深度卷积能在高维上工作，v2提出在深度卷积之前加一个扩充通道的卷积操作，什么操作能给通道升维呢？当然是1*1卷积了。
在这里插入图片描述
这种在深度卷积之前扩充通道的操作在v2中被称作Expansion layer。这也是v2网络的第二个关键点。

MobileNet v1虽然加了深度可分离卷积，但网络主体仍然是VGG的直筒型结构。所以v2网络的第三个大的关键点就是借鉴了ResNet的残差结构，在v1网络结构基础上加入了跳跃连接。相较于ResNet的残差块结构，v2给这种结构命名为Inverted resdiual block，即倒残差块。与ResNet相比，我们来仔细看一下“倒”在哪。
在这里插入图片描述
从图中可以看到，ResNet是先0.25倍降维，然后标准3*3卷积，再升维，而MobileNet v2则是先6倍升维，然后深度可分离卷积，最后再降维。更形象一点我们可以这么画：

MobileNet v2的维度升降顺序跟ResNet完全相反，所以才叫倒残差。

综合上述三个关键点：Linear Bottlenecks、Expansion layer和Inverted resdiual之后就组成了MobileNet v2的block，如下图所示。
在这里插入图片描述
MobileNet v2的网络结构如下图所示。

在这里插入图片描述
可以看到，输入经过一个常规卷积之后，v2网络紧接着加了7个bottleneck block层，然后再两个11卷积和一个77的平均池化的组合操作。

一、经典残差结构

每层卷积层的输入维度、输出维度

输入：三维数据，(宽 $w_{in}$ ×高 $h_{in}$ ×深 $d_{in}$ )
每层卷积层的参数：
- 感受野(receptive field)的大小 $f$
- 过滤器(Filter)的数量(决定输出单元的深度) $k$
- 步幅(Stride) $s$
- 补零(zero-padding)的数量 $p$
输出：三维单元，(宽 $w_{out}$ ×高 $h_{out}$ ×深 $d_{out}$ )，其中各维度大小为：
- $w_{out}=\cfrac{w_{in}-f+2p}{s}+1$
- $h_{out}=\cfrac{h_{in}-f+2p}{s}+1$
- $d_{out}=k$

在这里插入图片描述

二、MobelNets-v2结构

在这里插入图片描述

$t$ ：扩展因子；
$c$ ：输出特征矩阵的通道数，上图中的 $k^{'}$ ；
$n$ ：Bottleneck（指的是Inverted Bottleneck）重复的次数；
$s$ ：每一个sequence的第一层所采用的步距，该sequence的其他层都采用1；
Each line describes a sequence of 1 or more identical (modulo stride) layers, repeated n times；
All layers in the same sequence have the same number c of output channels；
The ﬁrst layer of each sequence has a stride $s$ and all others use stride 1；
All spatial convolutions use 3 × 3 kernels；
The expansion factor $t$ is always applied to the input size；

1、Inverted Residuals（倒残差结构）【中间胖两头瘦】【激活函数：ReLU6】

在这里插入图片描述

在MobileNetV2中的Inverted Residuals正好与ResNet的bottleneck residual block相反，其结构形状是中间胖两头窄。即：

在可分离卷积的前面增加一个大小为1*1的卷积进行升维（Expansion layer）【用1×1核卷积（增加通道数来升维）、3×3核卷积（不变）、用1×1核降维】；
将输入和输出的部分进行连接（residual connection）, 如下图所示（Inverted Residuals（中间大两头小））。
激活函数采用ReLU6； $y = R e LU 6 (x) = min (ma x (x, 0), 6)$

在这里插入图片描述

2、Linear Bottlenck结构

由于DW、PW都是以Relu作为激活函数，且PW会做降维，再对低维特征做ReLU时会丢失很多信息，所以：

从高维向低维转换，使用ReLU激活函数可能会造成信息丢失或破坏（所以不使用非线性激活数函数），即在PW这一部分（倒残差结构的最后一个1×1卷积层），我们不再使用ReLU激活函数而是使用线性激活函数，如下图。

在这里插入图片描述

三、性能对比

1、分类任务

在这里插入图片描述

2、目标检测

在这里插入图片描述

四、MobelNets-v2代码

import torch.nn as nn
import math


def conv_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def make_divisible(x, divisible_by=8):
    import numpy as np
    return int(np.ceil(x * 1. / divisible_by) * divisible_by)


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(inp * expand_ratio)
        self.use_res_connect = self.stride == 1 and inp == oup

        if expand_ratio == 1:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, n_class=1000, input_size=224, width_mult=1.):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280
        interverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        # building first layer
        assert input_size % 32 == 0
        # input_channel = make_divisible(input_channel * width_mult)  # first channel is always 32!
        self.last_channel = make_divisible(last_channel * width_mult) if width_mult > 1.0 else last_channel
        self.features = [conv_bn(3, input_channel, 2)]
        # building inverted residual blocks
        for t, c, n, s in interverted_residual_setting:
            output_channel = make_divisible(c * width_mult) if t > 1 else c
            for i in range(n):
                if i == 0:
                    self.features.append(block(input_channel, output_channel, s, expand_ratio=t))
                else:
                    self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        self.features.append(conv_1x1_bn(input_channel, self.last_channel))
        # make it nn.Sequential
        self.features = nn.Sequential(*self.features)

        # building classifier
        self.classifier = nn.Linear(self.last_channel, n_class)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = x.mean(3).mean(2)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


def mobilenet_v2(pretrained=True):
    model = MobileNetV2(width_mult=1)

    if pretrained:
        try:
            from torch.hub import load_state_dict_from_url
        except ImportError:
            from torch.utils.model_zoo import load_url as load_state_dict_from_url
        state_dict = load_state_dict_from_url(
            'https://www.dropbox.com/s/47tyzpofuuyyv1b/mobilenetv2_1.0-f2a8633.pth.tar?dl=1', progress=True)
        model.load_state_dict(state_dict)
    return model


if __name__ == '__main__':
    net = mobilenet_v2(True)

"""
Creates a MobileNetV2 Model as defined in:
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. (2018). 
MobileNetV2: Inverted Residuals and Linear Bottlenecks
arXiv preprint arXiv:1801.04381.
import from https://github.com/tonylins/pytorch-mobilenet-v2
"""

import torch.nn as nn
import math

__all__ = ['mobilenetv2']


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


def conv_3x3_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        assert stride in [1, 2]

        hidden_dim = round(inp * expand_ratio)
        self.identity = stride == 1 and inp == oup

        if expand_ratio == 1:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.identity:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.):
        super(MobileNetV2, self).__init__()
        # setting of inverted residual blocks
        self.cfgs = [
            # t, c, n, s
            [1,  16, 1, 1],
            [6,  24, 2, 2],
            [6,  32, 3, 2],
            [6,  64, 4, 2],
            [6,  96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        # building first layer
        input_channel = _make_divisible(32 * width_mult, 4 if width_mult == 0.1 else 8)
        layers = [conv_3x3_bn(3, input_channel, 2)]
        # building inverted residual blocks
        block = InvertedResidual
        for t, c, n, s in self.cfgs:
            output_channel = _make_divisible(c * width_mult, 4 if width_mult == 0.1 else 8)
            for i in range(n):
                layers.append(block(input_channel, output_channel, s if i == 0 else 1, t))
                input_channel = output_channel
        self.features = nn.Sequential(*layers)
        # building last several layers
        output_channel = _make_divisible(1280 * width_mult, 4 if width_mult == 0.1 else 8) if width_mult > 1.0 else 1280
        self.conv = conv_1x1_bn(input_channel, output_channel)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(output_channel, num_classes)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.conv(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()

def mobilenetv2(**kwargs):
    """
    Constructs a MobileNet V2 model
    """
    return MobileNetV2(**kwargs)