7 - MobileNet V1/V2/V3 with PyTorch: Principles and Code

1. Preface

I recently found an excellent computer vision + PyTorch hands-on tutorial series on bilibili. I wish I had found it sooner; it can save beginners many detours.
So I decided to follow the uploader's roadmap (image classification → object detection → ...) and learn, step by step, how to apply deep learning to CV with PyTorch, taking and organizing notes along the way.

The uploader provides both PyTorch and TensorFlow implementations. For now I only keep notes on the PyTorch version; I will add the TensorFlow version later.

References:

The uploader's bilibili channel: https://space.bilibili.com/18161609/channel/index
Code and slides on GitHub: https://github.com/WZMIAOMIAO/deep-learning-for-image-processing
The uploader's CSDN blog: https://blog.csdn.net/qq_37541097/article/details/103482003

2. Dataset (all networks that follow use this dataset)

3.1 AlexNet Network Structure Explained & Flower Classification Dataset Download (bilibili)

3.2 Building and Training AlexNet on the Flower Classification Dataset with PyTorch (bilibili)

3. MobileNet Explained

MobileNet was proposed by a Google team in 2017 as a lightweight CNN aimed at mobile and embedded devices. Compared with traditional convolutional networks, it greatly reduces the parameter count and computation at a small cost in accuracy (0.9% lower than VGG16, with only 1/32 of VGG's parameters).

Highlights of the network:

1. Depthwise convolution (greatly reduces computation and parameter count)

2. Two additional hyperparameters, α and β

Assume the input is an RGB image with three channels.

DW (depthwise) convolution: to describe the figure above briefly, each 3*3*1 kernel processes exactly one color channel, so three kernels handle the R, G and B channels separately. The resulting feature map has 3 channels, i.e. the output channel count equals both the input channel count and the number of kernels.

Traditional convolution: the kernel depth must match the input feature depth, and the output channel count equals the number of kernels. For example, a single 3*3*3 kernel spans all three color channels at once and produces one feature map.

Hyperparameter α (width multiplier): controls the number of kernels in the network's convolution layers.

Hyperparameter β (resolution multiplier): controls the input image size, i.e. the resolution.

3.1 Computational cost: DW convolution vs. traditional convolution

PW (pointwise) convolution is just a traditional convolution with a 1*1 kernel.

Input feature map size: D_{F}*D_{F}*M, where M is the number of input channels and D_{F} is the spatial size.
Kernel size and count: D_{K}*D_{K}*M*N; a traditional kernel's channel depth equals the input feature depth, and N is the number of kernels.
(The formulas below count multiply-accumulate operations, i.e. computational cost, which is why they include the D_{F}*D_{F} output positions.)

Traditional convolution cost: D_{K}*D_{K}*M*N*D_{F}*D_{F}

DW convolution cost: D_{K}*D_{K}*M*D_{F}*D_{F}

PW convolution cost: 1*1*M*N*D_{F}*D_{F}

DW+PW convolution cost: D_{K}*D_{K}*M*D_{F}*D_{F} + 1*1*M*N*D_{F}*D_{F}

Ratio of DW+PW cost to traditional cost: \frac{1}{N}+\frac{1}{D_{K}^2}. We usually use 3*3 kernels, so D_{K}^2 = 9, which means a standard convolution costs roughly 8-9 times as much as DW+PW.
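To make the ratio concrete, here is a small PyTorch sketch of my own (M = 32, N = 64 and 3*3 kernels are arbitrary example values). For convolution layers the parameter ratio equals the multiply-add ratio, since both share the D_{F}*D_{F} factor:

import torch.nn as nn

M, N, K = 32, 64, 3

standard = nn.Conv2d(M, N, K, bias=False)                     # traditional convolution
depthwise = nn.Conv2d(M, M, K, groups=M, bias=False)          # DW: one kernel per channel
pointwise = nn.Conv2d(M, N, 1, bias=False)                    # PW: 1*1 traditional convolution

p_std = standard.weight.numel()                               # K*K*M*N = 18432
p_dwpw = depthwise.weight.numel() + pointwise.weight.numel()  # K*K*M + M*N = 2336

print(p_dwpw / p_std)    # 0.1267...
print(1 / N + 1 / K**2)  # 0.1267..., matches 1/N + 1/D_K^2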

3.2 MobileNet architecture compared with other models

In the table on the right, 1.0, 0.75, 0.5 and 0.25 are values of the α parameter. As the figure notes, most of V1's DW kernel parameters turn out to be zero after training, so we will not discuss V1 further and focus on V2 and V3 instead.

3.3 MobileNetV2 Explained

In the paper, V2's key building blocks are called the inverted residual structure and linear bottlenecks. The inverted residual is somewhat similar to the ordinary residual block; see the figure below.

Here the 1*1 convolutions are used to expand and then reduce the channel dimension, while the 3*3 convolution extracts features.

The inverted residual block uses the ReLU6 activation, y = ReLU6(x) = min(max(0, x), 6). As its graph shows, it is a slight modification of ReLU: identical except that the output is capped at 6.
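A one-line check of the definition (a minimal sketch of my own):

import torch
from torch import nn

x = torch.tensor([-1.0, 3.0, 8.0])
print(nn.ReLU6()(x))                     # tensor([0., 3., 6.])
print(torch.clamp(x, min=0.0, max=6.0))  # same thing: min(max(0, x), 6)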

Why not use ReLU? According to the paper, the ReLU activation destroys a lot of information in low-dimensional features. In the figure below, the input feature is a circle-like point set in 2D; after being embedded into 2 or 3 dimensions, passed through ReLU and projected back, a large amount of information is clearly lost.

3.3.1 Network architecture analysis

The figure above shows the inverted residual block. Its shortcut condition differs from the ordinary residual block: an inverted residual block has a shortcut connection only when the DW convolution has stride = 1 AND the input feature dimension equals the output feature dimension, not simply whenever stride = 1 as the paper's wording suggests.

In the table on the right, t is the expansion factor; like α earlier, it controls a number of kernels, here in the block's first 1*1 convolution. The DW convolution does not change the channel dimension of its input. The final linear convolution layer is essentially a fully connected layer.

The figure shows the detailed MobileNetV2 architecture parameters.

3.4 MobileNetV3 Explained

This section explains the main optimizations in the network.

MobileNetV3 comes in two variants, V3-Large and V3-Small. See the performance comparison below.

MobileNetV3 updates V2's block: it adds the SE (squeeze-and-excitation) attention module and switches to new activation functions.

The SE module, in short: a global pooling operation averages each channel of the input feature map down to a single value; the first fully connected layer then has \frac{1}{4} as many nodes as there are input channels, and the second fully connected layer restores the node count to the input channel count. The second layer's output assigns a weight to each channel of the input, indicating how important that channel is; multiplying these weights with the input feature map gives the output feature map.

The figure below is a simple illustration of the SE module with a 2-channel input feature map.
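To make the idea concrete, here is a tiny sketch of my own in the spirit of the figure. With only 2 channels the 1/4 rule would give zero nodes, so I use a single node in the first layer purely for illustration:

import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 2, 2, 2)          # a 2-channel feature map, as in the figure

fc1 = nn.Conv2d(2, 1, 1)             # squeeze layer (1 node, illustrative only)
fc2 = nn.Conv2d(1, 2, 1)             # restore to the input channel count

scale = F.adaptive_avg_pool2d(x, 1)  # global average pool: one value per channel
scale = F.relu(fc1(scale))
scale = F.hardsigmoid(fc2(scale))    # per-channel weights in [0, 1]
print(scale.flatten())               # importance weight of each channel
print((x * scale).shape)             # reweighted feature map, same shape as x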


Redesigning the time-consuming layers

1. In V1 and V2 the first convolution layer uses 32 kernels. In V3 it was verified that 16 kernels give the same accuracy as 32, so the kernel count can be reduced without hurting accuracy, cutting that layer's parameters and computation and lowering training time.

Redesigning the activation functions

In short, the sigmoid function is replaced by the approximate h-sigmoid, which is built from the ReLU6 activation and inherits its properties. h-swish[x] is then equivalent to x * h-sigmoid(x). This fixes sigmoid's problems: it is expensive to compute and differentiate, and it is unfriendly to quantization.
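The exact definitions are h-sigmoid(x) = ReLU6(x + 3)/6 and h-swish(x) = x * ReLU6(x + 3)/6; a quick sketch of my own checking them against PyTorch's built-in:

import torch
from torch import nn
import torch.nn.functional as F

def h_sigmoid(x):
    return F.relu6(x + 3.0) / 6.0    # h-sigmoid(x) = ReLU6(x + 3) / 6

def h_swish(x):
    return x * h_sigmoid(x)          # h-swish(x) = x * h-sigmoid(x)

x = torch.linspace(-4.0, 4.0, 5)
print(h_swish(x))
print(nn.Hardswish()(x))             # PyTorch's built-in gives the same values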

3.4.1 The two MobileNetV3 architectures in detail

As the tables show, in both V3-Large and V3-Small the first bneck block's expanded dimension equals the input dimension. Looking at the PyTorch and TensorFlow code, the 1*1 convolution in that bneck is simply omitted: it would not change the channel count, so it performs no expansion or reduction there.

Explanation of the table columns:

Input: size of the input feature map

Operator: the network layer to apply

exp size: number of kernels in the first 1*1 convolution, also called the expanded dimension

out: output channel count of the feature map

SE: whether the attention module is used

NL: type of nonlinear activation (HS = h-swish, RE = ReLU)

s: stride

4. MobileNet V2 and V3 Code

4.1 Model: V2

from torch import nn
import torch

# Round the channel count to the nearest multiple of 8
def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor) 
    # Make sure that rounding down does not reduce the channel count by more than 10%; if it would, add one more divisor.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch
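# Sanity check (illustrative): _make_divisible(30) == 32, since
# int(30 + 8/2) // 8 * 8 = 32 and 32 >= 0.9 * 30; _make_divisible(32 * 0.75) == 24,
# which is already a multiple of 8 and is returned unchanged.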

# Conv + BatchNorm + activation packaged together; these three almost always appear as a unit in the hidden layers.
# The parent class is nn.Sequential, which packs the layers, so they can be passed straight to super().__init__ below.
# The kernels used here are always 1*1 or 3*3; this padding formula keeps the output feature size unchanged for both.
# groups: with groups = 1 (the default) this is an ordinary convolution; with groups = input channels it is a DW convolution.
class ConvBNReLU(nn.Sequential):  
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding,groups=groups, bias=False),                    
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )

# Inverted residual block
class InvertedResidual(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio
        self.use_shortcut = stride == 1 and in_channel == out_channel  # identity shortcut only when stride is 1 and input channels equal output channels

        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))

        # extend adds several layers at once; append adds only one at a time.
        # The last layer of the inverted residual uses a linear activation, equivalent to y = x, so no activation module is needed.
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv(linear)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
        ])

        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)

# alpha controls the number of kernels throughout the network (the width multiplier)
class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual  # the inverted residual block
        input_channel = _make_divisible(32 * alpha, round_nearest)
        last_channel = _make_divisible(1280 * alpha, round_nearest)

        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # building inverted residual residual blockes
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1  # only the first block of each stage uses stride s; all later blocks use stride 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)

        # building classifier: a fully connected layer acts as the classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
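A quick usage sketch of my own (not part of the uploader's code), using the 5-class flower dataset as an example:

model = MobileNetV2(num_classes=5)
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 5])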

4.2 Model: V3

from typing import Callable, List, Optional

import torch
from torch import nn, Tensor
from torch.nn import functional as F
from functools import partial


def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch

# ConvBNActivation adds two new parameters so the normalization layer and activation function can be chosen as needed
class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer(inplace=True))
# SE module
# a 1*1 convolution here is equivalent to a fully connected layer
class SqueezeExcitation(nn.Module):
    def __init__(self, input_c: int, squeeze_factor: int = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)

    def forward(self, x: Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        scale = self.fc1(scale)
        scale = F.relu(scale, inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x


class InvertedResidualConfig:
    def __init__(self,
                 input_c: int,
                 kernel: int,
                 expanded_c: int,
                 out_c: int,
                 use_se: bool,
                 activation: str,
                 stride: int,
                 width_multi: float):
        self.input_c = self.adjust_channels(input_c, width_multi)
        self.kernel = kernel
        self.expanded_c = self.adjust_channels(expanded_c, width_multi)
        self.out_c = self.adjust_channels(out_c, width_multi)
        self.use_se = use_se
        self.use_hs = activation == "HS"  # whether using h-swish activation
        self.stride = stride

    @staticmethod
    def adjust_channels(channels: int, width_multi: float):
        return _make_divisible(channels * width_multi, 8)


class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        super(InvertedResidual, self).__init__()

        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")

        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        layers: List[nn.Module] = []
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU

        # expand
        if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))

        # depthwise
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.expanded_c,
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))

        if cnf.use_se:
            layers.append(SqueezeExcitation(cnf.expanded_c))

        # project
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.out_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))

        self.block = nn.Sequential(*layers)
        self.out_channels = cnf.out_c
        self.is_strided = cnf.stride > 1

    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        if self.use_res_connect:
            result += x

        return result


class MobileNetV3(nn.Module):
    def __init__(self,
                 inverted_residual_setting: List[InvertedResidualConfig],
                 last_channel: int,
                 num_classes: int = 1000,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None):
        super(MobileNetV3, self).__init__()

        if not inverted_residual_setting:
            raise ValueError("The inverted_residual_setting should not be empty.")
        elif not (isinstance(inverted_residual_setting, List) and
                  all([isinstance(s, InvertedResidualConfig) for s in inverted_residual_setting])):
            raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig]")

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)

        layers: List[nn.Module] = []

        # building first layer
        firstconv_output_c = inverted_residual_setting[0].input_c
        layers.append(ConvBNActivation(3,
                                       firstconv_output_c,
                                       kernel_size=3,
                                       stride=2,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        # building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.append(block(cnf, norm_layer))

        # building last several layers
        lastconv_input_c = inverted_residual_setting[-1].out_c
        lastconv_output_c = 6 * lastconv_input_c
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(nn.Linear(lastconv_output_c, last_channel),
                                        nn.Hardswish(inplace=True),
                                        nn.Dropout(p=0.2, inplace=True),
                                        nn.Linear(last_channel, num_classes))

        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)

        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)


def mobilenet_v3_large(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a large MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, False, "RE", 1),
        bneck_conf(16, 3, 64, 24, False, "RE", 2),  # C1
        bneck_conf(24, 3, 72, 24, False, "RE", 1),
        bneck_conf(24, 5, 72, 40, True, "RE", 2),  # C2
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 3, 240, 80, False, "HS", 2),  # C3
        bneck_conf(80, 3, 200, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 480, 112, True, "HS", 1),
        bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
    ]
    last_channel = adjust_channels(1280 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)


def mobilenet_v3_small(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a small MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, True, "RE", 2),  # C1
        bneck_conf(16, 3, 72, 24, False, "RE", 2),  # C2
        bneck_conf(24, 3, 88, 24, False, "RE", 1),
        bneck_conf(24, 5, 96, 40, True, "HS", 2),  # C3
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 120, 48, True, "HS", 1),
        bneck_conf(48, 5, 144, 48, True, "HS", 1),
        bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1),
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1)
    ]
    last_channel = adjust_channels(1024 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)
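The same sanity check works for V3 (again my own sketch, not from the repo):

model = mobilenet_v3_large(num_classes=5)
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 5])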

4.3 Training
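The training script in the uploader's repo handles data augmentation, validation and checkpoint saving; below is only a minimal sketch of the core loop that I wrote for these notes. The dataset path, transforms and hyperparameters are my assumptions, not taken from the repo, and MobileNetV2 from section 4.1 is assumed to be defined or importable:

import torch
from torch import nn, optim
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# assumed preprocessing; the actual transforms in the repo may differ
transform = transforms.Compose([transforms.RandomResizedCrop(224),
                                transforms.ToTensor(),
                                transforms.Normalize([0.485, 0.456, 0.406],
                                                     [0.229, 0.224, 0.225])])
# "flower_data/train" is a placeholder path for the flower dataset
train_set = datasets.ImageFolder("flower_data/train", transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

net = MobileNetV2(num_classes=5).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)

for epoch in range(5):
    net.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(net(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")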

4.4 Prediction
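Likewise, only a minimal prediction sketch of my own; the image path, weight file name and preprocessing are placeholders:

import torch
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([transforms.Resize(256),
                                transforms.CenterCrop(224),
                                transforms.ToTensor(),
                                transforms.Normalize([0.485, 0.456, 0.406],
                                                     [0.229, 0.224, 0.225])])

img = transform(Image.open("test_flower.jpg")).unsqueeze(0)  # placeholder image

net = MobileNetV2(num_classes=5)
net.load_state_dict(torch.load("MobileNetV2.pth", map_location="cpu"))  # assumed weight file
net.eval()

with torch.no_grad():
    probs = torch.softmax(net(img), dim=1)
    print(probs.argmax(dim=1), probs.max())  # predicted class index and its probability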
