MobileNet v2 Model Structure and PyTorch Implementation

weixin_45919003

Last modified on 2023-04-26 16:08:26

First published on 2023-04-26 16:04:27

Article link: https://blog.csdn.net/weixin_45919003/article/details/130373560

References

霹雳吧啦Wz: Building MobileNetV2 with PyTorch and training it with transfer learning
MobileNetV2: Inverted Residuals and Linear Bottlenecks

MobileNet v2 model structure

Inverted residual structure

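Expand the channel dimension first, then reduce it;
The activation function is changed from ReLU to ReLU6;
The last 1 x 1 convolution is followed by a linear activation (ReLU causes considerable information loss on low-dimensional features).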
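
To make the ReLU to ReLU6 change concrete, here is a minimal plain-Python sketch of the two functions applied to single numbers. The helper names `relu` and `relu6` and the sample values are made up for illustration; in the network itself the same clamping is done elementwise by nn.ReLU6.

```python
def relu(x: float) -> float:
    """Standard ReLU: negative values become 0, positive values pass through."""
    return max(x, 0.0)


def relu6(x: float) -> float:
    """ReLU6: like ReLU, but the output is additionally capped at 6."""
    return min(max(x, 0.0), 6.0)


samples = [-3.0, 0.0, 2.5, 6.0, 10.0]   # illustrative inputs
relu_out = [relu(x) for x in samples]
relu6_out = [relu6(x) for x in samples]

print(relu_out)    # [0.0, 0.0, 2.5, 6.0, 10.0]
print(relu6_out)   # [0.0, 0.0, 2.5, 6.0, 6.0]
```

The two functions differ only for inputs above 6, where ReLU6 saturates.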

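Inverted residual block structure (bottleneck)
The shortcut connection exists only when stride=1 and the input and output feature maps have the same shape. stride=1 guarantees that the output width and height are unchanged, so "same shape" here specifically means that the input and output depths are equal, $k = k'$.
[Table: the three layers of the bottleneck block with their input and output shapes]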

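In the table, t is the expansion factor, so the first 1 x 1 convolution has tk kernels. The second layer is a dw convolution with a given stride s: the output height and width become 1/s of the input, and the depth is unchanged. The third layer is a 1 x 1 convolution that reduces the dimension: width and height are unchanged, and the depth becomes k'.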
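
The shape changes described above can be traced with a small sketch. The helper names `conv_out_size` and `bottleneck_shapes` are hypothetical, and the sketch assumes a 3 x 3 dw convolution with padding (kernel_size - 1) // 2, as in the ConvBNReLU class used later in this article.

```python
def conv_out_size(size, kernel_size, stride):
    """Output width/height of a convolution with padding = (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1


def bottleneck_shapes(h, w, k, t, s, k_out):
    """Return the (height, width, depth) after each of the three bottleneck layers."""
    expanded = (h, w, t * k)                       # 1x1 conv: depth k -> t*k
    h2, w2 = conv_out_size(h, 3, s), conv_out_size(w, 3, s)
    depthwise = (h2, w2, t * k)                    # 3x3 dw conv: size / s, depth unchanged
    projected = (h2, w2, k_out)                    # 1x1 linear conv: depth t*k -> k'
    return [expanded, depthwise, projected]


# Example: a 56x56x24 input with t=6, s=2 and k'=32
shapes = bottleneck_shapes(56, 56, 24, t=6, s=2, k_out=32)
for name, shape in zip(["1x1 expand", "3x3 dw", "1x1 linear"], shapes):
    print(name, shape)
```

With s=2 the spatial size halves in the dw layer only; the two 1 x 1 layers change only the depth.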

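Overall model structure

[Table: overall MobileNetV2 architecture, one row per stage with its input shape, operator, t, c, n and s]
Parameters in the table: t is the expansion factor; c is the depth of the output feature map (the k' above); n is the number of times the bottleneck is repeated; s is the stride (it applies only to the first bottleneck of each block, the others use 1).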

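In the first bottleneck row t = 1, so the PyTorch implementation drops the first 1x1 convolution of that bottleneck (with t = 1 it would not expand the depth).

The block whose input is 14x14x64 contains three bottlenecks with s=1, yet its first bottleneck has no shortcut: the input depth is 64 and the output depth is 96, so the two cannot be added. The remaining two bottlenecks of that block (depth 96 in, depth 96 out, stride 1) do have a shortcut.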
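
As a rough check of which bottlenecks get a shortcut, the sketch below expands the t, c, n, s rows into individual bottlenecks and applies the rule stride == 1 and equal input/output depth. It assumes alpha = 1.0 and a first convolution with 32 output channels; `expand_setting` is a hypothetical helper name.

```python
inverted_residual_setting = [
    # t, c, n, s
    [1, 16, 1, 1],
    [6, 24, 2, 2],
    [6, 32, 3, 2],
    [6, 64, 4, 2],
    [6, 96, 3, 1],
    [6, 160, 3, 2],
    [6, 320, 1, 1],
]


def expand_setting(setting, input_channel=32):
    """List every bottleneck as (in_channel, out_channel, stride, t, use_shortcut)."""
    rows = []
    for t, c, n, s in setting:
        for i in range(n):
            stride = s if i == 0 else 1            # only the first bottleneck uses s
            use_shortcut = stride == 1 and input_channel == c
            rows.append((input_channel, c, stride, t, use_shortcut))
            input_channel = c                      # output feeds the next bottleneck
    return rows


rows = expand_setting(inverted_residual_setting)
for in_c, out_c, stride, t, use_shortcut in rows:
    print(f"{in_c:>3} -> {out_c:<3} stride={stride} t={t} shortcut={use_shortcut}")
```

In this listing the 64 -> 96 bottleneck has no shortcut even though its stride is 1, while the two 96 -> 96 bottlenecks that follow it do.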

The last convolution layer acts as a fully connected layer; k is the number of classes.

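PyTorch implementation

Defining ConvBNReLU

This block consists of conv + bn + relu. Apart from the last layer of the inverted residual structure, which uses a linear activation, almost every convolution in the network follows this pattern.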

import torch
import torch.nn as nn


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )

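The class inherits from nn.Sequential, so no forward function needs to be written.
The constructor takes a groups argument because in PyTorch a dw convolution is also implemented with the Conv2d class: groups=1 gives an ordinary convolution, and setting groups to the depth of the input feature map gives a dw convolution. padding is derived from kernel_size as (kernel_size - 1) // 2, which keeps the width and height unchanged when stride=1.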
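
To illustrate what the groups argument changes, the following plain-Python sketch counts the weights of a bias-free 2D convolution. It relies on the fact that a grouped convolution's weight has shape (out_channel, in_channel // groups, kernel_size, kernel_size); the helper name `conv2d_weight_count` and the channel count 32 are made up for illustration.

```python
def conv2d_weight_count(in_channel, out_channel, kernel_size, groups=1):
    """Number of weights in a bias-free 2D convolution with the given groups."""
    assert in_channel % groups == 0 and out_channel % groups == 0
    # Each output channel only sees in_channel // groups input channels.
    return out_channel * (in_channel // groups) * kernel_size * kernel_size


standard = conv2d_weight_count(32, 32, kernel_size=3, groups=1)    # ordinary conv
depthwise = conv2d_weight_count(32, 32, kernel_size=3, groups=32)  # dw conv

print(standard)    # 9216
print(depthwise)   # 288
```

With groups equal to the input depth, each kernel handles a single channel, which is why the dw convolution needs 32 times fewer weights in this example.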

InvertedResidual

The inverted residual structure inherits from the nn.Module parent class.

class InvertedResidual(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio
        self.use_shortcut = stride == 1 and in_channel == out_channel

        layers = []                          # list of layers
        if expand_ratio != 1:
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv(linear)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
        ])

        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)

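Initialization:

use_shortcut: two conditions must hold, stride == 1 and in_channel == out_channel.
First layer: check whether expand_ratio is 1. If it is 1, this layer is skipped; otherwise it takes in_channel as input and outputs hidden_channel (the number of kernels of this layer), with kernel_size=1.
Second layer: a dw convolution, so groups=hidden_channel, i.e. groups equals the number of input channels.
Third layer: the ConvBNReLU class defined earlier is not used directly, because the last layer has no ReLU activation. A linear activation is simply y = x, so no extra activation layer needs to be added after the BatchNorm.
Pass layers to Sequential() as positional arguments and name the packed result self.conv.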

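Forward pass:

If use_shortcut is True there is a shortcut branch and the output is x + self.conv(x); if it is False there is no shortcut branch and the output is self.conv(x).

Defining the MobileNetV2 structure

The class inherits from nn.Module.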

class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = _make_divisible(32 * alpha, round_nearest)          # round the kernel count to the nearest multiple of 8
        last_channel = _make_divisible(1280 * alpha, round_nearest)

        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)

        # building classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)         # scale (gamma) initialized to 1
                nn.init.zeros_(m.bias)          # shift (beta) initialized to 0
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch

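Initialization:

Parameters: num_classes is the number of classes; alpha is the hyperparameter introduced in v1 that scales the number of convolution kernels; round_nearest is the base that channel counts are rounded to a multiple of (8 by default).
Assign the InvertedResidual class defined above to block.
Define input_channel: _make_divisible takes 32 x alpha and adjusts it to the nearest multiple of round_nearest, i.e. a multiple of 8.
– Inside _make_divisible: adding 0.5 x divisor to ch before the integer division rounds ch to the nearest multiple of 8. The result is never below min_ch, and if rounding down loses more than 10% of ch, one more divisor is added.
last_channel, the output depth of the last convolution layer (1280 x alpha), is adjusted with _make_divisible in the same way.
Create a list whose rows correspond to t, c, n, s in the overall model structure table above.
Define an empty list features:
First add the first convolution layer: 3 input channels, input_channel output channels as defined above, s=2;
Then loop over t, c, n, s and adjust c x alpha to the nearest multiple of 8 with _make_divisible to get output_channel;
Repeat the block n times, i.e. n inverted residual structures.
– In the table, s applies only to the first layer of a block and the remaining layers use 1, so if i == 0 then stride = s, otherwise stride = 1.
– Append the inverted residual structure to the features list.
– Then assign output_channel to input_channel so that it becomes the input depth of the next layer.
After the loop has defined all the bottlenecks, use the ConvBNReLU class to define the following 1x1 convolution layer, whose output depth is last_channel.
At this point the feature extraction part is complete; pass features to nn.Sequential as positional arguments to pack it into a single module.
Finally define the classifier part: an adaptive average pooling layer (output height and width both 1) and a fully connected layer (the dropout layer and the fully connected layer are combined into the classifier).
Weight initialization: iterate over every submodule. For a conv2d, initialize the weight with kaiming_normal_ (mode='fan_out') and set the bias to zero if it exists; for a bn layer, set the weight (scale) to 1 and the bias (shift) to 0; for a fully connected layer, initialize the weight from a normal distribution with mean 0 and standard deviation 0.01, and set the bias to 0.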

Forward pass:

Pass the input through features
Average pooling downsampling
Flatten the output
Finally pass it through the classifier
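
The last three steps can be followed on a dummy tensor. This is a minimal sketch, assuming a feature map of shape [2, 1280, 7, 7] (which is what a 224 x 224 input would give for alpha = 1.0) and 5 classes; the random tensor stands in for the output of features.

```python
import torch
import torch.nn as nn

# Stand-in for the output of self.features: batch of 2, 1280 channels, 7x7
feat = torch.randn(2, 1280, 7, 7)

avgpool = nn.AdaptiveAvgPool2d((1, 1))
classifier = nn.Sequential(
    nn.Dropout(0.2),
    nn.Linear(1280, 5),   # 5 classes, chosen for illustration
)
classifier.eval()         # disable dropout for a deterministic pass

x = avgpool(feat)          # [2, 1280, 1, 1]
pooled_shape = tuple(x.shape)
x = torch.flatten(x, 1)    # [2, 1280]
flat_shape = tuple(x.shape)
logits = classifier(x)     # [2, 5]

print(pooled_shape, flat_shape, tuple(logits.shape))
```

torch.flatten(x, 1) keeps the batch dimension and merges everything after it, which is what nn.Linear expects.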

Model training

Downloading the pretrained weights

import torchvision.models.mobilenetv2

Open this module to find the download link for the pretrained weights:
https://download.pytorch.org/models/mobilenet_v2-b0353104.pth

Import the MobileNetV2 network structure from the model file

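import torch
from model import MobileNetV2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_weight_path = "./mobilenet_v2.pth"  # assumed local path of the downloaded pretrained weights

net = MobileNetV2(num_classes=5)
pre_weights = torch.load(model_weight_path, map_location='cpu')
# keep only the tensors whose element count matches this network
# (this drops the 1000-class classifier weights)
net_dict = net.state_dict()
pre_dict = {k: v for k, v in pre_weights.items() if k in net_dict and net_dict[k].numel() == v.numel()}
missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)
# freeze features weights
for param in net.features.parameters():
    param.requires_grad = False

net.to(device)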

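Instantiate the model with 5 classes and load the pretrained weights. Because the number of classes differs, the weights of the last layer cannot be reused.
So iterate over the weight dictionary and compare the number of elements of each pretrained tensor with the parameter of the same name in net. Only the tensors of the final fully connected layer (classifier) differ; everything else is saved into pre_dict.
Then load the weight dictionary pre_dict with load_state_dict (strict=False tolerates the missing classifier keys).
This loads every parameter except those of the last layer.
Freeze all the weights of the feature extraction part: iterate over net.features.parameters() and set requires_grad to False for every parameter, so that no gradients are computed for them and they are not updated.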
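
The loading and freezing steps can be tried without downloading anything by using a tiny stand-in network. Everything in this sketch is hypothetical: `TinyNet` only mimics the features/classifier layout of MobileNetV2, and passing just the trainable parameters to an Adam optimizer is one common choice that this article does not prescribe.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same two parts: features and classifier."""

    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1, bias=False),
            nn.BatchNorm2d(8),
        )
        self.classifier = nn.Linear(8, num_classes)


# Pretend these are pretrained 1000-class weights
pre_weights = TinyNet(num_classes=1000).state_dict()

net = TinyNet(num_classes=5)
net_dict = net.state_dict()
# Keep only tensors whose element count matches the new network
pre_dict = {k: v for k, v in pre_weights.items()
            if k in net_dict and net_dict[k].numel() == v.numel()}
missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)
print(missing_keys)   # the classifier tensors that were not loaded

# Freeze the feature extractor
for param in net.features.parameters():
    param.requires_grad = False

# Only the classifier parameters remain trainable
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=0.0001)
```

missing_keys lists exactly the classifier tensors, which confirms that everything else was taken from the pretrained dictionary.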

Prediction

import torch

# assumes `img` is a preprocessed input tensor with a batch dimension, e.g. [1, 3, H, W],
# and `device` is the torch.device used for inference
# create model
model = MobileNetV2(num_classes=5).to(device)
# load model weights
model_weight_path = "./MobileNetV2.pth"
model.load_state_dict(torch.load(model_weight_path, map_location=device))
model.eval()
with torch.no_grad():
    # predict class
    output = torch.squeeze(model(img.to(device))).cpu()
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).numpy()
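
The post-processing at the end of the prediction code can be checked on made-up numbers. In this sketch the logits and the class names are invented for illustration; only the squeeze, softmax and argmax steps come from the code above.

```python
import torch

# Made-up model output for one image and 5 classes, shape [1, 5]
output = torch.tensor([[0.1, 2.0, -1.0, 0.5, 0.3]])

output = torch.squeeze(output)            # drop the batch dimension -> [5]
predict = torch.softmax(output, dim=0)    # probabilities that sum to 1
predict_cla = int(torch.argmax(predict))  # index of the most likely class

# Hypothetical class names, only for printing
class_names = ["class_0", "class_1", "class_2", "class_3", "class_4"]
print(class_names[predict_cla], float(predict[predict_cla]))
```

Note that torch.squeeze removes the batch dimension only when the batch size is 1; for a larger batch, softmax and argmax would be applied along dim=1 instead.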