Building MobileNet from Scratch

Geeks.

Last modified 2024-10-02 08:31:06


Category: Deep Learning. Tags: deep learning, python, machine learning

First published 2024-10-02 08:30:54

Article link: https://blog.csdn.net/Darling912/article/details/142677795


Here is the source code first; a walkthrough follows.

import time
import torch
import torch.nn as nn
import torchvision.models as models


class MobileNetV1(nn.Module):
    def __init__(self):
        super(MobileNetV1, self).__init__()

        def conv_bn(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True)
            )

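        def conv_dw(inp, oup, stride):
            return nn.Sequential(
                # depthwise: one 3x3 filter per input channel (groups=inp)
                nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                nn.BatchNorm2d(inp),
                nn.ReLU(inplace=True),
                # pointwise: 1x1 convolution mixes channels and maps inp -> oup
                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True),
            )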

        self.model = nn.Sequential(
            conv_bn(3, 32, 2),
            conv_dw(32, 64, 1),
            conv_dw(64, 128, 2),
            conv_dw(128, 128, 1),
            conv_dw(128, 256, 2),
            conv_dw(256, 256, 1),
            conv_dw(256, 512, 2),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 512, 1),
            conv_dw(512, 1024, 2),
            conv_dw(1024, 1024, 1),
            nn.AvgPool2d(7),
        )
        self.fc = nn.Linear(1024, 1000)

    def forward(self, x):
        x = self.model(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        return x


def speed(model, name):
    input = torch.randn(1, 3, 224, 224).cpu()

    # warm-up run, excluded from the timing
    model(input)
    t2 = time.time()

    # average the wall-clock time of 30 forward passes
    for i in range(0, 30):
        model(input)
    t3 = time.time()

    print('%10s : %f' % (name, (t3 - t2) / 30))


if __name__ == '__main__':
    resnet18 = models.resnet18().cpu()
    alexnet = models.alexnet().cpu()
    vgg16 = models.vgg16().cpu()
    mobilenetv1 = MobileNetV1().cpu()

    speed(resnet18, 'resnet18')
    speed(alexnet, 'alexnet')
    speed(vgg16, 'vgg16')
    speed(mobilenetv1, 'mobilenet')

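Initializer (__init__):
Calls the parent constructor super(MobileNetV1, self).__init__() to initialize the nn.Module state.
Defines two helper functions, conv_bn and conv_dw, that build the network layers.
Uses nn.Sequential to assemble the whole network from one standard convolution and a stack of depthwise separable convolutions.
conv_bn function:
Takes inp (input channels), oup (output channels), and stride.
Returns a sequential container with a standard 3x3 convolution, batch normalization, and ReLU.
conv_dw function:
Takes the same parameters as conv_bn.
Implements a depthwise separable convolution. It starts with a depthwise 3x3 convolution (groups=inp) followed by batch normalization and ReLU. A complete depthwise separable block also needs a 1x1 pointwise convolution, with its own batch normalization and ReLU, to map inp channels to oup channels.
Network definition:
nn.Sequential chains the standard convolution and the depthwise separable layers.
The order is: standard convolution -> depthwise separable convolution -> depthwise separable convolution -> ... -> average pooling.
A fully connected layer, self.fc, is added at the end for classification.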
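To make the difference between the two helpers concrete, the following sketch counts convolution weights for a standard 3x3 convolution and for a depthwise separable one. The function names standard_conv_params and separable_conv_params are made up for this illustration. It assumes bias=False as in the listing, ignores BatchNorm parameters, and assumes the separable block is a 3x3 depthwise convolution followed by a 1x1 pointwise convolution.

```python
def standard_conv_params(inp, oup, k=3):
    """Weights of a k x k standard convolution without bias."""
    return k * k * inp * oup


def separable_conv_params(inp, oup, k=3):
    """Weights of a k x k depthwise conv plus a 1x1 pointwise conv, no bias."""
    depthwise = k * k * inp   # one k x k filter per input channel
    pointwise = inp * oup     # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Channel pairs taken from the network definition in the listing.
for inp, oup in [(32, 64), (128, 256), (512, 512)]:
    std = standard_conv_params(inp, oup)
    sep = separable_conv_params(inp, oup)
    print(f"{inp:4d} -> {oup:4d}: standard={std:8d} "
          f"separable={sep:7d} ratio={std / sep:.1f}")
```

For wide layers the ratio approaches 9, the number of weights in a 3x3 kernel, which is where most of MobileNet's parameter savings come from.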
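Forward pass (forward):
Takes an input tensor x.
Passes it through self.model.
Flattens the result with x.view(-1, 1024), giving one 1024-element vector per sample.
Applies the fully connected layer self.fc to produce the final output.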
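The flattening step only works if the feature map reaching nn.AvgPool2d(7) is 7x7. A small sketch can check this by tracing the spatial size through the strides in the listing. The names conv_out, STRIDES, and trace_sizes are hypothetical helpers for this illustration; the size formula is the usual one for a convolution with kernel 3 and padding 1.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# Strides in listing order: conv_bn first, then the 13 conv_dw blocks.
STRIDES = [2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]


def trace_sizes(size=224):
    """Return the spatial size after each block of self.model, before pooling."""
    sizes = []
    for stride in STRIDES:
        size = conv_out(size, stride=stride)
        sizes.append(size)
    return sizes


sizes = trace_sizes(224)
print(sizes)
print("size entering AvgPool2d(7):", sizes[-1])
```

With a 224x224 input the five stride-2 layers reduce the size to 7, so the pooling layer leaves a 1x1 map with 1024 channels. A smaller input such as 192x192 ends at 6x6, which the fixed 7x7 pooling window cannot handle, so the listing effectively assumes 224x224 images.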
Summary:
This class implements a MobileNetV1 convolutional neural network, intended mainly for image classification.

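The benchmark produced the following average forward-pass times:
ResNet-18: 0.016439 s
AlexNet: 0.012422 s
VGG-16: 0.057054 s
MobileNet: 0.207311 s
These results suggest the following:
Model complexity:
ResNet-18: a relatively simple and efficient model with comparatively few parameters and little computation.
AlexNet: an even simpler model with a small computational cost.
VGG-16: high complexity, many parameters, and a large computational cost.
MobileNet: designed as a lightweight model, yet the slowest in this test.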
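To put a number on "lightweight", the sketch below estimates the multiply-accumulate operations (MACs) of one MobileNetV1 forward pass from the layer list in the listing. It is a rough estimate under stated assumptions: only convolutions and the fully connected layer are counted, each conv_dw block is taken to be a 3x3 depthwise convolution followed by a 1x1 pointwise convolution, and the names DW_BLOCKS and mobilenet_v1_macs are invented for this example.

```python
# (inp, oup, stride) for the 13 depthwise separable blocks, in listing order.
DW_BLOCKS = [
    (32, 64, 1), (64, 128, 2), (128, 128, 1), (128, 256, 2), (256, 256, 1),
    (256, 512, 2), (512, 512, 1), (512, 512, 1), (512, 512, 1), (512, 512, 1),
    (512, 512, 1), (512, 1024, 2), (1024, 1024, 1),
]


def mobilenet_v1_macs(size=224, num_classes=1000):
    """Estimate multiply-accumulates for one forward pass (conv and fc only)."""
    size = size // 2                       # first 3x3 conv has stride 2
    total = 3 * 3 * 3 * 32 * size * size   # standard conv: k*k*inp*oup per pixel
    for inp, oup, stride in DW_BLOCKS:
        size = size // stride              # exact for 224: every size is even
        total += 3 * 3 * inp * size * size   # depthwise 3x3
        total += inp * oup * size * size     # pointwise 1x1
    total += 1024 * num_classes            # fully connected layer
    return total


macs = mobilenet_v1_macs()
print(f"MobileNetV1 at 224x224: about {macs / 1e6:.0f} million MACs")
```

An arithmetic count of this size would not by itself predict the slowest time in the table, which suggests that something other than raw operation count, such as memory traffic or how well grouped convolutions are implemented on the test machine, is shaping the result.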
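Computational efficiency:
ResNet-18 and AlexNet have the shortest forward-pass times, so they are the most efficient in this test.
VGG-16 takes longer, which is consistent with its larger network.
MobileNet takes the longest. Although it is designed to be lightweight, a low operation count does not guarantee low latency: some other bottleneck or a poorly optimized code path may dominate at run time.
Hardware:
The results depend on the hardware. Different configurations can produce different timings.
Only timings taken on the same hardware say something about the efficiency of the models themselves.
Optimization:
MobileNet's long forward-pass time is worth investigating further:
Check for unnecessary or redundant computation.
Confirm that suitable optimizations are in use (batching, CUDA acceleration, and so on).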
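Summary
ResNet-18 and AlexNet perform best and are computationally efficient.
VGG-16 is more complex and computationally heavier, so its forward pass takes longer.
MobileNet, although designed as a lightweight model, performs poorly in this test and may need further optimization.
To improve MobileNet's performance, consider the following:
Hardware acceleration: run the model on a GPU.
Batching: use a larger batch size to improve throughput.
Code optimization: check the code for unnecessary or redundant computation.
Library version: use a recent PyTorch release to benefit from its latest optimizations.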
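Before optimizing the model, it can help to make the measurement itself more robust. The sketch below is one possible timing helper, not the author's method: it uses time.perf_counter instead of time.time, runs several untimed warm-up calls, and reports the spread as well as the mean. The names benchmark and dummy_forward are hypothetical, and a pure-Python stand-in workload is used so the example runs without PyTorch.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=30):
    """Time fn() and return (mean, stdev) in seconds over `runs` calls."""
    for _ in range(warmup):          # warm-up calls are not timed
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


# Stand-in workload so the sketch runs without PyTorch installed.
def dummy_forward():
    return sum(i * i for i in range(10_000))


mean_s, stdev_s = benchmark(dummy_forward)
print(f"dummy_forward: {mean_s * 1e3:.3f} ms +/- {stdev_s * 1e3:.3f} ms")
```

For a PyTorch model, one would typically call model.eval() first and run the forward pass inside a torch.no_grad() block, so that BatchNorm uses its running statistics and no autograd graph is built. The timings reported in this article were taken without those steps, so numbers measured this way are not directly comparable with them.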

MobileNetV2 implementation

import time
import torch
import torch.nn as nn


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor

    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# Standard convolution block (Conv + BN + ReLU6)
class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True)
        )


# Inverted residual block
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


# Model definition
class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=None, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280

        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]

        # each setting must provide the four values t, c, n, s
        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
            raise ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}".format(inverted_residual_setting))

        # first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
        features = [ConvBNReLU(3, input_channel, stride=2)]
        # inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # last layers
        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1))
        # make it nn.Sequential
        self.features = nn.Sequential(*features)

        # building classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, num_classes),
        )

    def forward(self, x):
        x = self.features(x)  # 7*7*1280
        x = x.mean([2, 3])
        x = self.classifier(x)
        return x


# Speed benchmark
def speed(model, name):
    input = torch.rand(1, 3, 224, 224).cpu()

    # warm-up run, excluded from the timing
    model(input)
    t2 = time.time()

    # average the wall-clock time of 30 forward passes
    for i in range(0, 30):
        model(input)
    t3 = time.time()

    print('%10s : %f' % (name, (t3 - t2) / 30))


if __name__ == '__main__':
    from mobilenetv1 import MobileNetV1

    mobilenetv1 = MobileNetV1().cpu()
    mobilenetv2 = MobileNetV2(width_mult=1)
    speed(mobilenetv1, 'mobilenetv1')
    speed(mobilenetv2, 'mobilenetv2')

    input = torch.randn([1, 3, 224, 224])
    output = mobilenetv1(input)
    torch.onnx.export(mobilenetv1, input, 'mobilenetv1.onnx')

    output = mobilenetv2(input)
    torch.onnx.export(mobilenetv2, input, 'mobilenetv2.onnx')

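The run printed 0.171382 for mobilenetv1 and 0.245386 for mobilenetv2. Both values come from speed(), which prints (t3 - t2) / 30, so they are average forward-pass times in seconds for a single 1x3x224x224 input on the CPU. They are not accuracies: both models are randomly initialized and no test set is involved.
Inference time:
mobilenetv1 takes 0.171382 seconds per forward pass and mobilenetv2 takes 0.245386 seconds.
mobilenetv1 is therefore about 1.43 times faster in this test.
Accuracy:
This benchmark does not measure accuracy, so it cannot rank the two models on that axis.
Comparing accuracy would require training both models and evaluating them on the same test set.
Analysis summary
Inference time:
mobilenetv1 is faster here.
Its structure is simpler, which is a plausible reason for the shorter forward pass.
Accuracy:
mobilenetv2 uses the more advanced inverted residual structure (Inverted Residuals), which may give it higher accuracy once trained, but the numbers above neither confirm nor refute that.
Recommendations
If accuracy matters most, mobilenetv2 is the candidate to train and evaluate.
If inference speed matters most, this test favors mobilenetv1.
The final choice depends on the requirements of the application.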
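
To see what the inverted residual structure looks like in practice, the sketch below expands the default inverted_residual_setting into one row per block, mirroring the loop in MobileNetV2.__init__ and the checks in InvertedResidual. It assumes width_mult=1.0, so the channel counts need no rounding. SETTING and block_plan are names invented for this illustration.

```python
# t (expansion), c (output channels), n (repeats), s (stride of first repeat)
SETTING = [
    [1, 16, 1, 1], [6, 24, 2, 2], [6, 32, 3, 2], [6, 64, 4, 2],
    [6, 96, 3, 1], [6, 160, 3, 2], [6, 320, 1, 1],
]


def block_plan(input_channel=32, setting=SETTING):
    """List (inp, hidden, oup, stride, use_res_connect) for every block."""
    plan = []
    for t, c, n, s in setting:
        for i in range(n):
            stride = s if i == 0 else 1                    # only the first repeat downsamples
            hidden = int(round(input_channel * t))         # expanded width
            residual = stride == 1 and input_channel == c  # skip connection allowed?
            plan.append((input_channel, hidden, c, stride, residual))
            input_channel = c
    return plan


plan = block_plan()
for row in plan:
    print(row)
print("blocks:", len(plan), "with residual:", sum(r[4] for r in plan))
```

Each row shows a narrow input expanded to a wide hidden layer and projected back to a narrow output, which is the inverse of a classic bottleneck. Only the repeated blocks, where stride is 1 and the channel count is unchanged, carry a skip connection.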