Simple and Fun Lightweight Networks: MobileNet v1, MobileNet v2, MobileNet v3 (drop-in ready for your own dataset) (architecture walkthrough + fully commented code + core ideas), a PyTorch implementation

This post covers the MobileNets family of lightweight convolutional neural networks, including the structure, innovations, and code implementation of v1, v2, and v3. The defining feature of MobileNets is the use of depthwise separable convolutions to cut the parameter count, which makes them suitable for mobile devices. MobileNet v2 introduces the inverted residual block and the linear bottleneck, and MobileNet v3 further refines the block design and uses NAS to search for the network parameters. Full PyTorch code for training and prediction is provided.

I have finally, finally finished grinding out my thesis, which means I can finally get back to updating this blog!!!!!!

       In this post we will study a fairly simple and fun family of lightweight convolutional neural networks: the MobileNets series, covering MobileNet v1, MobileNet v2, and MobileNet v3.

The code in this post directly produces line plots of training-set and test-set loss and accuracy, which is handy when writing a paper.

       Earlier convolutional networks, whether AlexNet, VGG, or ResNet, all have large parameter counts and heavy compute requirements, so they cannot run on embedded devices. Those models chase accuracy and give little thought to deployment, which makes lightweight model design an important research direction in its own right. So when you are looking for a contribution for a paper, besides detection speed and accuracy, model size and lightweight design are also worth considering as the so-called novelty.

       The MobileNets series was the first serious lightweight network I ever worked with. The name explains itself: a mobile network, that is, a convolutional network that can run on mobile devices. It weighs model size alongside accuracy, keeping the accuracy while streamlining the structure to reduce the model's size and compute requirements, which makes it far easier to run on small embedded devices, things like the Raspberry Pi, phones, and other small AI chips.

First, let's study MobileNet v1:

The MobileNet v1 paper can be downloaded here:

Paper: https://arxiv.org/abs/1704.04861

         The MobileNet v1 paper is titled MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. To summarize: compared with earlier convolutional networks, MobileNet v1's main innovation is replacing the classic convolution with depthwise separable convolution. According to the authors' ImageNet results, MobileNet v1 comes close to the accuracy of GoogleNet and VGG 16 while drastically cutting model complexity and parameter count, down to roughly 1/32 of VGG's parameters. That is clearly a major step: at equal accuracy, anyone who has to deploy a model on an embedded device will prefer MobileNets.

So we know MobileNet v1 uses depthwise separable convolution to slash the parameter count of networks that used to rely on stacked standard convolutions. The key question, then, is how this depthwise separable convolution works and how it differs from ordinary convolution. Let's compare the two.

Standard convolution: each kernel's channel count equals the input feature map's channel count (for an RGB image, that is 3), and the output feature map's channel count equals the number of kernels (anyone who has worked through the arithmetic knows the output channels come from the kernel count). The paper illustrates standard convolution with a figure.

Depthwise convolution (the first half of the depthwise separable convolution): each kernel has channel = 1, and input channels = number of kernels = output channels, so every kernel filters exactly one input channel. A 1x1 pointwise convolution then follows to mix information across channels. The paper illustrates this with a figure as well.

The savings come from factorizing the convolution: the depthwise step does the spatial filtering with one single-channel kernel per input channel, and the 1x1 pointwise step does the channel mixing. For a k x k kernel with C_in input channels and C_out output channels, that is roughly C_in * (k^2 + C_out) parameters instead of C_in * C_out * k^2, nearly a k^2-fold reduction when C_out is large. This is the advantage depthwise separable convolution brings.
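To make the saving concrete, here is a minimal sketch of my own (an illustration, not code from the paper or from the training script below) comparing the two in PyTorch:

import torch
from torch import nn

in_c, out_c = 64, 128

# Standard 3x3 convolution: 64 * 128 * 3 * 3 = 73,728 weights
standard = nn.Conv2d(in_c, out_c, kernel_size=3, padding=1, bias=False)

# Depthwise separable version: a per-channel 3x3 conv, then a 1x1 pointwise conv
# Depthwise: 64 * 1 * 3 * 3 = 576 weights; pointwise: 64 * 128 * 1 * 1 = 8,192 weights
separable = nn.Sequential(
    nn.Conv2d(in_c, in_c, kernel_size=3, padding=1, groups=in_c, bias=False),
    nn.Conv2d(in_c, out_c, kernel_size=1, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # 73728 vs 8768, about 8.4x fewer

x = torch.randn(1, in_c, 32, 32)
print(standard(x).shape, separable(x).shape)  # both torch.Size([1, 128, 32, 32])

The ratio approaches k^2 (here 9) as the number of output channels grows.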

That is easy to grasp, and it is exactly what made MobileNet v1 novel when it first appeared: the special structure of the depthwise separable convolution.

Next, let's study MobileNet v2:

Paper: http://arxiv.org/pdf/1801.04381.pdf

Paper title: MobileNetV2: Inverted Residuals and Linear Bottlenecks

Translated: inverted residuals and linear bottlenecks. MobileNet v2 delivers accuracy comparable to ResNet while having an even smaller parameter count than MobileNet v1; that is its advance over v1. The breakthrough comes mainly from two ideas: the Inverted residual block and the Linear Bottleneck (a linear, activation-free projection layer). Both are simple; let's go through them.

The Inverted residual block structure, as shown in the paper's figure:

In that figure, the left side is the ResNet residual block and the right side is the MobileNet v2 inverted residual block. The two are very similar, but run in opposite directions:

The Residual block (ResNet):

1. A 1x1 conv reduces the channel count
2. A 3x3 conv extracts features
3. A 1x1 conv restores the channel count

The Inverted residual block (MobileNet v2):

1. A 1x1 conv expands the channel count
2. A 3x3 depthwise conv extracts features
3. A 1x1 conv projects the channels back down (linearly, with no activation; see the linear bottleneck below)

A quick note on what "reducing/expanding dimensions" means here: it refers to the channel count of the feature map, which the 1x1 convolutions shrink or grow. It is not the same thing as spatially down-sampling or up-sampling the image; for that separate concept, see my post: 目标检测算法中的上采样和下采样的具体含义_什么是上采样_小馨馨的小翟的博客-CSDN博客

       As the structure and the flows above make clear, MobileNet v2 runs the residual recipe in reverse: the ResNet block is narrow in the middle and wide at both ends, while the MobileNet v2 block is wide in the middle and narrow at both ends, hence the name inverted residual. The authors' reasoning is that the depthwise convolution cannot mix channels and extracts relatively little from a thin, low-dimensional feature map, so the block first expands the channels with a 1x1 conv, lets the depthwise conv extract features in that higher-dimensional space, and then projects back down.
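If you want to check the channel flow yourself, here is a tiny sanity check of my own using the InvertedResidual class defined in the v2 training code further down (run it after that class definition):

import torch
# Assumes the InvertedResidual class from the v2 training code below is in scope.
block = InvertedResidual(in_channel=24, out_channel=24, stride=1, expand_ratio=6)
# Inside the block: 1x1 expand 24 -> 144, 3x3 depthwise at 144, 1x1 linear project 144 -> 24.
x = torch.randn(1, 24, 56, 56)
print(block(x).shape)  # torch.Size([1, 24, 56, 56]); the shortcut applies since stride == 1 and in == out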

The Linear Bottleneck refers to the block's final 1x1 projection, which has no activation at all (it is left linear). Separately, MobileNet v2 replaces plain ReLU with ReLU6 in the other layers; the only difference from ReLU is an upper cap of 6, so for x >= 6 the output stays at 6, as the paper's figure shows:

The authors argue that ReLU destroys information when it acts on low-dimensional features, which is exactly why the final projection is left linear; ReLU6 is preferred for the remaining layers mainly because its bounded range behaves well under low-precision arithmetic on mobile hardware.
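A quick demonstration of the clipping (nothing MobileNet-specific, just PyTorch's built-in relu6):

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 3.0, 8.0])
print(F.relu6(x))  # tensor([0., 3., 6.]): negatives are zeroed, values above 6 are clipped to 6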

That covers the theory; now for the open-source code:

MobileNet v2 training code:

import torch
import torchvision
import torchvision.models

from matplotlib import pyplot as plt
from tqdm import tqdm
from torch import nn
from torch.utils.data import DataLoader
from torchvision.transforms import transforms


data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(120),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
    "val": transforms.Compose([transforms.Resize((120, 120)),  # cannot 224, must (224, 224)
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

train_data = torchvision.datasets.ImageFolder(root = "./data/train" ,   transform = data_transform["train"])

traindata = DataLoader(dataset=train_data, batch_size=128, shuffle=True, num_workers=0)  # draw the training data in batches of 128 images

test_data = torchvision.datasets.ImageFolder(root = "./data/val" , transform = data_transform["val"])

train_size = len(train_data)  # number of training images
test_size = len(test_data)  # number of test images
print(train_size)   # print the training-set size as a sanity check
print(test_size)    # print the test-set size as a sanity check
testdata = DataLoader(dataset=test_data, batch_size=128, shuffle=True, num_workers=0)  # draw the test data in batches of 128 images

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("using {} device.".format(device))


def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch
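# Illustrative examples of the rounding behaviour (my own, not part of the original script):
#   _make_divisible(35) -> 32  (rounds down; 32 is still at least 90% of 35)
#   _make_divisible(37) -> 40  (rounds up to the nearest multiple of 8)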


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )


class InvertedResidual(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio
        self.use_shortcut = stride == 1 and in_channel == out_channel

        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv(linear)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
        ])

        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=2, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = _make_divisible(32 * alpha, round_nearest)
        last_channel = _make_divisible(1280 * alpha, round_nearest)

        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)

        # building classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x


mobilenet = MobileNetV2(num_classes=2)   # instantiate the model; num_classes is the number of classes in your dataset (I use the two-class cat/dog dataset, so 2); set it to match your own data
mobilenet.to(device)
print(mobilenet.to(device))  # print the model structure


test1 = torch.ones(64, 3, 120, 120)  # sanity-check the output shape by feeding a (64, 3, 120, 120) tensor

test1 = mobilenet(test1.to(device))    # push the tensor through the network
print(test1.shape)  # inspect the output shape

epoch = 1  # number of training epochs
learning = 0.0001  # learning rate
optimizer = torch.optim.Adam(mobilenet.parameters(), lr=learning)  # Adam optimizer; worth reading up on if you are writing a paper
loss = nn.CrossEntropyLoss()  # loss function: cross-entropy

train_loss_all = []  # list for the training losses
train_accur_all = []  # list for the training accuracies
test_loss_all = []  # list for the test losses
test_accur_all = []  # list for the test accuracies
for i in range(epoch):  # start iterating over epochs
    train_loss = 0   # running training loss for this epoch
    train_num = 0.0   # number of training samples seen so far
    train_accuracy = 0.0  # running count of correct training predictions
    mobilenet.train()   # put the model in training mode
    train_bar = tqdm(traindata)  # progress-bar wrapper; purely cosmetic
    for step, data in enumerate(train_bar):  # iterate over batches; enumerate yields (batch index, batch)
        img, target = data    # split the batch into images img and labels target
        optimizer.zero_grad()  # clear the accumulated gradients
        outputs = mobilenet(img.to(device))  # forward pass; outputs holds the class scores

        loss1 = loss(outputs, target.to(device))  # compare the network outputs with the true labels target; this is the loss
        outputs = torch.argmax(outputs, 1)   # the index of the largest class score is the predicted class
        loss1.backward()   # backpropagate
        optimizer.step()  # update the weights with the Adam optimizer defined above
        train_loss = train_loss + loss1.item()  # accumulate the loss
        accuracy = torch.sum(outputs == target.to(device))   # outputs == target marks correct predictions; count them to compute accuracy
        train_accuracy = train_accuracy + accuracy   # accumulate the correct-prediction count
        train_num += img.size(0)  # accumulate the number of samples

    print("epoch:{} , train-Loss:{} , train-accuracy:{}".format(i + 1, train_loss / train_num,   #输出训练情况
                                                                train_accuracy / train_num))
    train_loss_all.append(train_loss / train_num)   #将训练的损失放到一个列表里 方便后续画图
    train_accur_all.append(train_accuracy.double().item() / train_num)#训练集的准确率
    test_loss = 0   #同上 测试损失
    test_accuracy = 0.0  #测试准确率
    test_num = 0
    mobilenet.eval()   #将模型调整为测试模型
    with torch.no_grad():  #清空历史梯度,进行测试  与训练最大的区别是测试过程中取消了反向传播
        test_bar = tqdm(testdata)
        for data in test_bar:
            img, target = data

            outputs = mobilenet(img.to(device))
            loss2 = loss(outputs, target.to(device)).cpu()
            outputs = torch.argmax(outputs, 1)
            test_loss = test_loss + loss2.item()
            accuracy = torch.sum(outputs == target.to(device))
            test_accuracy = test_accuracy + accuracy
            test_num += img.size(0)

    print("test-Loss:{} , test-accuracy:{}".format(test_loss / test_num, test_accuracy / test_num))
    test_loss_all.append(test_loss / test_num)
    test_accur_all.append(test_accuracy.double().item() / test_num)

# plotting: draw the stored lists as loss and accuracy curves for the training and test sets
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(range(epoch), train_loss_all,
         "ro-", label="Train loss")
plt.plot(range(epoch), test_loss_all,
         "bs-", label="test loss")
plt.legend()
plt.xlabel("epoch")
plt.ylabel("Loss")
plt.subplot(1, 2, 2)
plt.plot(range(epoch), train_accur_all,
         "ro-", label="Train accur")
plt.plot(range(epoch), test_accur_all,
         "bs-", label="test accur")
plt.xlabel("epoch")
plt.ylabel("acc")
plt.legend()
plt.show()

torch.save(mobilenet, "mobile_V2.pth")
print("模型已保存")

MobileNet v2 prediction code:

import torch
from PIL import Image
from torch import nn
from torchvision.transforms import transforms

image_path = "1.jpg"#相对路径 导入图片,图片放在当前文件夹下就行
trans = transforms.Compose([transforms.Resize((120 , 120)),
                           transforms.ToTensor()])   #将图片缩放为跟训练集图片的大小一样 方便预测,且将图片转换为张量
image = Image.open(image_path)  #打开图片
# print(image)  #输出图片 看看图片格式
image = image.convert("RGB")  #将图片转换为RGB格式
image = trans(image)   #上述的缩放和转张量操作在这里实现
# print(image)   #查看转换后的样子
image = torch.unsqueeze(image, dim=0)  #将图片维度扩展一维

classes = ["cat" , "dog" ]  #预测种类,我这里用的猫狗数据集,所以是这两种,你调成你的种类即可

class ConvBNReLU(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )


class InvertedResidual(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio
        self.use_shortcut = stride == 1 and in_channel == out_channel

        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv(linear)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
        ])

        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=2, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = _make_divisible(32 * alpha, round_nearest)
        last_channel = _make_divisible(1280 * alpha, round_nearest)

        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)

        # building classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x



# the network definition above is needed because the unpickled model must find its class to run prediction
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # pick the GPU if one is available
print("using {} device.".format(device))


model = torch.load("mobile_V2.pth", map_location=device)  # load the model; map_location keeps this working on CPU-only machines
model.eval()  # switch layers such as BatchNorm/Dropout to inference mode
with torch.no_grad():  # disable gradient tracking for prediction
    outputs = model(image.to(device))  # push the image through the network
    # print(model)  # print the model structure
    # print(outputs)  # print the raw score tensor
    ans = (outputs.argmax(1)).item()  # the index of the largest score is the predicted class;
    # look that index up in the class list to get the class name
    print("Predicted class:", classes[ans])
    # print(classes[ans])
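    # A small addition of mine (not in the original script): softmax the raw scores
    # to also report a confidence for the prediction.
    probs = torch.softmax(outputs, dim=1)  # convert the raw scores into probabilities
    print("confidence: {:.3f}".format(probs[0][ans].item()))  # probability of the predicted class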

Finally, let's study MobileNet v3:

Paper: https://arxiv.org/pdf/1905.02244v2.pdf

MobileNet v3 comes in two variants, MobileNetV3-Large and MobileNetV3-Small. Its improvements over MobileNet v2 are:

1. A redesigned feature-extraction block

2. NAS (neural architecture search) to obtain the concrete model parameters

3. A redesign of the time-consuming layers at the ends of the network

The newly designed block builds on the MobileNet v2 block by adding an SE (Squeeze-and-Excitation) channel-attention module and switching to the h-swish activation, as the paper's figure shows. As before, the shortcut connection only exists when stride == 1 and input_c == output_c.

The full layer configurations for MobileNetV3-Large and MobileNetV3-Small are given in the paper's tables; the same settings appear as the inverted_residual_setting lists in the code below.

The new activation function is h-swish, a cheap piecewise approximation of swish defined as h-swish(x) = x * ReLU6(x + 3) / 6.
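A quick check of that definition against PyTorch's built-in module (assuming PyTorch >= 1.6, which provides nn.Hardswish, the module the code below uses):

import torch
from torch import nn
import torch.nn.functional as F

x = torch.tensor([-4.0, 0.0, 3.0])
print(x * F.relu6(x + 3) / 6)  # tensor([-0., 0., 3.])
print(nn.Hardswish()(x))       # same values as the manual formula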

As for NAS: the authors obtain the network parameters with a platform-aware search that trades accuracy against latency. Personally, I suspect this approach limits how far MobileNet v3 can go: a search space like this is effectively boundless, so the search itself is computationally heavy, and there must be better ways. If you are interested, look into NAS and see my other post, RegNet——颠覆常规神经网络认知的卷积神经网络(网络结构详解+详细注释代码+核心思想讲解)——pytorch实现_regnet代码_小馨馨的小翟的博客-CSDN博客, which presents a better way to search for strong structures.

Redesigning the time-consuming layers, as the paper's figure shows:

1. Reduce the number of kernels in the first conv layer (32 -> 16)
2. Slim down the Last Stage

The code below provides both MobileNetV3-Large and MobileNetV3-Small; the comments explain how to switch between them for training.

MobileNet v3 training code:

import torch
from torch import nn, Tensor
import torchvision.models
from torch.nn import functional as F
from matplotlib import pyplot as plt
from tqdm import tqdm
from torch import nn
from torch.utils.data import DataLoader
from torchvision.transforms import transforms
from typing import Callable, List, Optional
from functools import partial
data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(120),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
    "val": transforms.Compose([transforms.Resize((120, 120)),  # cannot 224, must (224, 224)
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

train_data = torchvision.datasets.ImageFolder(root = "./data/train" ,   transform = data_transform["train"])

traindata = DataLoader(dataset=train_data, batch_size=128, shuffle=True, num_workers=0)  # draw the training data in batches of 128 images

test_data = torchvision.datasets.ImageFolder(root = "./data/val" , transform = data_transform["val"])

train_size = len(train_data)  # number of training images
test_size = len(test_data)  # number of test images
print(train_size)   # print the training-set size as a sanity check
print(test_size)    # print the test-set size as a sanity check
testdata = DataLoader(dataset=test_data, batch_size=128, shuffle=True, num_workers=0)  # draw the test data in batches of 128 images

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("using {} device.".format(device))


def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer(inplace=True))


class SqueezeExcitation(nn.Module):
    def __init__(self, input_c: int, squeeze_factor: int = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)

    def forward(self, x: Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        scale = self.fc1(scale)
        scale = F.relu(scale, inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x


class InvertedResidualConfig:
    def __init__(self,
                 input_c: int,
                 kernel: int,
                 expanded_c: int,
                 out_c: int,
                 use_se: bool,
                 activation: str,
                 stride: int,
                 width_multi: float):
        self.input_c = self.adjust_channels(input_c, width_multi)
        self.kernel = kernel
        self.expanded_c = self.adjust_channels(expanded_c, width_multi)
        self.out_c = self.adjust_channels(out_c, width_multi)
        self.use_se = use_se
        self.use_hs = activation == "HS"  # whether using h-swish activation
        self.stride = stride

    @staticmethod
    def adjust_channels(channels: int, width_multi: float):
        return _make_divisible(channels * width_multi, 8)


class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        super(InvertedResidual, self).__init__()

        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")

        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        layers: List[nn.Module] = []
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU

        # expand
        if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))

        # depthwise
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.expanded_c,
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))

        if cnf.use_se:
            layers.append(SqueezeExcitation(cnf.expanded_c))

        # project
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.out_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))

        self.block = nn.Sequential(*layers)
        self.out_channels = cnf.out_c
        self.is_strided = cnf.stride > 1

    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        if self.use_res_connect:
            result += x

        return result


class MobileNetV3(nn.Module):
    def __init__(self,
                 inverted_residual_setting: List[InvertedResidualConfig],
                 last_channel: int,
                 num_classes: int = 2,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None):
        super(MobileNetV3, self).__init__()

        if not inverted_residual_setting:
            raise ValueError("The inverted_residual_setting should not be empty.")
        elif not (isinstance(inverted_residual_setting, List) and
                  all([isinstance(s, InvertedResidualConfig) for s in inverted_residual_setting])):
            raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig]")

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)

        layers: List[nn.Module] = []

        # building first layer
        firstconv_output_c = inverted_residual_setting[0].input_c
        layers.append(ConvBNActivation(3,
                                       firstconv_output_c,
                                       kernel_size=3,
                                       stride=2,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        # building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.append(block(cnf, norm_layer))

        # building last several layers
        lastconv_input_c = inverted_residual_setting[-1].out_c
        lastconv_output_c = 6 * lastconv_input_c
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(nn.Linear(lastconv_output_c, last_channel),
                                        nn.Hardswish(inplace=True),
                                        nn.Dropout(p=0.2, inplace=True),
                                        nn.Linear(last_channel, num_classes))

        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)

        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)


def mobilenet_v3_large(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a large MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, False, "RE", 1),
        bneck_conf(16, 3, 64, 24, False, "RE", 2),  # C1
        bneck_conf(24, 3, 72, 24, False, "RE", 1),
        bneck_conf(24, 5, 72, 40, True, "RE", 2),  # C2
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 3, 240, 80, False, "HS", 2),  # C3
        bneck_conf(80, 3, 200, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 480, 112, True, "HS", 1),
        bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
    ]
    last_channel = adjust_channels(1280 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)


def mobilenet_v3_small(num_classes: int = 2,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a small MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, True, "RE", 2),  # C1
        bneck_conf(16, 3, 72, 24, False, "RE", 2),  # C2
        bneck_conf(24, 3, 88, 24, False, "RE", 1),
        bneck_conf(24, 5, 96, 40, True, "HS", 2),  # C3
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 120, 48, True, "HS", 1),
        bneck_conf(48, 5, 144, 48, True, "HS", 1),
        bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1),
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1)
    ]
    last_channel = adjust_channels(1024 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)


mobilenet = mobilenet_v3_large(num_classes=2)   # instantiate the model; num_classes is the number of classes in your dataset (I use the two-class cat/dog dataset, so 2); set it to match your own data
# The line above uses mobilenet_v3_large; to use mobilenet_v3_small instead, simply replace mobilenet_v3_large with mobilenet_v3_small
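# Optional sanity check (my own, illustrative): compare the sizes of the two variants.
# print(sum(p.numel() for p in mobilenet.parameters()))  # roughly 4.2M for large, 1.5M for small (num_classes=2)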
mobilenet.to(device)

print(mobilenet.to(device))  # print the model structure


test1 = torch.ones(64, 3, 120, 120)  # sanity-check the output shape by feeding a (64, 3, 120, 120) tensor

test1 = mobilenet(test1.to(device))    # push the tensor through the network
print(test1.shape)  # inspect the output shape

epoch = 1  # number of training epochs
learning = 0.0001  # learning rate
optimizer = torch.optim.Adam(mobilenet.parameters(), lr=learning)  # Adam optimizer; worth reading up on if you are writing a paper
loss = nn.CrossEntropyLoss()  # loss function: cross-entropy

train_loss_all = []  # list for the training losses
train_accur_all = []  # list for the training accuracies
test_loss_all = []  # list for the test losses
test_accur_all = []  # list for the test accuracies
for i in range(epoch):  # start iterating over epochs
    train_loss = 0   # running training loss for this epoch
    train_num = 0.0   # number of training samples seen so far
    train_accuracy = 0.0  # running count of correct training predictions
    mobilenet.train()   # put the model in training mode
    train_bar = tqdm(traindata)  # progress-bar wrapper; purely cosmetic
    for step, data in enumerate(train_bar):  # iterate over batches; enumerate yields (batch index, batch)
        img, target = data    # split the batch into images img and labels target
        optimizer.zero_grad()  # clear the accumulated gradients
        outputs = mobilenet(img.to(device))  # forward pass; outputs holds the class scores

        loss1 = loss(outputs, target.to(device))  # compare the network outputs with the true labels target; this is the loss
        outputs = torch.argmax(outputs, 1)   # the index of the largest class score is the predicted class
        loss1.backward()   # backpropagate
        optimizer.step()  # update the weights with the Adam optimizer defined above
        train_loss = train_loss + loss1.item()  # accumulate the loss
        accuracy = torch.sum(outputs == target.to(device))   # outputs == target marks correct predictions; count them to compute accuracy
        train_accuracy = train_accuracy + accuracy   # accumulate the correct-prediction count
        train_num += img.size(0)  # accumulate the number of samples

    print("epoch:{} , train-Loss:{} , train-accuracy:{}".format(i + 1, train_loss / train_num,   #输出训练情况
                                                                train_accuracy / train_num))
    train_loss_all.append(train_loss / train_num)   #将训练的损失放到一个列表里 方便后续画图
    train_accur_all.append(train_accuracy.double().item() / train_num)#训练集的准确率
    test_loss = 0   #同上 测试损失
    test_accuracy = 0.0  #测试准确率
    test_num = 0
    mobilenet.eval()   #将模型调整为测试模型
    with torch.no_grad():  #清空历史梯度,进行测试  与训练最大的区别是测试过程中取消了反向传播
        test_bar = tqdm(testdata)
        for data in test_bar:
            img, target = data

            outputs = mobilenet(img.to(device))
            loss2 = loss(outputs, target.to(device)).cpu()
            outputs = torch.argmax(outputs, 1)
            test_loss = test_loss + loss2.item()
            accuracy = torch.sum(outputs == target.to(device))
            test_accuracy = test_accuracy + accuracy
            test_num += img.size(0)

    print("test-Loss:{} , test-accuracy:{}".format(test_loss / test_num, test_accuracy / test_num))
    test_loss_all.append(test_loss / test_num)
    test_accur_all.append(test_accuracy.double().item() / test_num)

# plotting: draw the stored lists as loss and accuracy curves for the training and test sets
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(range(epoch), train_loss_all,
         "ro-", label="Train loss")
plt.plot(range(epoch), test_loss_all,
         "bs-", label="test loss")
plt.legend()
plt.xlabel("epoch")
plt.ylabel("Loss")
plt.subplot(1, 2, 2)
plt.plot(range(epoch), train_accur_all,
         "ro-", label="Train accur")
plt.plot(range(epoch), test_accur_all,
         "bs-", label="test accur")
plt.xlabel("epoch")
plt.ylabel("acc")
plt.legend()
plt.show()

torch.save(mobilenet, "mobile_V3.pth")
print("模型已保存")

MobileNet v3 prediction code:

import torch
from PIL import Image
from torch import nn
from torchvision.transforms import transforms
from typing import Callable, List, Optional
from torch import nn, Tensor
from torch.nn import functional as F
from functools import partial  # needed by MobileNetV3 below; this import was missing from the original prediction script
image_path = "1.jpg"#相对路径 导入图片,图片放在当前文件夹下就行
trans = transforms.Compose([transforms.Resize((120 , 120)),
                           transforms.ToTensor()])   #将图片缩放为跟训练集图片的大小一样 方便预测,且将图片转换为张量
image = Image.open(image_path)  #打开图片
# print(image)  #输出图片 看看图片格式
image = image.convert("RGB")  #将图片转换为RGB格式
image = trans(image)   #上述的缩放和转张量操作在这里实现
# print(image)   #查看转换后的样子
image = torch.unsqueeze(image, dim=0)  #将图片维度扩展一维

classes = ["cat" , "dog" ]  #预测种类,我这里用的猫狗数据集,所以是这两种,你调成你的种类即可

def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer(inplace=True))


class SqueezeExcitation(nn.Module):
    def __init__(self, input_c: int, squeeze_factor: int = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)

    def forward(self, x: Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        scale = self.fc1(scale)
        scale = F.relu(scale, inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x


class InvertedResidualConfig:
    def __init__(self,
                 input_c: int,
                 kernel: int,
                 expanded_c: int,
                 out_c: int,
                 use_se: bool,
                 activation: str,
                 stride: int,
                 width_multi: float):
        self.input_c = self.adjust_channels(input_c, width_multi)
        self.kernel = kernel
        self.expanded_c = self.adjust_channels(expanded_c, width_multi)
        self.out_c = self.adjust_channels(out_c, width_multi)
        self.use_se = use_se
        self.use_hs = activation == "HS"  # whether using h-swish activation
        self.stride = stride

    @staticmethod
    def adjust_channels(channels: int, width_multi: float):
        return _make_divisible(channels * width_multi, 8)


class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        super(InvertedResidual, self).__init__()

        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")

        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        layers: List[nn.Module] = []
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU

        # expand
        if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))

        # depthwise
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.expanded_c,
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))

        if cnf.use_se:
            layers.append(SqueezeExcitation(cnf.expanded_c))

        # project
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.out_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))

        self.block = nn.Sequential(*layers)
        self.out_channels = cnf.out_c
        self.is_strided = cnf.stride > 1

    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        if self.use_res_connect:
            result += x

        return result


class MobileNetV3(nn.Module):
    def __init__(self,
                 inverted_residual_setting: List[InvertedResidualConfig],
                 last_channel: int,
                 num_classes: int = 2,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None):
        super(MobileNetV3, self).__init__()

        if not inverted_residual_setting:
            raise ValueError("The inverted_residual_setting should not be empty.")
        elif not (isinstance(inverted_residual_setting, List) and
                  all([isinstance(s, InvertedResidualConfig) for s in inverted_residual_setting])):
            raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig]")

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)

        layers: List[nn.Module] = []

        # building first layer
        firstconv_output_c = inverted_residual_setting[0].input_c
        layers.append(ConvBNActivation(3,
                                       firstconv_output_c,
                                       kernel_size=3,
                                       stride=2,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        # building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.append(block(cnf, norm_layer))

        # building last several layers
        lastconv_input_c = inverted_residual_setting[-1].out_c
        lastconv_output_c = 6 * lastconv_input_c
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(nn.Linear(lastconv_output_c, last_channel),
                                        nn.Hardswish(inplace=True),
                                        nn.Dropout(p=0.2, inplace=True),
                                        nn.Linear(last_channel, num_classes))

        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)

        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)


def mobilenet_v3_large(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a large MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, False, "RE", 1),
        bneck_conf(16, 3, 64, 24, False, "RE", 2),  # C1
        bneck_conf(24, 3, 72, 24, False, "RE", 1),
        bneck_conf(24, 5, 72, 40, True, "RE", 2),  # C2
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 3, 240, 80, False, "HS", 2),  # C3
        bneck_conf(80, 3, 200, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 480, 112, True, "HS", 1),
        bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
    ]
    last_channel = adjust_channels(1280 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)


def mobilenet_v3_small(num_classes: int = 2,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a small MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, True, "RE", 2),  # C1
        bneck_conf(16, 3, 72, 24, False, "RE", 2),  # C2
        bneck_conf(24, 3, 88, 24, False, "RE", 1),
        bneck_conf(24, 5, 96, 40, True, "HS", 2),  # C3
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 120, 48, True, "HS", 1),
        bneck_conf(48, 5, 144, 48, True, "HS", 1),
        bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1),
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1)
    ]
    last_channel = adjust_channels(1024 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)



# the network definition above is needed because the unpickled model must find its class to run prediction
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # pick the GPU if one is available
print("using {} device.".format(device))


model = torch.load("mobile_V3.pth", map_location=device)  # load the model; map_location keeps this working on CPU-only machines
model.eval()  # switch layers such as BatchNorm/Dropout to inference mode
with torch.no_grad():  # disable gradient tracking for prediction
    outputs = model(image.to(device))  # push the image through the network
    # print(model)  # print the model structure
    # print(outputs)  # print the raw score tensor
    ans = (outputs.argmax(1)).item()  # the index of the largest score is the predicted class;
    # look that index up in the class list to get the class name
    print("Predicted class:", classes[ans])
    # print(classes[ans])

The network-building code is thoroughly commented; if anything is unclear, please point it out in the comments section, thanks!

If you are unsure how to run my code, my earlier open-source post is more detailed and follows the same workflow: 手撕Resnet卷积神经网络-pytorch-详细注释版(可以直接替换自己数据集)-直接放置自己的数据集就能直接跑。跑的代码有问题的可以在评论区指出,看到了会回复。训练代码和预测代码均有。_pytorch 替换自己数据集_小馨馨的小翟的博客-CSDN博客

The dataset used here is the cat/dog dataset, already split into training and test sets; the download link is below.

Link: https://pan.baidu.com/s/1_gUznMQnzI0UhvsV7wPgzw
Extraction code: 3ixd

The MobileNet v2 and MobileNet v3 code can be downloaded here:

Link: https://pan.baidu.com/s/1eL8k4OGcL14NPHDqsUTsNA
Extraction code: 72zg
