Paper: https://arxiv.org/pdf/1704.04861.pdf
Background:
MobileNet, released in April 2017, is a lightweight deep neural network designed for mobile and embedded deep learning applications, so that models can reach practical speeds even on a CPU.
Key innovation:
The main idea is to replace standard convolution with depthwise separable convolution, and to drop pooling layers (downsampling is handled by strided convolutions instead). A standard convolution is factorized into a depthwise convolution and a pointwise convolution, which greatly reduces both the number of parameters and the amount of computation. The decomposition works as follows.
Let the input feature map $F$ have size $D_{F}\times D_{F}\times M$ (spatial size $D_{F}\times D_{F}$ with $M$ channels).
A standard convolution kernel $K$ has size $D_{K}\times D_{K}\times M\times N$. Applying it (assuming stride 1 and padding that preserves the spatial size) produces an output feature map of size $D_{F}\times D_{F}\times N$, at a computational cost of

$$N\cdot D_{F}\cdot D_{F}\cdot M\cdot D_{K}\cdot D_{K}$$
That is, there are $N$ kernels in total; each kernel slides over roughly $D_{F}\cdot D_{F}$ positions; each position covers $M$ channels; and each channel requires a weighted sum over $D_{K}\cdot D_{K}$ values.
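As a quick sanity check of this formula, here is a minimal sketch; the layer sizes are example values chosen for illustration, not taken from the paper:

# Multiplies for one standard conv layer: N * D_F^2 * M * D_K^2
D_F, D_K, M, N = 14, 3, 512, 512  # example sizes, for illustration only
standard_mults = N * D_F * D_F * M * D_K * D_K
print(standard_mults)  # 462422016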
A depthwise convolution kernel has size $D_{K}\times D_{K}\times 1\times M$: a single $D_{K}\times D_{K}$ filter per input channel. Applying it to the feature map produces an output of size $D_{F}\times D_{F}\times M$, at a computational cost of

$$M\cdot D_{F}\cdot D_{F}\cdot D_{K}\cdot D_{K}$$
A pointwise ($1\times 1$) convolution is then applied to the depthwise output, producing an output of size $D_{F}\times D_{F}\times N$ at a computational cost of

$$N\cdot D_{F}\cdot D_{F}\cdot M$$
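Continuing the sketch above (same illustrative sizes), the two factorized stages together cost an order of magnitude less than the standard convolution:

D_F, D_K, M, N = 14, 3, 512, 512  # same example sizes as above
depthwise_mults = M * D_F * D_F * D_K * D_K  # 903168
pointwise_mults = N * D_F * D_F * M          # 51380224
standard_mults = N * D_F * D_F * M * D_K * D_K
print((depthwise_mults + pointwise_mults) / standard_mults)  # ~0.113 = 1/N + 1/D_K^2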
A depthwise separable convolution therefore produces an output of the same dimensions as a standard convolution, but the ratio of their computational costs is:

$$\frac{M\cdot D_{F}\cdot D_{F}\cdot D_{K}\cdot D_{K}+N\cdot D_{F}\cdot D_{F}\cdot M}{N\cdot D_{F}\cdot D_{F}\cdot M\cdot D_{K}\cdot D_{K}}=\frac{1}{N}+\frac{1}{D_{K}^{2}}$$
With a $3\times 3$ kernel, depthwise separable convolution therefore uses 8 to 9 times less computation than standard convolution, at only a small cost in accuracy.
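For example, with $D_{K}=3$ and $N=512$ the ratio is $\frac{1}{512}+\frac{1}{9}\approx 0.113$, i.e. roughly an $8.8\times$ reduction, matching the numeric sketch above.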
Network structure
Every convolution (the standard one as well as the depthwise and pointwise ones) is followed by batch normalization and a ReLU. Apart from the first layer, which is a standard convolution, all layers are depthwise separable convolutions.
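A minimal PyTorch sketch of the two layer structures, assuming stride 1; the channel counts M and N are illustrative, not from the paper:

import torch.nn as nn

M, N = 32, 64  # illustrative input/output channel counts

# Standard layer: 3x3 conv -> BN -> ReLU
standard = nn.Sequential(
    nn.Conv2d(M, N, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(N),
    nn.ReLU(),
)

# Depthwise separable layer:
# 3x3 depthwise conv -> BN -> ReLU -> 1x1 pointwise conv -> BN -> ReLU
separable = nn.Sequential(
    nn.Conv2d(M, M, kernel_size=3, padding=1, groups=M, bias=False),  # depthwise
    nn.BatchNorm2d(M),
    nn.ReLU(),
    nn.Conv2d(M, N, kernel_size=1, bias=False),  # pointwise
    nn.BatchNorm2d(N),
    nn.ReLU(),
)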
Width multiplier $\alpha$:
For each depthwise separable layer, the number of input channels $M$ becomes $\alpha M$ and the number of output channels $N$ becomes $\alpha N$, so the layer's computational cost becomes:
$$\alpha M\cdot D_{F}\cdot D_{F}\cdot D_{K}\cdot D_{K}+\alpha N\cdot D_{F}\cdot D_{F}\cdot \alpha M$$
$\alpha\in (0,1]$, with typical settings of 1, 0.75, 0.5, and 0.25.
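In code the multiplier simply rescales every channel count; a minimal sketch (the helper name and the flooring rule are assumptions for illustration, not from the paper):

def apply_width_multiplier(channels, alpha):
    # Hypothetical helper: scale a layer's channel count by alpha.
    return max(1, int(channels * alpha))

print(apply_width_multiplier(512, 0.75))  # 384
print(apply_width_multiplier(512, 0.25))  # 128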
Resolution multiplier $\rho$:
For each depthwise separable layer, the resolution of the input image, and hence of every internal representation, is scaled by $\rho$, so the layer's computational cost becomes:
$$\alpha M\cdot \rho D_{F}\cdot \rho D_{F}\cdot D_{K}\cdot D_{K}+\alpha N\cdot \rho D_{F}\cdot \rho D_{F}\cdot \alpha M$$
$\rho\in (0,1]$, set implicitly by choosing an input resolution of 224, 192, 160, or 128.
Setting these two multipliers makes it possible to trade off between resource usage and accuracy.
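A small sketch that evaluates the cost formula above for given multipliers; the function name and example sizes are illustrative assumptions:

def separable_cost(d_f, d_k, m, n, alpha=1.0, rho=1.0):
    # Multiply count of one depthwise separable layer
    # under width multiplier alpha and resolution multiplier rho.
    m, n = alpha * m, alpha * n
    d_f = rho * d_f
    return m * d_f * d_f * d_k * d_k + n * d_f * d_f * m

base = separable_cost(14, 3, 512, 512)
print(separable_cost(14, 3, 512, 512, alpha=0.5) / base)  # ~0.25
print(separable_cost(14, 3, 512, 512, rho=0.5) / base)    # 0.25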
import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    '''Depthwise conv + pointwise conv, each followed by BN and ReLU.'''
    def __init__(self, in_planes, out_planes, stride=1):
        super(Block, self).__init__()
        # Depthwise: groups=in_planes gives one 3x3 filter per input channel.
        self.conv1 = nn.Conv2d(in_planes, in_planes, kernel_size=3,
                               stride=stride, padding=1,
                               groups=in_planes, bias=False)
        self.bn1 = nn.BatchNorm2d(in_planes)
        # Pointwise: 1x1 conv combines the channels.
        self.conv2 = nn.Conv2d(in_planes, out_planes, kernel_size=1,
                               stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        return out


class MobileNet(nn.Module):
    # (128, 2) means conv planes=128, conv stride=2;
    # by default conv stride=1.
    cfg = [64, (128, 2), 128, (256, 2), 256, (512, 2),
           512, 512, 512, 512, 512, (1024, 2), 1024]

    def __init__(self, num_classes=10):
        super(MobileNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.layers = self._make_layers(in_planes=32)
        self.linear = nn.Linear(1024, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for x in self.cfg:
            out_planes = x if isinstance(x, int) else x[0]
            stride = 1 if isinstance(x, int) else x[1]
            layers.append(Block(in_planes, out_planes, stride))
            in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.avg_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def test():
    net = MobileNet()
    x = torch.randn(1, 3, 32, 32)
    y = net(x)
    print(y.size())


test()
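Running test() prints torch.Size([1, 10]). Note that this variant is sized for 32x32 CIFAR-style inputs: the first convolution uses stride 1 (the paper's ImageNet model starts with a stride-2 conv on 224x224 inputs), and after the four stride-2 blocks the feature map is 2x2, which the average pool reduces to 1x1x1024 before the classifier.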