How Huawei's GhostNet Outperforms Google's MobileNet

Article link: https://blog.csdn.net/flyfish1986/article/details/105426366

Paper authors: Huawei Noah's Ark Lab
Paper link
TensorFlow code
PyTorch code
Download link for the GhostNet variant adapted to object detection

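I recommend first reading the Chinese-language explanation of the paper written by Yunhe Wang (王云鹤), one of GhostNet's authors.
That gives us a Chinese reference alongside the English paper, and the more references the better.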

GhostNet's results

On the ImageNet classification task, GhostNet reaches 75.7% top-1 accuracy at a similar computational cost, higher than MobileNetV3's 75.2%.

The comparison is against the MobileNetV3 Large version. "Similar computational cost" here means that GhostNet is roughly 1 ms slower than MobileNetV3 Large.

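In plain terms: for a very small loss in speed, accuracy improves noticeably.
First comes the Ghost Module, then the Ghost Bottleneck, and finally the two are assembled into GhostNet.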

Ghost Module

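The terms below come from the paper. They are new concepts coined to solve one particular problem; read on and it will become clear what each of them refers to.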

Raw feature map
Ghost feature map
Redundant feature map
Intrinsic feature map

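In ResNet-50, take the feature maps produced by the first residual block. Three groups of feature maps are marked in red, green, and blue because the maps within each group are similar. Starting from one feature map, a simple linear transformation can generate another, similar feature map. These cheap operations are the small wrenches in the figure (linear operations = cheap operations = linear transformations).
[Figure]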
Identity is the identity mapping.

Identity corresponds to the intrinsic feature maps.
1, 2, 3, …, k correspond to the ghost feature maps.
Redrawn, the figure looks like this:
[Figure]

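import math

import torch
import torch.nn as nn


class GhostModule(nn.Module):
    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super(GhostModule, self).__init__()
        self.oup = oup
        # 1/ratio of the output channels are intrinsic feature maps;
        # the remaining (ratio - 1) shares are ghost feature maps.
        init_channels = math.ceil(oup / ratio)
        new_channels = init_channels*(ratio-1)

        # Primary convolution: an ordinary convolution that produces the intrinsic feature maps.
        self.primary_conv = nn.Sequential(
            nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size//2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

        # Cheap operation: a depthwise convolution (groups=init_channels) that
        # derives the ghost feature maps from the intrinsic ones.
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size//2, groups=init_channels, bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)        # intrinsic feature maps
        x2 = self.cheap_operation(x1)    # ghost feature maps
        out = torch.cat([x1,x2], dim=1)  # keeping x1 unchanged is the Identity branch
        # math.ceil can overshoot oup, so keep only the first oup channels.
        return out[:,:self.oup,:,:]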
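
To make the channel bookkeeping in GhostModule easier to follow, here is a minimal sketch in plain Python that repeats only the arithmetic from `__init__` and `forward`. The helper name `ghost_channels` and the sample values of `oup` and `ratio` are illustrative assumptions, not part of the original code.

```python
import math


def ghost_channels(oup, ratio=2):
    """Return (intrinsic, ghost, concatenated, kept) channel counts for a Ghost Module."""
    init_channels = math.ceil(oup / ratio)      # produced by primary_conv
    new_channels = init_channels * (ratio - 1)  # produced by cheap_operation
    total = init_channels + new_channels        # channels after torch.cat
    kept = min(oup, total)                      # channels left after out[:, :oup]
    return init_channels, new_channels, total, kept


# Hypothetical sample configurations, chosen to show exact and overshooting cases.
for oup, ratio in [(24, 2), (25, 2), (16, 3)]:
    init_c, new_c, total, kept = ghost_channels(oup, ratio)
    print(f"oup={oup}, ratio={ratio}: intrinsic={init_c}, ghost={new_c}, "
          f"concatenated={total}, kept={kept}")
```

When `oup` is not a multiple of `ratio`, the concatenation has slightly more channels than requested, which is why `forward` slices the result down to `oup`.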

When I saw the colors in the figure, the first thing that came to mind was the figure below.

[Figure]

For how to compute the FLOPs of an ordinary convolution, see the linked article; only the letters used as symbols differ.
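
As a rough illustration of the FLOPs comparison, the sketch below counts multiply-accumulates for an ordinary convolution and for a Ghost Module, using the paper's symbols (c input channels, n output channels, h' x w' output size, k x k kernel, ratio s, d x d cheap operation). The function names and the layer sizes are assumptions chosen for the example, and the Ghost Module count assumes that s divides n.

```python
def conv_flops(c, n, h_out, w_out, k):
    """Multiply-accumulates of an ordinary convolution: n * h' * w' * c * k * k."""
    return n * h_out * w_out * c * k * k


def ghost_flops(c, n, h_out, w_out, k, s=2, d=3):
    """Primary convolution for n/s maps plus (s - 1) depthwise d x d operations per map."""
    m = n // s  # number of intrinsic feature maps (assumes s divides n)
    primary = m * h_out * w_out * c * k * k
    cheap = (s - 1) * m * h_out * w_out * d * d
    return primary + cheap


# Hypothetical layer: 64 -> 128 channels, 56 x 56 output, 3 x 3 kernel.
c, n, h, w, k = 64, 128, 56, 56, 3
ordinary = conv_flops(c, n, h, w, k)
ghost = ghost_flops(c, n, h, w, k, s=2, d=3)
print("ordinary:", ordinary)
print("ghost:   ", ghost)
print("speed-up ratio:", ordinary / ghost)
```

The ratio comes out slightly below s, because the cheap operations are inexpensive but not free.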

Ghost bottleneck

[Figure]

[Figure]
PyTorch code:
ghostnet.pytorch/ghost_net.py
Line 88
GhostModule(inp, hidden_dim, kernel_size=1, relu=True),

Line 94
GhostModule(hidden_dim, oup, kernel_size=1, relu=False),

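Why do the two calls differ, one with relu=True and the other with relu=False?
Answer: the design is inspired by ShuffleNet:
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
[Figure]
The GhostNet paper itself states that ReLU is not used after the second Ghost Module, as suggested by MobileNetV2.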

Ghost Net

[Figure]
Let's compare how GhostNet and MobileNetV3 Large differ.
Below, MobileNetV3 refers to the MobileNetV3 Large version.
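Difference 1
MobileNetV3 has a layer with 64 #exp and 24 #out;
GhostNet changes it to 48 #exp and 24 #out.

Difference 2
MobileNetV3 has two layers with 120 #exp and 40 #out;
GhostNet has one.

Difference 3
MobileNetV3 has two layers with 960 #exp and 160 #out;
GhostNet has four of them.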
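The number after GhostNet, such as 1.0, 0.5, or 1.3, is the width multiplier of the network (some codebases call it the depth multiplier, because it scales the channel depth of every layer). It is the $\alpha$ in the paper.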
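
To show what a width multiplier does in practice, here is a minimal sketch that scales the #out values mentioned above (24, 40, 160) by $\alpha$. The helper `make_divisible` and the divisor 4 are assumptions modeled on the rounding commonly used in MobileNet-style code; the actual GhostNet implementation may round differently.

```python
def make_divisible(value, divisor=4, min_value=None):
    """Round value to a nearby multiple of divisor without dropping more than 10% below it."""
    if min_value is None:
        min_value = divisor
    rounded = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded


base_out_channels = [24, 40, 160]  # #out values quoted in the comparison above
for alpha in (0.5, 1.0, 1.3):
    scaled = [make_divisible(c * alpha) for c in base_out_channels]
    print(f"alpha={alpha}: {scaled}")
```

Rounding to a multiple of a small divisor keeps channel counts hardware-friendly after scaling.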
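[Figure]
With the RetinaNet and Faster R-CNN frameworks, GhostNet can reach an mAP similar to that of MobileNetV2 and MobileNetV3.
RetinaNet is a one-stage detector.
Faster R-CNN is a two-stage detector.
The object detection code is built on mmdetection, an open-source object detection framework based on PyTorch. As of April 14, 2020, it had a little over 9k stars, slightly more than Facebook Research's detectron2.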
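The Ghost Module plays the role of the pointwise convolution in a depthwise separable convolution.
Structure of the Ghost Bottleneck
1. Pointwise convolution (Ghost Module)
2. Depthwise convolution or Identity (in the PyTorch code, the depthwise convolution is applied only when stride is 2)
3. SELayer
4. Pointwise convolution, linear (Ghost Module)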

The Ghost Bottleneck looks very similar to the layer structure of MobileNet v2.
Structure of MobileNet v2
1. Pointwise convolution
2. Depthwise convolution
3. Pointwise convolution
Bring in MobileNet v3 as well, and the structural similarity goes one step further.
Structure of MobileNet v3
1. Pointwise convolution
2. Depthwise convolution
3. SELayer
4. Pointwise convolution, linear

Practical notes

1. Without the SE structure, GhostNet loses about 0.5 percentage points of top-1 accuracy. Adding SE follows the MobileNet v3 structure.
For PyTorch training hyperparameters, see https://github.com/megvii-model/ShuffleNet-Series

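2. Training strategies worth trying
iter=450000, lr=0.4, dropout=0.15 or dropout=0.1
5 epochs of warmup followed by a cosine learning rate schedule; weight decay of 3e-5 or 1e-5
For Ghost-VGG-16 and Ghost-ResNet-56, training settings worth trying:
400 epochs, lr=0.1, multiplied by 0.1 at epochs 200, 300, and 375; tune weight decay between 1e-4 and 1e-5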

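3. The code uses clamp because clamp is faster than sigmoid.

4. The d x d cheap operation is used not only to generate ghost feature maps but also to enlarge the receptive field, just like the k x k depthwise conv in the MobileNetV3 bottleneck.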
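
The two learning rate schedules from note 2 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the schedules are evaluated per epoch (the post gives 450000 iterations, not an epoch count), `total_epochs = 300` is a hypothetical value, and the function names are made up for this example.

```python
import math


def warmup_cosine_lr(epoch, total_epochs, base_lr=0.4, warmup_epochs=5):
    """Linear warmup for the first warmup_epochs, then cosine decay towards 0."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def step_lr(epoch, base_lr=0.1, milestones=(200, 300, 375), gamma=0.1):
    """Multiply the learning rate by gamma once for every milestone already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


total_epochs = 300  # hypothetical; not stated in the post
for epoch in (0, 4, 5, 150, 299):
    print("warmup+cosine", epoch, warmup_cosine_lr(epoch, total_epochs))
for epoch in (0, 199, 200, 300, 375):
    print("step", epoch, step_lr(epoch))
```

In real training code the same shapes are usually obtained from a framework's scheduler classes rather than hand-written functions.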

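The Linear layers in SELayer can be replaced with convolutions.
For how to make the change, see the linked page: scroll a long way down to the later part, or simply search for the keyword.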

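# Requires: import torch; import torch.nn as nn
class SELayer(nn.Module):
    def __init__(self, channel, reduction=4):
        super(SELayer, self).__init__()
        # Squeeze: global average pooling down to one value per channel.
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: bottleneck of two fully connected layers.
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        # clamp to [0, 1] is used as the gate instead of sigmoid.
        y = torch.clamp(y, 0, 1)
        return x * y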
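
The SELayer above gates each channel with torch.clamp(y, 0, 1), where the original squeeze-and-excitation design uses a sigmoid. The sketch below compares the two gates on a few scalar inputs in plain Python; the function names and sample values are illustrative only. The clamp needs just two comparisons and no exponential, which is presumably where the speed advantage mentioned in note 3 comes from.

```python
import math


def clamp01(y):
    """Piecewise-linear gate, the scalar analogue of torch.clamp(y, 0, 1)."""
    return max(0.0, min(1.0, y))


def sigmoid(y):
    """Logistic gate used by the original squeeze-and-excitation design."""
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical pre-gate activations.
for y in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(f"y={y:+.1f}  clamp={clamp01(y):.3f}  sigmoid={sigmoid(y):.3f}")
```

Both gates map into the range 0 to 1, but the clamp can reach exactly 0 or 1, while the sigmoid only approaches them.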