GhostNet paper in practice: reproducing Ghost-ResNet56

DeepAlchemy

Last modified 2023-07-06 10:25:30


Category column: 炼丹 (Alchemy). Article tags: neural networks, pytorch, deep learning

First published 2020-03-24 13:52:52

Article link: https://blog.csdn.net/yinkaishikd/article/details/105070337


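GhostNet is a lightweight network published at CVPR 2020 and reported to outperform MobileNetV3. This post looks at how the Ghost module from GhostNet performs in practice: I reproduce the Ghost-ResNet-56 described in the paper and train it on CIFAR-10. The results did not match those stated in the paper, so what follows is simply a record of the attempt.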

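Building the Ghost-ResNet network

Requirements:

There is no need to build Ghost-ResNet from the ground up. Combining two existing pieces is enough:

- The GitHub repository of a GhostNet author, [iamhankai](https://github.com/iamhankai/ghostnet.pytorch), contains a network implementation of GhostNet; we only need its Ghost module.
- A ResNet-56 implementation from GitHub. I used [akamaster](https://github.com/akamaster/pytorch_resnet_cifar10)'s implementation as a reference and refactored it.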

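Implementing Ghost-ResNet:

Import the Ghost module in the resnet56 code: `from ghost_net import GhostModule`
Implement weight initialization for the Ghost module in the GhostNet code.
- The paper uses Kaiming He's initialization method. However, the GhostNet code only applies this initialization to the GhostNet model itself, and the initialization in the resnet56 code differs from the one described in the paper, so the initialization has to be rewritten in both the ResNet and the Ghost module implementations: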
```
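import torch.nn as nn


def _weights_init(m):
    # Kaiming (He) normal initialization for every convolution.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    # BatchNorm starts as an identity transform: scale 1, shift 0.
    elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.fill_(1)
        m.bias.data.zero_()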
```
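A minimal sketch of how an initializer like this is usually applied in PyTorch: `nn.Module.apply` calls the given function on the module and on every submodule. The small `toy_model` below is a hypothetical stand-in used only for illustration, not the Ghost-ResNet itself.

```python
import torch.nn as nn


def _weights_init(m):
    # Same rule as above: Kaiming-normal for convolutions, identity for BatchNorm.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.fill_(1)
        m.bias.data.zero_()


# Hypothetical stand-in model; in practice this would be the Ghost-ResNet.
toy_model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)

# apply() visits the model and all of its submodules and calls _weights_init on each.
toy_model.apply(_weights_init)
```

Calling `apply` once on the top-level network also reaches the convolutions nested inside each Ghost module, since they are ordinary submodules.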
Replace every convolution except the first layer with a Ghost module.

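(On keeping the first layer: I discussed this with the authors. The paper does state explicitly:

All the convolutional layers in these two models are replaced by the proposed Ghost module, and the new models are denoted as Ghost-VGG-16 and Ghost-ResNet-56, respectively.

However, both the GhostNet architecture itself and common sense suggest that the first convolutional layer should be kept. Perhaps the wording in the paper is a slip that invites misreading?)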
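As a rough sketch of what this replacement can look like: the `GhostModule` below follows the general shape of the public ghostnet.pytorch code (a primary convolution produces part of the output channels and a cheap depthwise convolution produces the rest). The block `GhostBasicBlock`, the choice of a 3x3 primary kernel, `ratio=2`, `dw_size=3`, and the absence of extra BatchNorm layers outside the Ghost modules are all assumptions made for illustration, not necessarily the configuration behind the numbers in this post. The first stem convolution of the network would stay a plain `nn.Conv2d`.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class GhostModule(nn.Module):
    """Sketch of a Ghost module: primary conv + cheap depthwise conv, concatenated."""

    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super().__init__()
        self.oup = oup
        init_channels = math.ceil(oup / ratio)      # intrinsic feature maps
        new_channels = init_channels * (ratio - 1)  # "ghost" feature maps

        self.primary_conv = nn.Sequential(
            nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True) if relu else nn.Identity(),
        )
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size // 2,
                      groups=init_channels, bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True) if relu else nn.Identity(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = torch.cat([x1, x2], dim=1)
        # Trim surplus channels when oup is not divisible by ratio.
        return out[:, :self.oup, :, :]


class GhostBasicBlock(nn.Module):
    """Hypothetical CIFAR ResNet basic block with both 3x3 convs swapped for Ghost modules."""

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.stride = stride
        self.conv1 = GhostModule(in_planes, planes, kernel_size=3, stride=stride, relu=True)
        self.conv2 = GhostModule(planes, planes, kernel_size=3, stride=1, relu=False)

    def forward(self, x):
        out = self.conv2(self.conv1(x))
        shortcut = x
        if self.stride != 1 or x.shape[1] != out.shape[1]:
            # Parameter-free shortcut: subsample spatially, zero-pad the channel dimension.
            shortcut = x[:, :, ::self.stride, ::self.stride]
            extra = out.shape[1] - shortcut.shape[1]
            shortcut = F.pad(shortcut, (0, 0, 0, 0, extra // 2, extra - extra // 2))
        return F.relu(out + shortcut)


block = GhostBasicBlock(16, 32, stride=2)
features = block(torch.randn(2, 16, 32, 32))
print(features.shape)
```

With a stride of 2 and a doubling of the channel count, the block halves the spatial resolution, which is the transition used between the stages of the CIFAR ResNets.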
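Run the test network and obtain the parameter counts.

akamaster's ResNet implementation includes a utility that reports the number of parameters in each network; running the code gives the results below.
The results clearly show that the parameter count of every network is cut nearly in half, so the Ghost module does reduce the number of parameters effectively. Ghost-ResNet-56 comes to about 0.44M parameters, which is still slightly different from the 0.43M reported in the paper.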

Proper ResNet-s for CIFAR10 (for fair comparision and etc.) has following number of layers and parameters:

| RESNETs name | layers | params | Ghost-RESNETs name | layers | params |
| --- | --- | --- | --- | --- | --- |
| ResNet20 | 20 | 0.27M | gResNet20 | 20 | 140474 |
| ResNet32 | 32 | 0.46M | gResNet32 | 32 | 241050 |
| ResNet44 | 44 | 0.66M | gResNet44 | 44 | 341626 |
| ResNet56 | 56 | 0.85M | gResNet56 | 56 | 442202 |
| ResNet110 | 110 | 1.7M | gResNet110 | 110 | 894794 |
| ResNet1202 | 1202 | 19.4M | gResNet1202 | 1202 | 10047210 |
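A back-of-the-envelope check of why the count drops by roughly half. It assumes a Ghost module with ratio 2, a 3x3 primary convolution and a 3x3 depthwise cheap operation; these settings are assumptions, the helper names are illustrative, and BatchNorm parameters and biases are ignored.

```python
import math


def conv_weights(c_in, c_out, k=3):
    """Weights of a plain k x k convolution without bias."""
    return c_in * c_out * k * k


def ghost_weights(c_in, c_out, k=3, ratio=2, dw_size=3):
    """Weights of a Ghost module: primary conv plus depthwise cheap conv (no bias, no BN)."""
    init_channels = math.ceil(c_out / ratio)
    new_channels = init_channels * (ratio - 1)
    primary = c_in * init_channels * k * k
    # Depthwise convolution: one dw_size x dw_size filter per output channel.
    cheap = new_channels * dw_size * dw_size
    return primary + cheap


for width in (16, 32, 64):  # the three stage widths of the CIFAR ResNets
    plain = conv_weights(width, width)
    ghost = ghost_weights(width, width)
    print(f"{width:>2} channels: conv={plain:>6}  ghost={ghost:>6}  ratio={ghost / plain:.3f}")
```

Under these assumptions each replaced layer keeps a little over half of its weights (a ratio between about 0.51 and 0.53), which is in line with the near-halving seen in the table.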

Training on CIFAR-10

The training procedure follows reference [16] exactly, which describes it as follows:

4.2. CIFAR-10 and Analysis

…

We use a weight decay of 0.0001 and momentum of 0.9, and adopt the weight initialization in [13] and BN [16] but with no dropout. These models are trained with a minibatch size of 128 on two GPUs. We start with a learning rate of 0.1, divide it by 10 at 32k and 48k iterations, and terminate training at 64k iterations, which is determined on a 45k/5k train/val split. We follow the simple data augmentation in [24] for training: 4 pixels are padded on each side, and a 32×32 crop is randomly sampled from the padded image or its horizontal flip. For testing, we only evaluate the single view of the original 32×32 image.
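To make the quoted schedule concrete, here is a small sketch of the step learning-rate rule and its translation from iterations into epochs, which is useful because many CIFAR-10 training scripts schedule by epoch rather than by iteration. The function and constant names are illustrative, and the epoch conversion assumes the 45k training split with the last partial batch kept.

```python
import math

BASE_LR = 0.1
MILESTONES = (32_000, 48_000)  # iterations at which the rate is divided by 10
TOTAL_ITERS = 64_000
BATCH_SIZE = 128
TRAIN_SIZE = 45_000            # 45k/5k train/val split from the quoted passage


def lr_at(iteration):
    """Learning rate in effect at a given 0-based training iteration."""
    drops = sum(iteration >= m for m in MILESTONES)
    return BASE_LR * (0.1 ** drops)


# Number of minibatches needed to see the training split once.
iters_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)

for it in (0, *MILESTONES, TOTAL_ITERS):
    print(f"iteration {it:>6} ~ epoch {it / iters_per_epoch:6.1f}, lr = {lr_at(min(it, TOTAL_ITERS - 1)):.4f}")
```

Under these assumptions one epoch is 352 iterations, so the two decay points fall at roughly epochs 91 and 136, and training ends after roughly 182 epochs. In a PyTorch training loop, the same rule could be expressed as the factor `0.1 ** drops` passed to `torch.optim.lr_scheduler.LambdaLR` and stepped once per iteration, together with `torch.optim.SGD(..., lr=0.1, momentum=0.9, weight_decay=1e-4)`.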