[SqueezeNet] "SqueezeNet: AlexNet-Level Accuracy with 50× Fewer Parameters and <0.5MB Model Size"

bryant_meng

Article link: https://blog.csdn.net/bryant_meng/article/details/86314745

ICLR 2017

1 Background and Motivation

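CNN architectures compete mainly on accuracy, yet many different architectures can reach a similar accuracy level. The authors approach the problem from the model-compression angle: for comparable accuracy, find a smaller model.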

Smaller CNN architectures offer three advantages:

More efficient distributed training
Communication overhead is proportional to the number of model parameters (Forrest N. Iandola, Khalid Ashraf, Matthew W. Moskewicz, and Kurt Keutzer. FireCaffe: near-linear acceleration of deep neural network training on compute clusters. In CVPR, 2016).
Less overhead when exporting new models to clients
E.g., in autonomous driving, over-the-air updates can be faster and more frequent.
Feasible FPGA and embedded deployment

At that time, Inception-v1, v2, v3, and v4 had already been published.

The overarching goal of our work is to identify a model that has very few parameters while preserving accuracy.

2 Advantages / Contributions

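Proposes SqueezeNet, a new network architecture aimed at model compression.
AlexNet-level accuracy on ImageNet with 50× fewer parameters (AlexNet 240MB, SqueezeNet 4.8MB); combined with compression techniques, the model shrinks to <0.5MB (510× smaller than AlexNet).
Explores how architecture design affects accuracy and model size (design space exploration, at both the microarchitecture and macroarchitecture level; see the Method and Experiments sections for details).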

3 Notions / Innovations

3.1 Innovations

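Proposes the SqueezeNet architecture, which matches AlexNet's accuracy while greatly reducing the number of model parameters.
Explores how CNN architecture design choices impact model size and accuracy (from the microarchitecture and macroarchitecture perspectives, with insightful observations of its own).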

3.2 Notions

microarchitecture: the organization and dimensionality of individual layers and modules (the type of convolution, how convolutions are combined into a module, and the number of filters; see Figure 3).
macroarchitecture: the system-level organization of multiple modules into an end-to-end CNN architecture (in this paper this covers the bypass structure, i.e. residual connections; see Figure 2).

4 Related work

5 Method

5.1 Architecture design strategy

How to count the parameters of a layer made of 3x3 filters:
$(number\ of\ input\ channels) * (number\ of\ filters) * (3*3)$

strategy 1: Replace 3x3 filters with 1x1 filters (9× fewer parameters)
strategy 2: Decrease the number of input channels to 3x3 filters
strategy 3: Downsample late in the network so that convolution layers have large activation maps (i.e. delay downsampling)

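Regarding strategy 3: "Our intuition is that large activation maps (due to delayed downsampling) can lead to higher classification accuracy." This is easy to understand: a higher-resolution feature map likely carries more information. In the extreme case of a $2 \times 2$ feature map and a $3 \times 3$ kernel, the convolution can hardly be effective.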

Strategies 1 and 2 both aim at reducing the number of parameters, while strategy 3 is about maximizing accuracy on a limited budget of parameters.
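The parameter-count formula and the first two strategies can be checked with a few lines of arithmetic. This is only a minimal sketch: the helper `conv_params` and the layer sizes (64 input channels, 64 filters, a 4× channel reduction) are hypothetical values chosen for illustration, not taken from the paper, and biases are ignored.

```python
def conv_params(in_channels: int, num_filters: int, kernel_size: int) -> int:
    """Weight count of a conv layer (biases ignored): C_in * N * k * k."""
    return in_channels * num_filters * kernel_size * kernel_size

# Hypothetical layer sizes, chosen only for illustration.
in_ch, n_filters = 64, 64

# Strategy 1: the same layer built from 1x1 instead of 3x3 filters.
p3 = conv_params(in_ch, n_filters, 3)
p1 = conv_params(in_ch, n_filters, 1)
print(p3, p1, p3 // p1)

# Strategy 2: feed the 3x3 filters 4x fewer input channels.
p3_squeezed = conv_params(in_ch // 4, n_filters, 3)
print(p3_squeezed, p3 // p3_squeezed)
```

The first ratio is the 9× from strategy 1; the second shows that the cost of a 3x3 layer scales linearly with its input channel count, which is what strategy 2 exploits.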

5.2 The Fire Module

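Fire module = squeeze layer + expand layer
The use of $1 \times 1$ filters reflects strategy 1.
$s_{1x1} < e_{1x1} + e_{3x3}$ reflects strategy 2: the squeeze layer limits the number of input channels seen by the $3 \times 3$ filters.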

In the implementation, the $1 \times 1$ and $3 \times 3$ expand convolutions run in parallel and their outputs are concatenated at the end.
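To make the squeeze/expand structure concrete, here is a minimal sketch that counts the weights and output channels of one Fire module. The function `fire_module` is a hypothetical helper, the example sizes (96 input channels, $s_{1x1}=16$, $e_{1x1}=64$, $e_{3x3}=64$) are assumed for illustration, and biases are left out.

```python
def fire_module(in_channels, s1x1, e1x1, e3x3):
    """Return (weight_count, out_channels) of one Fire module, biases ignored."""
    squeeze = in_channels * s1x1 * 1 * 1   # squeeze layer: 1x1 filters only
    expand_1x1 = s1x1 * e1x1 * 1 * 1       # expand branch with 1x1 filters
    expand_3x3 = s1x1 * e3x3 * 3 * 3       # expand branch with 3x3 filters
    out_channels = e1x1 + e3x3             # the two branches are concatenated
    return squeeze + expand_1x1 + expand_3x3, out_channels

# Assumed example sizes: 96 input channels, (s1x1, e1x1, e3x3) = (16, 64, 64).
fire_weights, fire_out = fire_module(96, 16, 64, 64)

# Baseline for comparison: one plain 3x3 conv with the same input/output channels.
plain_weights = 96 * fire_out * 3 * 3
print(fire_weights, fire_out, plain_weights)
```

Under these assumptions the Fire module needs roughly a tenth of the weights of the plain 3x3 layer while producing the same number of output channels.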

5.3 The SqueezeNet architecture

See the left structure in Figure 2.

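The max-pooling placement reflects strategy 3: maxpool8 already sits very late in the network, somewhat like Xception ("Xception: Deep Learning with Depthwise Separable Convolutions"). Note that ImageNet networks usually downsample 5 times, whereas SqueezeNet downsamples only 4 times.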
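The effect of downsampling 4 times instead of 5 can be estimated roughly as follows. This sketch assumes a 224-pixel input and treats every downsampling stage as an exact halving, so kernel-size and padding edge effects are ignored; `final_resolution` is a hypothetical helper.

```python
def final_resolution(input_size: int, num_downsamples: int) -> int:
    """Approximate side length after repeated stride-2 stages (edge effects ignored)."""
    return input_size // (2 ** num_downsamples)

input_size = 224  # assumed ImageNet crop size, not stated in the text above
res_4 = final_resolution(input_size, 4)
res_5 = final_resolution(input_size, 5)
print(res_4, res_5)
```

With one fewer downsampling stage, the last convolution layers operate on activation maps with about twice the side length, which is the point of strategy 3.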

The "compression info" column uses the compression techniques from Deep Compression; the data type goes from 32 bit to 6 bit.
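A quick back-of-the-envelope check of what the bit-width change alone contributes, using the sizes quoted earlier (AlexNet 240MB, SqueezeNet 4.8MB). This is only a rough sketch: `quantized_size_mb` is a hypothetical helper that assumes size scales linearly with bits per weight and ignores codebook and index overhead.

```python
def quantized_size_mb(size_mb: float, old_bits: int, new_bits: int) -> float:
    """Model size after storing each weight with new_bits instead of old_bits."""
    return size_mb * new_bits / old_bits

squeezenet_mb = 4.8   # uncompressed SqueezeNet size quoted above
alexnet_mb = 240.0    # AlexNet size quoted above

size_6bit = quantized_size_mb(squeezenet_mb, 32, 6)
print(round(size_6bit, 2))                   # size from 6-bit storage alone
print(round(alexnet_mb / squeezenet_mb))     # the 50x reduction before compression
```

Going from 32 bit to 6 bit alone brings 4.8MB down to roughly 0.9MB, so the remaining gap to <0.5MB has to come from the other parts of Deep Compression, such as pruning.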

6 Experiments

Dataset: ImageNet

6.1 Evaluation of SqueezeNet


The authors ask:
"Are small models amenable to compression, or do small models 'need' all of the representational power afforded by dense floating-point values?" (In other words: can SqueezeNet be compressed further, and is a 32-bit representation really necessary?)

The last two rows of Table 2 give the answer: with the Deep Compression techniques, SqueezeNet can be compressed even further.
"Our small model is indeed amenable to compression." (This also indirectly shows how effective Deep Compression is.)

Changing the data type does not seem easy to implement in code, though.

6.2 CNN microarchitecture design space exploration

However, SqueezeNet and other models reside in a broad and largely unexplored design space of CNN architectures.

microarchitecture: the organization and dimensionality of individual layers and modules (the type of convolution, how convolutions are combined into a module, and the number of filters; see Figure 3).

6.2.1 CNN microarchitecture metaparameters

Metaparameters can be understood as the parameters that are used to control other parameters.
Each Fire module has 3 hyperparameters ($s_{1x1}, e_{1x1}, e_{3x3}$); with 8 modules that makes 24 hyperparameters in total. To control them conveniently, the authors define a set of metaparameters, as follows:

Let $e_i$ be the number of expand layer filters, where $i$ denotes the $i$-th Fire module.

The expand layer contains both $1 \times 1$ and $3 \times 3$ convolutions, so $e_i = e_{i,1x1} + e_{i,3x3}$.
Let $pct_{3x3}$ be the fraction of $3 \times 3$ filters in the expand layer, so $e_{i,3x3} = e_i * pct_{3x3}$ and $e_{i,1x1} = e_i * (1 - pct_{3x3})$.

$s_{i,1x1} = e_i * SR$, where $SR$ is the squeeze ratio, a value between 0 and 1.

So, once $e_i$ is known, $e_{i,1x1}$ and $e_{i,3x3}$ follow from $pct_{3x3}$, and $s_{i,1x1}$ follows from $SR$.

The number of expand filters of each Fire module, and hence all of its hyperparameters, is then given by:
$e_i = base_e + (incr_e)*\left \lfloor \frac{i}{freq}\right \rfloor$

We define $base_e$ as the number of expand filters in the first Fire module in a CNN. After every $freq$ Fire modules, we increase the number of expand filters by $incr_e$.

$base_e = 128$
$incr_e = 128$
$freq = 2$
$pct_{3x3} = 0.5$
$SR = 0.125$
$i = 0 \sim 7$

Compare with Table 1.
Let's compute the values in code:

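```python
# SqueezeNet metaparameters listed above
base_e = 128    # expand filters in the first Fire module
incr_e = 128    # increase in expand filters every `freq` modules
freq = 2
pct_3x3 = 0.5   # fraction of 3x3 filters in each expand layer
SR = 0.125      # squeeze ratio

print("s1 e1 e3")
for i in range(8):
    # Integer floor division implements floor(i / freq).
    e_i = base_e + incr_e * (i // freq)
    e_i_3x3 = int(e_i * pct_3x3)
    e_i_1x1 = e_i - e_i_3x3
    s_i_1x1 = int(e_i * SR)
    print("%d %d %d" % (s_i_1x1, e_i_1x1, e_i_3x3))
```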

Output:

s1 e1 e3
16 64 64
16 64 64
32 128 128
32 128 128
48 192 192
48 192 192
64 256 256
64 256 256

6.2.2 Squeeze Ratio

The x-axis is model size and the y-axis is accuracy. Accuracy plateaus at 86.0% with $SR = 0.75$.

The models are trained from scratch.

6.2.3 Trading off 1x1 and 3x3 filters

With $SR = 0.5$, the expand layers are varied from mostly 1x1 filters to mostly 3x3 filters.
The x-axis is model size and the y-axis is accuracy. Accuracy plateaus at 85.3% with $pct_{3x3} = 50\%$.
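The two sweeps in 6.2.2 and 6.2.3 change model size through $SR$ and $pct_{3x3}$. The following rough sketch shows how the total weight count of the eight Fire modules responds to each. It assumes 96 channels entering the first Fire module (the `first_in` argument, not stated in the text above), ignores biases, and leaves out every layer outside the Fire modules, so the numbers are not full model sizes. `fire_weights_total` is a hypothetical helper.

```python
def fire_weights_total(SR, pct_3x3, first_in=96,
                       base_e=128, incr_e=128, freq=2, n_modules=8):
    """Total weights (no biases) of all Fire modules for given metaparameters."""
    total, in_ch = 0, first_in
    for i in range(n_modules):
        e_i = base_e + incr_e * (i // freq)
        e3 = int(e_i * pct_3x3)          # 3x3 expand filters
        e1 = e_i - e3                    # 1x1 expand filters
        s = int(e_i * SR)                # squeeze filters
        total += in_ch * s + s * e1 + s * e3 * 9
        in_ch = e_i                      # concatenated expand output feeds the next module
    return total

# Sweep SR with pct_3x3 fixed, as in 6.2.2.
for sr in (0.125, 0.25, 0.5, 0.75, 1.0):
    print("SR=%.3f" % sr, fire_weights_total(sr, 0.5))

# Sweep pct_3x3 with SR fixed at 0.5, as in 6.2.3.
for pct in (0.0, 0.25, 0.5, 0.75, 1.0):
    print("pct_3x3=%.2f" % pct, fire_weights_total(0.5, pct))
```

In this simplified count every term is proportional to the number of squeeze filters, so doubling $SR$ doubles the Fire-module weights, while raising $pct_{3x3}$ only grows the 3x3 expand term.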

6.3 CNN macroarchitecture design space exploration

macroarchitecture: the system-level organization of multiple modules into an end-to-end CNN architecture (in this paper this covers the bypass structure, i.e. residual connections; see Figure 2).

Vanilla SqueezeNet (Figure 2, left)
SqueezeNet with simple bypass connection (just a wire; Figure 2, middle)
SqueezeNet with complex bypass connection (1x1 conv; Figure 2, right)
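One way to see why both a "wire" and a 1x1-conv variant exist: if the bypass is merged by element-wise addition, as in a residual connection, the input and output of a Fire module must have the same number of channels. The sketch below checks this for the eight Fire modules under the metaparameters from section 6.2.1. The element-wise-addition assumption and the 96 channels entering the first Fire module are assumptions of this example, not statements from the text above.

```python
base_e, incr_e, freq = 128, 128, 2
first_in = 96   # assumed number of channels entering the first Fire module

in_ch = first_in
simple_bypass_ok = []
for i in range(8):
    out_ch = base_e + incr_e * (i // freq)   # e_i: channels after concatenation
    # Element-wise addition of input and output needs equal channel counts.
    simple_bypass_ok.append(in_ch == out_ch)
    in_ch = out_ch

for i, ok in enumerate(simple_bypass_ok):
    kind = "simple bypass possible" if ok else "needs a 1x1 conv (complex bypass)"
    print("Fire module index %d: %s" % (i, kind))
```

Under these assumptions only every second Fire module can take a plain wire; the others change the channel count, which is where a 1x1 convolution on the bypass path comes in.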


Advantages of the bypass structure:

would help to alleviate the representational bottleneck introduced by squeeze layers.
The squeeze layers compress the channels heavily, so a bypass gives information a side path around the bottleneck.
Due to this severe dimensionality reduction, a limited amount of information can pass through squeeze layers. However, by adding bypass connections to SqueezeNet, we open up avenues for information to flow around the squeeze layers.

Interestingly, the simple bypass connection performs better than the complex one.

7 Conclusion / Future work

SqueezeNet can be deployed on FPGAs.

We think SqueezeNet will be a good candidate CNN architecture for a variety of applications, especially those in which small model size is of importance. (Its citation count is strikingly high, as checked on 2019-01-16 20:47:32.)

We hope that SqueezeNet will inspire the reader to consider and explore the broad range of possibilities in the design space of CNN architectures and to perform that exploration in a more systematic manner. (A hint toward AutoML.)
