【SqueezeNet】《SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5MB model size》


ICLR-2017



1 Background and Motivation

CNNs compete mainly on accuracy, and at any given accuracy level there can be many viable architectures. The authors approach the problem from the model-compression angle instead: comparable accuracy, much smaller model.

Smaller CNN architectures offer three advantages:

  • More efficient distributed training
    Communication overhead is proportional to the number of model parameters (Forrest N. Iandola, Khalid Ashraf, Matthew W. Moskewicz, and Kurt Keutzer. FireCaffe: near-linear acceleration of deep neural network training on compute clusters. In CVPR, 2016.)
  • Less overhead when exporting new models to clients
    e.g., autonomous driving: over-the-air updates can be shipped faster and more frequently
  • Feasible FPGA and embedded deployment

At the time, Inception-v1/v2/v3/v4 had already been published!

The overarching goal of our work is to identify a model that has very few parameters while preserving accuracy.

2 Advantages / Contributions

  • Proposed SqueezeNet, a new network architecture built for small model size
  • AlexNet-level accuracy on ImageNet with 50× fewer parameters (AlexNet 240MB vs. SqueezeNet 4.8MB); combined with compression techniques, the model shrinks to <0.5MB (510× smaller than AlexNet)
  • Explored how architecture design affects accuracy and model size (Design Space Exploration, from both the microarchitecture and macroarchitecture perspectives; see the Method section for details)

3 Notions / Innovations

3.1 Innovations

  • Proposed the SqueezeNet architecture: accuracy comparable to AlexNet with a greatly reduced parameter count
  • Explored how CNN architecture design choices impact model size and accuracy (from the microarchitecture and macroarchitecture perspectives, with insightful observations)

3.2 Notions

  • microarchitecture: the organization and dimensionality of individual layers and modules (filter shapes, how filters are composed into modules, filter counts; see Figure 3)
  • macroarchitecture: the system-level organization of multiple modules into an end-to-end CNN architecture (here, the bypass structure, i.e., residual connections; see Figure 2)

4 Related work

Related work is organized along four lines:

  • Model Compression

  • CNN microarchitecture (LeNet, VGG, and the Inception family with its 5×5, 3×3, and 1×1 filters)

  • CNN macroarchitecture
    Early work focused mostly on depth. The choice of connections across multiple layers or modules is an emerging area of CNN macroarchitectural research.

  • Neural Network Design Space Exploration

    Much of the work on design space exploration (DSE) of NNs has focused on developing automated approaches for finding NN architectures that deliver higher accuracy.

    • Bayesian optimization

    • simulated annealing

    • randomized search

    • genetic algorithms

    However, these papers make no attempt to provide intuition about the shape of the NN design space.

5 Method

5.1 Architecture design strategy

Parameter count of a 3×3 convolution layer:
(number of input channels) × (number of filters) × (3 × 3)

  • strategy 1: Replace 3×3 filters with 1×1 filters (9× fewer parameters)
  • strategy 2: Decrease the number of input channels to 3×3 filters
  • strategy 3: Downsample late in the network so that convolution layers have large activation maps (delayed downsampling)

Regarding strategy 3: "Our intuition is that large activation maps (due to delayed downsampling) can lead to higher classification accuracy." This is easy to understand: the higher a feature map's resolution, the more information it can carry. In the extreme case of a 2×2 feature map under a 3×3 kernel, the convolution clearly cannot work well!

Strategies 1 and 2 aim at reducing the parameter count, while strategy 3 is about maximizing accuracy on a limited budget of parameters.
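As a quick sanity check on strategies 1 and 2, take a hypothetical layer with 256 input channels and 256 filters (the numbers are mine, not from the paper):

in_ch, n_filters = 256, 256
p_3x3 = in_ch * n_filters * 3 * 3              # 589,824 parameters
p_1x1 = in_ch * n_filters * 1 * 1              # 65,536 -> strategy 1: exactly 9x fewer
p_squeezed = (in_ch // 8) * n_filters * 3 * 3  # 73,728 -> strategy 2: fewer input channels
print(p_3x3, p_1x1, p_squeezed)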

5.2 The Fire Module

fire module = squeeze layer + expand layer
[Figure: the Fire module]
The 1×1 filters embody strategy 1.
$s_{1x1} < e_{1x1} + e_{3x3}$ embodies strategy 2: the squeeze layer limits the number of input channels seen by the 3×3 filters.

In implementation, the 1×1 and 3×3 expand branches run in parallel and their outputs are concatenated, as sketched below.
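A minimal sketch of the Fire module in PyTorch (my choice of framework, not the authors'; the class name and interface below are mine):

import torch
import torch.nn as nn

class Fire(nn.Module):
    # One Fire module: a 1x1 squeeze layer feeding two parallel expand
    # branches (1x1 and 3x3) whose outputs are concatenated channel-wise.
    def __init__(self, in_channels, s_1x1, e_1x1, e_3x3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, s_1x1, kernel_size=1)
        self.expand_1x1 = nn.Conv2d(s_1x1, e_1x1, kernel_size=1)
        # padding=1 keeps the 3x3 branch at the same spatial size as the 1x1 branch
        self.expand_3x3 = nn.Conv2d(s_1x1, e_3x3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand_1x1(x)),
                          self.relu(self.expand_3x3(x))], dim=1)

For example, fire2 in Table 1 takes 96 input channels with $s_{1x1}=16$, $e_{1x1}=64$, $e_{3x3}=64$, and outputs 64 + 64 = 128 channels.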

5.3 The SqueezeNet architecture

The left column of Figure 2 shows the vanilla structure:
[Figure 2 (left): the vanilla SqueezeNet architecture]

The maxpooling schedule embodies strategy 3: maxpool8 sits quite late in the network, somewhat like 【Xception】《Xception: Deep Learning with Depthwise Separable Convolutions》. Notably, ImageNet networks usually downsample 5 times, whereas SqueezeNet downsamples only 4 times.

The compression results use the techniques from Deep Compression, with the data type reduced from 32-bit to 6-bit.
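To make the stacking concrete, here is a minimal sketch of the vanilla v1.0 pipeline, reusing the Fire class sketched in section 5.2 (dimensions follow Table 1; PyTorch is again my choice, not the authors'):

import torch.nn as nn

squeezenet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=7, stride=2), nn.ReLU(inplace=True),  # conv1
    nn.MaxPool2d(3, stride=2),                 # maxpool1
    Fire(96, 16, 64, 64),                      # fire2
    Fire(128, 16, 64, 64),                     # fire3
    Fire(128, 32, 128, 128),                   # fire4
    nn.MaxPool2d(3, stride=2),                 # maxpool4
    Fire(256, 32, 128, 128),                   # fire5
    Fire(256, 48, 192, 192),                   # fire6
    Fire(384, 48, 192, 192),                   # fire7
    Fire(384, 64, 256, 256),                   # fire8
    nn.MaxPool2d(3, stride=2),                 # maxpool8 -- the 4th and last downsampling
    Fire(512, 64, 256, 256),                   # fire9
    nn.Dropout(p=0.5),
    nn.Conv2d(512, 1000, kernel_size=1), nn.ReLU(inplace=True),        # conv10
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),     # global average pooling -> 1000 logits
)

print(sum(p.numel() for p in squeezenet.parameters()))  # ~1.25M params, vs ~60M for AlexNet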

6 Experiments

Dataset: ImageNet

6.1 Evaluation of SqueezeNet

[Table 2: comparing SqueezeNet to model compression approaches]

The authors ask: "are small models amenable to compression, or do small models 'need' all of the representational power afforded by dense floating-point values?" (Can SqueezeNet be compressed further? Does it really need 32-bit floats?)

The last two rows of Table 2 give the answer: applying Deep Compression on top of SqueezeNet compresses it even further! "Our small model is indeed amenable to compression." (This also showcases Deep Compression's power.)

Changing the data type looks nontrivial to implement in code, though!
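The idea behind the 32-bit → 6-bit step is weight sharing: store a small codebook plus a 6-bit index per weight. A minimal NumPy sketch of that idea (uniform codebook for simplicity; Deep Compression actually learns the codebook with k-means):

import numpy as np

def quantize(w, bits=6):
    # Map each float32 weight to the nearest of 2**bits codebook values;
    # only the uint8 indices and the tiny codebook need to be stored.
    codebook = np.linspace(w.min(), w.max(), 2 ** bits, dtype=np.float32)
    idx = np.argmin(np.abs(w[..., None] - codebook), axis=-1)
    return idx.astype(np.uint8), codebook

w = np.random.randn(64, 32).astype(np.float32)
idx, codebook = quantize(w)
w_hat = codebook[idx]  # dequantized weights, used at inference time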

6.2 CNN microarchitecture design space exploration

However, SqueezeNet and other models reside in a broad and largely unexplored design space of CNN architectures.

microarchitecture: the organization and dimensionality of individual layers and modules (filter shapes, module composition, filter counts; see Figure 3)

6.2.1 CNN microarchitecture metaparameters

Metaparameters can be understood as the parameters that are used to control other parameters.

Each Fire module has 3 hyperparameters ($s_{1x1}$, $e_{1x1}$, $e_{3x3}$); with 8 modules, that makes 24 hyperparameters in total. To control them all conveniently, the authors designed a small set of metaparameters, as follows:

Define $e_i$ as the number of expand-layer filters in the $i$-th Fire module.

The expand layer contains both 1×1 and 3×3 filters, so $e_i = e_{i,1x1} + e_{i,3x3}$.
Let $pct_{3x3}$ be the proportion of 3×3 filters in the expand layer; then $e_{i,3x3} = e_i \cdot pct_{3x3}$ and $e_{i,1x1} = e_i \cdot (1 - pct_{3x3})$.

$s_{i,1x1} = e_i \cdot SR$, where $SR$ is the squeeze ratio, between 0 and 1.

So, given $e_i$, $pct_{3x3}$ yields $e_{i,1x1}$ and $e_{i,3x3}$, and $SR$ yields $s_{i,1x1}$.

The expand-filter count of each Fire module is then computed as:
$e_i = base_e + incr_e \cdot \left\lfloor \frac{i}{freq} \right\rfloor$

We define $base_e$ as the number of expand filters in the first Fire module in a CNN. After every $freq$ Fire modules, we increase the number of expand filters by $incr_e$.

$base_e = 128$
$incr_e = 128$
$freq = 2$
$pct_{3x3} = 0.5$
$SR = 0.125$
$i = 0 \sim 7$

Compare with Table 1:
[Table 1: SqueezeNet layer dimensions per Fire module]
Let's verify with a quick computation:

from math import floor

# Metaparameter values chosen by the authors (see above)
base_e = 128
incr_e = 128
freq = 2
pct_3x3 = 0.5
SR = 0.125

print("s1 e1 e3 ")
for i in range(8):  # Fire modules fire2..fire9, i.e. i = 0..7
    e_i = base_e + incr_e * floor(i / freq)
    e_i_3x3 = e_i * pct_3x3
    e_i_1x1 = e_i * (1 - pct_3x3)
    s_i_1x1 = e_i * SR
    print("%d %d %d" % (s_i_1x1, e_i_1x1, e_i_3x3))

Output:

s1 e1 e3 
16 64 64
16 64 64
32 128 128
32 128 128
48 192 192
48 192 192
64 256 256
64 256 256

6.2.2 Squeeze Ratio

[Figure 3 (left): ImageNet accuracy vs. model size as $SR$ varies]
The x-axis is model size and the y-axis is ImageNet top-5 accuracy. Accuracy plateaus at 86.0% with $SR = 0.75$.

All of these models are trained from scratch.

6.2.3 Trading off 1x1 and 3x3 filters

With $SR$ fixed at 0.5, $pct_{3x3}$ is swept from mostly 1×1 filters to mostly 3×3 filters.
[Figure 3 (right): ImageNet accuracy vs. model size as $pct_{3x3}$ varies]
The x-axis is model size and the y-axis is ImageNet top-5 accuracy. Accuracy plateaus at 85.3% with $pct_{3x3} = 50\%$.

6.3 CNN macroarchitecture design space exploration

macroarchitecture: the system-level organization of multiple modules into an end-to-end CNN architecture (here, the bypass structure, i.e., residual connections; see Figure 2)

  • Vanilla SqueezeNet (Figure 2, left)
  • SqueezeNet with simple bypass connections (just a wire; Figure 2, middle)
  • SqueezeNet with complex bypass connections (1×1 conv; Figure 2, right)

[Figure 2: the vanilla, simple-bypass, and complex-bypass SqueezeNet macroarchitectures]
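A sketch of what the bypass changes, reusing the Fire class from section 5.2 (shapes follow Table 1):

import torch

fire2 = Fire(96, 16, 64, 64)    # 96 -> 128 channels: shapes differ, no simple bypass
fire3 = Fire(128, 16, 64, 64)   # 128 -> 128 channels: simple bypass possible

x = torch.randn(1, 96, 55, 55)
x = fire2(x)
x = fire3(x) + x                # simple bypass: an elementwise add, zero extra parameters

The complex bypass instead passes the shortcut through a 1×1 convolution, so modules whose input and output channel counts differ (such as fire2) can also be bypassed, at the cost of extra parameters.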

Advantages of the bypass structure:

  • would help to alleviate the representational bottleneck introduced by squeeze layers.
  • when $SR$ squeezes the channel count this aggressively, the bypass opens a side path around the bottleneck:
    "Due to this severe dimensionality reduction, a limited amount of information can pass through squeeze layers. However, by adding bypass connections to SqueezeNet, we open up avenues for information to flow around the squeeze layers."

[Table 3: accuracy and model size of the three macroarchitectures]
Interestingly, the simple bypass connections work best, improving accuracy while adding zero parameters.

7 Conclusion / Future work

SqueezeNet is feasible for FPGA deployment.

We think SqueezeNet will be a good candidate CNN architecture for a variety of applications, especially those in which small model size is of importance. (The paper's citation count is staggering.)

We hope that SqueezeNet will inspire the reader to consider and explore the broad range of possibilities in the design space of CNN architectures and to perform that exploration in a more systematic manner. (Sounds a lot like AutoML!)
