【Tiny CNN】《Quantization Mimic: Towards Very Tiny CNN for Object Detection》

Latest recommended article published on 2022-04-07 11:57:27

bryant_meng

Article link：https://blog.csdn.net/bryant_meng/article/details/83056203

ECCV-2018

Contents

1 Background and Motivation
2 Innovation
3 Advantages / Contributions
4 Methods
5 Datasets
6 Experiments
- 6.1 WIDER FACE
- 6.2 PASCAL VOC
7 Conclusions / Future work
- 7.1 Conclusions
- 7.2 Future work
【Appendix】 ResNet-18

1 Background and Motivation

1）Methods for accelerating CNNs：

Quantization（filters/weights）：e.g. binary weights, or powers of two plus 0
Group convolution（MobileNet、ShuffleNet、Xception）
Pruning（weight importance）and sparse connection（parameter-wise）
Mimic（knowledge transfer）

But existing work only shrinks VGG to VGG-1-14 or VGG-1-16（this paper goes down to VGG-1-32）

2）Notion：

Quantization：convert a full-precision network to a quantized one（teacher network）
Mimic：transfer knowledge from teacher network to student network

However, quantization often requires an extra specific implementation (e.g. FPGA), and mimic does not work well on very tiny networks. This paper combines Quantization + Mimic for model compression.

3）Relationship between Quantization and Mimic

The quantization operation can help the student network better match the feature maps from the teacher network (others use quantization to compress the model; the authors use quantization to serve mimic learning).

2 Innovation

Quantization + Mimic for object detection（first）

It first quantizes the large network, then trains a quantized small network to mimic it.

3 Advantages / Contributions

Very tiny（1/32）：proposes an effective algorithm to train very tiny networks（first work）.
Quantization + Mimic：utilizes quantized feature maps to facilitate knowledge distilling; combining the two greatly reduces model complexity and computation
Effectiveness：verified on object detection（more difficult）, not classification
Easy to implement：no special limitation during training and inference; although demonstrated on the two-stage family, it transfers easily to YOLO and SSD

4 Methods

Backbone：

VGG with R-FCN
ResNet with Faster R-CNN

4.1 Quantization

Others use quantization to compress the model directly; the authors use quantization to make mimicking work better.

Uniform quantization can better describe large values than power-of-two quantization.

The authors use uniform quantization because RoI pooling in object detection is max pooling, so the quantization quality of large values must be preserved (the jumps between $2^2, 2^3, 2^4$ are too large).
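
To make the point about large values concrete, here is a minimal sketch comparing the two quantizers on a few activation values. The step size `0.5`, the exponent range, and both function names are assumptions chosen for illustration; the post does not state the paper's actual settings.

```python
def uniform_quantize(x, step=0.5):
    """Round a non-negative activation to the nearest multiple of `step`."""
    return step * int(x / step + 0.5)


def power_of_two_quantize(x, min_exp=-4, max_exp=8):
    """Round to the nearest level in {0} U {2^k : min_exp <= k <= max_exp}."""
    levels = [0.0] + [2.0 ** k for k in range(min_exp, max_exp + 1)]
    return min(levels, key=lambda level: abs(level - x))


# Compare the quantization error on small and large activations.
for value in [0.3, 3.1, 11.3, 23.0]:
    u = uniform_quantize(value)
    p = power_of_two_quantize(value)
    print(f"x={value:5.1f}  uniform={u:5.1f} (err {abs(u - value):.2f})  "
          f"pow2={p:5.1f} (err {abs(p - value):.2f})")
```

With these assumed settings the uniform error never exceeds half a step, while the power-of-two error grows with the value, which matters when max pooling keeps only the largest responses.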

The feature maps after the activation function are quantized. The gradients are not quantized; they stay full-precision, as in an ordinary network.
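
One common way to realize "quantized forward pass, full-precision gradients" is a straight-through estimator, which treats the rounding step as the identity during back-propagation (the true derivative of rounding is zero almost everywhere, so nothing could be learned through it otherwise). The sketch below is a toy with a single scalar weight; that the paper implements it exactly this way is an assumption, and `train`, the step size, and the learning rate are hypothetical.

```python
def relu(x):
    return x if x > 0 else 0.0


def quantize(x, step=0.5):
    """Uniformly quantize a non-negative activation (assumed step size)."""
    return step * int(x / step + 0.5)


def train(w=0.2, x=1.0, target=2.0, lr=0.1, steps=100):
    """Fit one weight so that quantize(relu(w * x)) reaches `target`."""
    for _ in range(steps):
        a = relu(w * x)                  # full-precision activation
        q = quantize(a)                  # quantized value used in the loss
        dloss_dq = q - target            # loss = 0.5 * (q - target) ** 2
        dq_da = 1.0                      # straight-through: rounding acts as identity
        da_dw = x if w * x > 0 else 0.0  # ReLU gradient
        w -= lr * dloss_dq * dq_da * da_dw
    return w, quantize(relu(w * x))


final_w, final_q = train()
print(final_w, final_q)
```

The weight keeps moving in full precision until the quantized output lands on the target level, after which the gradient is zero and training stops changing it.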

4.2 Mimic

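The following is the method from 【Mimic】《Mimicking Very Efficient Network for Object Detection》. The basic version supervises only the RPN; adding supervision at the detection head (which brings in category information) works better, and the version with head supervision is called two-stage mimic.

This paper uses two-stage mimic throughout; the paper itself calls it the joint-train version. The symbols in its loss are: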

r：rpn
d：detection
s：student
t：teacher
N：number of RoI
i：i-th RoI
f：feature map
r：regression function that makes the RoI features extracted by the student the same size as the teacher's
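
Read together with the legend above, the mimic term presumably has the form $L_m = \frac{1}{2N}\sum_{i} \| f_t^i - r(f_s^i) \|_2^2$, added with weight $\lambda$ to the RPN and detection losses. That form is reconstructed from the symbol list and should be checked against the paper. A minimal sketch of the mimic term follows; `regress` is a hypothetical linear stand-in for $r$, and all numbers are made up.

```python
def regress(student_feature, weights):
    """Hypothetical stand-in for r(.): a linear map to the teacher's size."""
    return [sum(w * s for w, s in zip(row, student_feature)) for row in weights]


def mimic_loss(teacher_features, student_features, weights):
    """L_m = 1 / (2N) * sum_i || f_t^i - r(f_s^i) ||^2 over N RoIs."""
    n = len(teacher_features)
    total = 0.0
    for f_t, f_s in zip(teacher_features, student_features):
        mapped = regress(f_s, weights)
        total += sum((t - m) ** 2 for t, m in zip(f_t, mapped))
    return total / (2 * n)


# Two RoIs: teacher features have 4 values, student features have 2.
teacher = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 2.0]]
student = [[1.0, 1.0], [0.0, 2.0]]
weights = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 1.0]]  # 4 x 2 map

print(mimic_loss(teacher, student, weights))
```

In a real detector $r$ would be a learned layer and the features would be pooled RoI feature maps rather than short lists.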

4.3 Quantization + Mimic

The loss is the mimic loss with a quantization operation added.

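The teacher is quantized, and so is the student. The former is easy to understand, but how should the latter be understood?
If only the teacher is quantized and the student is not, the student has to fit the 8 center points of the paper's illustration with a manifold (every point inside a grid cell is quantized to that cell's center, marked *).

If the student is quantized as well, the task becomes fitting 8 center points with 8 center points, which is a simpler problem. The experiments confirm that this setting also works best.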
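
A small numeric sketch of the three settings discussed above (no quantization, teacher only, both). It assumes the quantizer $Q$ is applied elementwise inside the mimic loss and, for brevity, that the student features already have the teacher's size; the step size `1.0` and the feature values are made up.

```python
def quantize(x, step=1.0):
    """Uniformly quantize a non-negative value (assumed step size)."""
    return step * int(x / step + 0.5)


def mimic_loss(teacher, student, quantize_teacher, quantize_student):
    """Mimic loss over N RoIs, with Q optionally applied to either side."""
    n = len(teacher)
    total = 0.0
    for f_t, f_s in zip(teacher, student):
        if quantize_teacher:
            f_t = [quantize(v) for v in f_t]
        if quantize_student:
            f_s = [quantize(v) for v in f_s]
        total += sum((t - s) ** 2 for t, s in zip(f_t, f_s))
    return total / (2 * n)


teacher = [[0.9, 2.2], [3.1, 0.4]]
student = [[1.2, 1.8], [2.7, 0.1]]  # close to the teacher, but not equal

print("no quantization:", mimic_loss(teacher, student, False, False))
print("teacher only:   ", mimic_loss(teacher, student, True, False))
print("both quantized: ", mimic_loss(teacher, student, True, True))
```

When both sides are quantized, the student only has to land in the same cell as the teacher, so the loss here drops to exactly zero; with only the teacher quantized, the continuous student output still has to hit the cell centers. The toy illustrates why the target becomes easier, not the accuracy gain reported in the paper.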

So what is a manifold? See:
机器学习算法总结(十二)——流形学习（Manifold Learning）

5 Datasets

Database：

WIDER FACE
32K images with 394K annotated faces
validation and test：easy, medium, and hard subsets.
Pascal VOC

http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/

6 Experiments

6.1 WIDER FACE

Structure：VGG with R-FCN
teacher：VGG-1-4（it performs about as well as VGG, see Table 3）
student：VGG-1-32
$\lambda = 1$ in the loss
RPN anchors：1 ratio，4 scales $4^2,8^2,16^2,32^2$
After RoI pooling：3*3
OHEM
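
As an illustration of the anchor setting listed above, the sketch below generates the four anchors at one location. It assumes the scales $4^2,8^2,16^2,32^2$ denote anchor areas in pixels and that the single ratio is 1 (square boxes); `make_anchors` and the center point are hypothetical.

```python
import math


def make_anchors(center_x, center_y,
                 areas=(4 ** 2, 8 ** 2, 16 ** 2, 32 ** 2), ratios=(1.0,)):
    """Return (x1, y1, x2, y2) boxes centered at one feature-map location."""
    anchors = []
    for area in areas:
        for ratio in ratios:  # ratio is height / width
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((center_x - width / 2, center_y - height / 2,
                            center_x + width / 2, center_y + height / 2))
    return anchors


for box in make_anchors(16, 16):
    print(box)
```

Under these assumptions the anchors are squares with sides 4, 8, 16, and 32 pixels.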

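The result is better than training from scratch (no ImageNet pre-training. Why not? Because with so few channels the network lacks the capacity), which shows that the student network has learned something from the teacher network.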

Evolve, then evolve further：Speed and Size
[Image: Speed and Size]

Taking on all challengers：comparison with other compression methods

Quantization+Mimic comes out on top. Note also that group convolution performs poorly; the authors' explanation is that with so few channels, grouping blocks the information flow.

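Cut off one arm and see whether it is still a hero like Yang Guo：Quantization vs. Nonquantization
The experiments show it is no Yang Guo: quantizing both the teacher and the student network works best. Comparing the first two rows of the table also shows that the quality of the knowledge the teacher distills (quantized or not) directly affects how well the student learns.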

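Quantization has a regularization effect, so it is worth checking whether the improvement comes only from that regularization.
With the teacher network left unquantized, the experiments show that quantizing only the student network brings no improvement. So it is the mimic scheme that does the work, and quantization is there to make mimicking easier.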