Neural Network Compression Framework for fast model inference

Paper background

Paper link
Code link

  • Alexander Kozlov, Ivan Lazarevich, Vasily Shamporov, Nikolay Lyalyushkin, Yury Gorbachev
    Intel

The names all look Russian.

  • 期刊/会议: CVPR 2020

Abstract

Built on the PyTorch framework, NNCF provides compression techniques such as quantization, sparsity, filter pruning and binarization. It can be used standalone or integrated into existing training code.

Features

  • Support of quantization, binarization, sparsity and filter pruning algorithms with fine-tuning.
  • Automatic model graph transformation in PyTorch – the model is wrapped and additional layers are inserted in the model graph.
  • Ability to stack compression methods and apply several of them at the same time.
  • Training samples for image classification, object detection and semantic segmentation tasks as well as configuration files to compress a range of models.
  • Ability to integrate compression-aware training into third-party repositories with minimal modifications of the existing training pipelines, which allows integrating NNCF into large-scale model/pipeline aggregation repositories such as MMDetection or Transformers.
  • Hardware-accelerated layers for fast model fine-tuning and multi-GPU training support.
  • Compatibility with the OpenVINO™ Toolkit for model inference.

A few caveats and Framework Architecture

  • NNCF does not perform additional network graph transformations during the quantization process, such as batch normalization folding
  • The sparsity algorithms implemented in NNCF constitute non-structured network sparsification approaches. Another approach is the so-called structured sparsity, which aims to prune away whole neurons or convolutional filters.
  • Each compression method acts on this wrapper by defining the following basic components:
    • Compression Algorithm Builder
    • Compression Algorithm Controller
    • Compression Loss
    • Compression Scheduler
  • Another important novelty of NNCF is the support of algorithm stacking, where users can build custom compression pipelines by combining several compression methods. (A single training run can thus produce a model that is both sparse and quantized.)
  • Usage steps:
    • the model is wrapped by the transparent NNCFNetwork wrapper
    • one or more particular compression algorithm builders are instantiated and applied to the wrapped model.
    • The wrapped model can then be fine-tuned on the target dataset using either an original training pipeline, or a slightly modified pipeline.
    • After the compressed model is trained, we can export it to ONNX format for further usage in the OpenVINO™ inference toolkit.
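The stacking described above is driven by a declarative configuration. A minimal sketch of such a config, written as a Python dict (the key names mirror NNCF's JSON config schema as published, but treat the exact field names as an assumption to verify against the NNCF documentation):

```python
# Hypothetical NNCF-style config stacking sparsity and quantization.
# Field names are assumptions modeled on NNCF's JSON schema.
nncf_config = {
    # Shape of a sample input, needed to trace the wrapped model graph.
    "input_info": {"sample_size": [1, 3, 224, 224]},
    # Listing several algorithms applies them together in one training run:
    "compression": [
        {"algorithm": "magnitude_sparsity"},
        {"algorithm": "quantization"},
    ],
}

print([c["algorithm"] for c in nncf_config["compression"]])
```

Each entry in `compression` would instantiate its own algorithm builder against the wrapped model, which is how one run yields a model that is both sparse and quantized.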

Compression Methods Overview

Quantization

The approach draws on the following methods:

  • QAT
  • PACT
  • TQT
|                     | $q_{min}$       | $q_{max}$      |
| ------------------- | --------------- | -------------- |
| Weights             | $-2^{bits-1}+1$ | $2^{bits-1}-1$ |
| Signed activation   | $-2^{bits-1}$   | $2^{bits-1}-1$ |
| Unsigned activation | $0$             | $2^{bits}-1$   |
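The ranges in the table above can be sketched as a small helper (a minimal sketch; the function and mode names are my own):

```python
def quant_range(bits: int, mode: str):
    """Integer quantization range per the level scheme above.

    mode: 'weights' (symmetric range, one level dropped so it is
    balanced around zero), 'signed_act', or 'unsigned_act'.
    """
    if mode == "weights":
        return -2 ** (bits - 1) + 1, 2 ** (bits - 1) - 1
    if mode == "signed_act":
        return -2 ** (bits - 1), 2 ** (bits - 1) - 1
    if mode == "unsigned_act":
        return 0, 2 ** bits - 1
    raise ValueError(mode)

print(quant_range(8, "weights"))       # (-127, 127)
print(quant_range(8, "signed_act"))    # (-128, 127)
print(quant_range(8, "unsigned_act"))  # (0, 255)
```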

Symmetric quantization

The scale is learned during training and represents the actual float range.
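A minimal sketch of symmetric fake quantization with a learned scale (weights-style levels from the table above; this is an illustration, not NNCF's exact formula):

```python
def fake_quant_symmetric(x, scale, bits=8):
    """Symmetric fake quantization sketch. `scale` stands for the
    learned bound of the represented float range; the zero-point is
    fixed at 0."""
    q_max = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    step = scale / q_max                 # size of one quantization step
    q = round(x / step)                  # snap to the integer grid
    q = max(-q_max, min(q_max, q))       # clamp to [q_min, q_max]
    return q * step                      # dequantize back to float
```

Values outside the learned range saturate at ±scale, so training the scale trades off clipping error against rounding error.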

Asymmetric quantization

Training optimizes the float range, with the range minimum acting as the zero level.
After mapping, the float zero-point must be an integer within the quantization range; this constraint makes layers with padding efficient to compute.

Training and inference

Unlike QAT and TQT, the method in this paper does not perform BN folding; however, to keep the statistics consistent between training and inference, a large batch size (>256) is required.

Mixed-precision quantization

Bit widths are selected using the HAWQ-v2 method.
The sensitivity is computed as follows:

Compression ratio: int8 complexity / mixed-precision complexity, where complexity = FLOPs * bit-width.

Mixed precision then amounts to finding the precision configuration with the smallest sensitivity that still satisfies the compression-ratio threshold.
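The selection rule above can be sketched as a brute-force search (illustrative only; the real HAWQ-v2 procedure in NNCF is smarter than enumerating every assignment):

```python
from itertools import product

def choose_bitwidths(flops, sensitivity, bit_choices=(4, 8), ratio_threshold=1.5):
    """Pick per-layer bit widths minimizing total sensitivity subject
    to a compression-ratio threshold.

    flops[i]          -- FLOPs of layer i
    sensitivity[i][b] -- sensitivity of layer i at bit width b
    compression ratio = int8 complexity / mixed complexity,
    where complexity = FLOPs * bit-width.
    """
    int8_complexity = sum(f * 8 for f in flops)
    best = None
    for bits in product(bit_choices, repeat=len(flops)):
        complexity = sum(f * b for f, b in zip(flops, bits))
        if int8_complexity / complexity < ratio_threshold:
            continue  # does not reach the target compression ratio
        total_sens = sum(sensitivity[i][b] for i, b in enumerate(bits))
        if best is None or total_sens < best[0]:
            best = (total_sens, bits)
    return best[1] if best else None
```

For example, with two equal-FLOPs layers and a 1.5x ratio threshold, only the all-4-bit assignment qualifies, so it is chosen regardless of sensitivity.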

Binarization

Weight binarization is implemented via the XNOR and DoReFa schemes. Training follows a staged schedule:

  • Stage 1: the network is trained without any binarization,
  • Stage 2: the training continues with binarization enabled for activations only,
  • Stage 3: binarization is enabled both for activations and weights,
  • Stage 4: the optimizer learning rate, which had been kept constant at previous stages, is decreased according to a polynomial law, while the optimizer's weight decay parameter is set to 0.
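The two weight-binarization schemes can be sketched as follows (a minimal illustration of the scaling conventions, not NNCF's implementation):

```python
import numpy as np

def binarize_xnor(w):
    """XNOR-Net-style binarization: per-output-channel scale
    alpha = mean(|w|) over the filter, binary value = sign(w).
    w has shape (out_channels, ...)."""
    flat = np.abs(w).reshape(w.shape[0], -1)
    alpha = flat.mean(axis=1).reshape((-1,) + (1,) * (w.ndim - 1))
    return alpha * np.sign(w)

def binarize_dorefa(w):
    """DoReFa-style binarization: a single scalar scale
    mean(|w|) over the whole tensor."""
    return np.abs(w).mean() * np.sign(w)
```

XNOR keeps one float scale per filter, DoReFa one per tensor; in both cases the multiply-accumulates reduce to sign operations plus a cheap rescale.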

Sparsity

NNCF supports two sparsity schemes:

1. magnitude-based training (pruning weights by magnitude)
2. training based on L0 regularization
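The magnitude-based scheme can be sketched as computing a binary mask that zeroes the smallest weights (illustrative; ties at the threshold may drop slightly more than the requested fraction):

```python
import numpy as np

def magnitude_sparsity_mask(w, sparsity_level):
    """Unstructured magnitude sparsity: zero out the fraction
    `sparsity_level` of weights with the smallest |w|."""
    k = int(sparsity_level * w.size)
    if k == 0:
        return np.ones_like(w)
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.sort(np.abs(w).ravel())[k - 1]
    return (np.abs(w) > threshold).astype(w.dtype)
```

During compression-aware training, the masked weights `w * mask` are used in the forward pass while the sparsity level is ramped up by the scheduler.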

Filter pruning

NNCF implements three different criteria for filter importance:

  • L1-norm
  • L2-norm
  • geometric median
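The three criteria can be sketched as per-filter importance scores (illustrative; the geometric-median variant here uses summed pairwise distances, so filters close to all others, i.e. the most redundant ones, score lowest):

```python
import numpy as np

def filter_importance(weights, criterion="l2"):
    """Per-filter importance for pruning. weights: (out_channels, ...).
    Filters with the lowest scores are pruned first."""
    flat = weights.reshape(weights.shape[0], -1)
    if criterion == "l1":
        return np.abs(flat).sum(axis=1)
    if criterion == "l2":
        return np.sqrt((flat ** 2).sum(axis=1))
    if criterion == "geometric_median":
        # Distance of each filter to every other filter.
        dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
        return dists.sum(axis=1)
    raise ValueError(criterion)
```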