Neural Network Compression Framework for fast model inference

Paper background

Paper link
Code link

  • Alexander Kozlov, Ivan Lazarevich, Vasily Shamporov, Nikolay Lyalyushkin, Yury Gorbachev
    Intel

The names all look Russian.

  • 期刊/会议: CVPR 2020

Abstract

Built on the PyTorch framework, NNCF provides compression techniques such as quantization, sparsity, filter pruning and binarization. It can be used standalone or integrated into existing training code.

Features

  • Support of quantization, binarization, sparsity and filter pruning algorithms with fine-tuning.
  • Automatic model graph transformation in PyTorch – the model is wrapped and additional layers are inserted in the model graph.
  • Ability to stack compression methods and apply several of them at the same time.
  • Training samples for image classification, object detection and semantic segmentation tasks as well as configuration files to compress a range of models.
  • Ability to integrate compression-aware training into third-party repositories with minimal modifications of the existing training pipelines, which allows integrating NNCF into large-scale model/pipeline aggregation repositories such as MMDetection or Transformers.
  • Hardware-accelerated layers for fast model fine-tuning and multi-GPU training support.
  • Compatibility with the OpenVINO™ Toolkit for model inference.

A few caveats and Framework Architecture

  • NNCF does not perform additional network graph transformations during the quantization process, such as batch normalization folding
  • The sparsity algorithms implemented in NNCF constitute non-structured network sparsification approaches. Another approach is the so-called structured sparsity, which aims to prune away whole neurons or convolutional filters.
  • Each compression method acts on this wrapper by defining the following basic components:
    • Compression Algorithm Builder
    • Compression Algorithm Controller
    • Compression Loss
    • Compression Scheduler
  • Another important novelty of NNCF is the support of algorithm stacking, where users can build custom compression pipelines by combining several compression methods. (A single training run can thus produce a model that is both sparse and quantized.)
  • Usage steps:
    • the model is wrapped by the transparent NNCFNetwork wrapper
    • one or more particular compression algorithm builders are instantiated and applied to the wrapped model.
    • The wrapped model can then be fine-tuned on the target dataset using either an original training pipeline, or a slightly modified pipeline.
    • After the compressed model is trained, we can export it to ONNX format for further usage in the OpenVINO™ inference toolkit.
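The stacking described above is driven by a declarative configuration. A minimal sketch of such a config, written as a Python dict (the key names mirror NNCF's JSON config schema as published, but treat the exact field names as an assumption to verify against the NNCF documentation):

```python
# Hypothetical NNCF-style config stacking sparsity and quantization.
# Field names are assumptions modeled on NNCF's JSON schema.
nncf_config = {
    # Shape of a sample input, needed to trace the wrapped model graph.
    "input_info": {"sample_size": [1, 3, 224, 224]},
    # Listing several algorithms applies them together in one training run:
    "compression": [
        {"algorithm": "magnitude_sparsity"},
        {"algorithm": "quantization"},
    ],
}

print([c["algorithm"] for c in nncf_config["compression"]])
```

Each entry in `compression` would instantiate its own algorithm builder against the wrapped model, which is how one run yields a model that is both sparse and quantized.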

Compression Methods Overview

Quantization

The approach draws on the following methods:

  • QAT
  • PACT
  • TQT
|                     | $q_{min}$       | $q_{max}$      |
| ------------------- | --------------- | -------------- |
| Weights             | $-2^{bits-1}+1$ | $2^{bits-1}-1$ |
| Signed activation   | $-2^{bits-1}$   | $2^{bits-1}-1$ |
| Unsigned activation | $0$             | $2^{bits}-1$   |
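The ranges in the table above can be sketched as a small helper (a minimal sketch; the function and mode names are my own):

```python
def quant_range(bits: int, mode: str):
    """Integer quantization range per the level scheme above.

    mode: 'weights' (symmetric range, one level dropped so it is
    balanced around zero), 'signed_act', or 'unsigned_act'.
    """
    if mode == "weights":
        return -2 ** (bits - 1) + 1, 2 ** (bits - 1) - 1
    if mode == "signed_act":
        return -2 ** (bits - 1), 2 ** (bits - 1) - 1
    if mode == "unsigned_act":
        return 0, 2 ** bits - 1
    raise ValueError(mode)

print(quant_range(8, "weights"))       # (-127, 127)
print(quant_range(8, "signed_act"))    # (-128, 127)
print(quant_range(8, "unsigned_act"))  # (0, 255)
```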

Symmetric quantization

The scale is learned during training and represents the actual float range.
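A minimal sketch of symmetric fake quantization with a learned scale (weights-style levels from the table above; this is an illustration, not NNCF's exact formula):

```python
def fake_quant_symmetric(x, scale, bits=8):
    """Symmetric fake quantization sketch. `scale` stands for the
    learned bound of the represented float range; the zero-point is
    fixed at 0."""
    q_max = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    step = scale / q_max                 # size of one quantization step
    q = round(x / step)                  # snap to the integer grid
    q = max(-q_max, min(q_max, q))       # clamp to [q_min, q_max]
    return q * step                      # dequantize back to float
```

Values outside the learned range saturate at ±scale, so training the scale trades off clipping error against rounding error.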

Asymmetric quantization

Training optimizes the float range, with the range minimum acting as the zero level.
After mapping, the float zero-point must be an integer within the quantization range; this constraint makes layers with padding efficient to compute.

Training and inference

Unlike QAT and TQT, the method in this paper does not perform BN folding; however, to keep the statistics consistent between training and inference, a large batch size (>256) is required.

Mixed-precision quantization

Bit widths are selected using the HAWQ-v2 method.
The sensitivity is computed as follows:

Compression ratio: int8 complexity / mixed-precision complexity, where complexity = FLOPs * bit-width.

Mixed precision then amounts to finding the precision configuration with the smallest sensitivity that still satisfies the compression-ratio threshold.
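The selection rule above can be sketched as a brute-force search (illustrative only; the real HAWQ-v2 procedure in NNCF is smarter than enumerating every assignment):

```python
from itertools import product

def choose_bitwidths(flops, sensitivity, bit_choices=(4, 8), ratio_threshold=1.5):
    """Pick per-layer bit widths minimizing total sensitivity subject
    to a compression-ratio threshold.

    flops[i]          -- FLOPs of layer i
    sensitivity[i][b] -- sensitivity of layer i at bit width b
    compression ratio = int8 complexity / mixed complexity,
    where complexity = FLOPs * bit-width.
    """
    int8_complexity = sum(f * 8 for f in flops)
    best = None
    for bits in product(bit_choices, repeat=len(flops)):
        complexity = sum(f * b for f, b in zip(flops, bits))
        if int8_complexity / complexity < ratio_threshold:
            continue  # does not reach the target compression ratio
        total_sens = sum(sensitivity[i][b] for i, b in enumerate(bits))
        if best is None or total_sens < best[0]:
            best = (total_sens, bits)
    return best[1] if best else None
```

For example, with two equal-FLOPs layers and a 1.5x ratio threshold, only the all-4-bit assignment qualifies, so it is chosen regardless of sensitivity.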

Binarization

Weight binarization is implemented via the XNOR and DoReFa schemes. Training follows a staged schedule:

  • Stage 1: the network is trained without any binarization,
  • Stage 2: the training continues with binarization enabled for activations only,
  • Stage 3: binarization is enabled both for activations and weights,
  • Stage 4: the optimizer learning rate, which had been kept constant at previous stages, is decreased according to a polynomial law, while the optimizer's weight decay parameter is set to 0.
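The two weight-binarization schemes can be sketched as follows (a minimal illustration of the scaling conventions, not NNCF's implementation):

```python
import numpy as np

def binarize_xnor(w):
    """XNOR-Net-style binarization: per-output-channel scale
    alpha = mean(|w|) over the filter, binary value = sign(w).
    w has shape (out_channels, ...)."""
    flat = np.abs(w).reshape(w.shape[0], -1)
    alpha = flat.mean(axis=1).reshape((-1,) + (1,) * (w.ndim - 1))
    return alpha * np.sign(w)

def binarize_dorefa(w):
    """DoReFa-style binarization: a single scalar scale
    mean(|w|) over the whole tensor."""
    return np.abs(w).mean() * np.sign(w)
```

XNOR keeps one float scale per filter, DoReFa one per tensor; in both cases the multiply-accumulates reduce to sign operations plus a cheap rescale.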

Sparsity

NNCF supports two sparsity schemes:

1. magnitude-based training (pruning weights by magnitude)
2. training based on L0 regularization
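The magnitude-based scheme can be sketched as computing a binary mask that zeroes the smallest weights (illustrative; ties at the threshold may drop slightly more than the requested fraction):

```python
import numpy as np

def magnitude_sparsity_mask(w, sparsity_level):
    """Unstructured magnitude sparsity: zero out the fraction
    `sparsity_level` of weights with the smallest |w|."""
    k = int(sparsity_level * w.size)
    if k == 0:
        return np.ones_like(w)
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.sort(np.abs(w).ravel())[k - 1]
    return (np.abs(w) > threshold).astype(w.dtype)
```

During compression-aware training, the masked weights `w * mask` are used in the forward pass while the sparsity level is ramped up by the scheduler.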

Filter pruning

NNCF implements three different criteria for filter importance:

  • L1-norm
  • L2-norm
  • geometric median
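The three criteria can be sketched as per-filter importance scores (illustrative; the geometric-median variant here uses summed pairwise distances, so filters close to all others, i.e. the most redundant ones, score lowest):

```python
import numpy as np

def filter_importance(weights, criterion="l2"):
    """Per-filter importance for pruning. weights: (out_channels, ...).
    Filters with the lowest scores are pruned first."""
    flat = weights.reshape(weights.shape[0], -1)
    if criterion == "l1":
        return np.abs(flat).sum(axis=1)
    if criterion == "l2":
        return np.sqrt((flat ** 2).sum(axis=1))
    if criterion == "geometric_median":
        # Distance of each filter to every other filter.
        dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
        return dists.sum(axis=1)
    raise ValueError(criterion)
```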