I have been working on quantization recently, and read this paper on my advisor's recommendation. It is a 2018 Google publication, very well done: the explanation is thorough, and working code is available.
一、References
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
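The key idea behind "integer-only arithmetic" in the abstract is that with the affine mapping r = S·(q − Z), a real-valued dot product can be accumulated entirely in integers and rescaled by a float only once at the end. A minimal sketch (the function name and parameters are illustrative, not from the paper's code or any TensorFlow API):

```python
# Hedged sketch: integer-only dot product of two quantized vectors.
# Each real value is represented as r = s * (q - z); the products
# (q1 - z1) * (q2 - z2) are accumulated as plain integers (int32 on
# real hardware), and the combined scale s1 * s2 is applied once.

def int_dot(q1, q2, z1, z2, s1, s2):
    acc = 0  # integer accumulator
    for a, b in zip(q1, q2):
        acc += (a - z1) * (b - z2)  # integer-only multiply-accumulate
    return s1 * s2 * acc            # single floating-point rescale
```

For example, with scales 0.1 and zero-points 128 on both sides, the quantized vectors [138, 118] and [148, 128] represent [1.0, −1.0] and [2.0, 0.0], and `int_dot` recovers their real dot product 2.0.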
Article walkthrough (in Chinese):
Google CVPR 2018 paper: CNN quantization techniques
Additionally, the minimum and maximum values for activations are determined during training. This allows a model trained with quantization in the loop to be converted to a fixed point inference model with little effort, eliminating the need for a separate calibration step.
(Calibration is the step that determines the value ranges of the parameters.)
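The min/max ranges recorded during training are exactly what is needed to derive the quantization parameters: a scale S and a zero-point Z such that r ≈ S·(q − Z). A minimal sketch under that scheme (function names are illustrative, not from TensorFlow):

```python
# Hedged sketch of affine (asymmetric) uint8 quantization:
# derive scale and zero-point from an observed [min, max] range,
# then quantize/dequantize real values against them.

def quant_params(rmin, rmax, num_bits=8):
    """Derive scale S and zero-point Z from a recorded real range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The range must include 0 so that real zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point

def quantize(r, scale, zero_point, num_bits=8):
    q = round(r / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))  # clamp to the uint8 range

def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)
```

For example, an activation range of [−1, 1] gives a scale of 2/255 and a zero-point of 128; quantizing then dequantizing any value in that range recovers it to within one scale step.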
二、Implementation
GitHub code:
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md
The linked model tar files contain the following:
- Trained model checkpoints:
mobilenet_v1_1.0_224.ckpt.data-00000-of-00001 (stores the variables and their values)
mobilenet_v1_1.0_224.ckpt.index
mobilenet_v1_1.0_224.ckpt.meta (stores the graph structure)
- Eval graph text protos (to be easily viewed): mobilenet_v1_1.0_224_eval.pbtxt
- Frozen trained models: mobilenet_v1_1.0_224_frozen.pb (model size: 17,173,742 bytes)
- Info file containing input and output information:mobilenet_v1_1.0_224_info.txt
- Converted TensorFlow Lite flatbuffer model: mobilenet_v1_1.0_224.tflite (model size: 4,276,000 bytes)
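The two sizes above are consistent with the paper's scheme: quantizing float32 weights down to uint8 should shrink the model by roughly 4x, with the remainder attributable to format overhead and any layers left unquantized. A quick check of the ratio (this is my own back-of-the-envelope reading of the numbers, not a claim from the repository):

```python
# Compare the frozen float graph against the TFLite flatbuffer.
# Sizes are the byte counts listed for the two files above.
frozen_pb_bytes = 17_173_742  # mobilenet_v1_1.0_224_frozen.pb (float32)
tflite_bytes = 4_276_000      # mobilenet_v1_1.0_224.tflite
ratio = frozen_pb_bytes / tflite_bytes
print(f"size ratio: {ratio:.2f}x")  # prints "size ratio: 4.02x"
```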
Note that