TFLite: Model Post-training Quantization (generating and deploying quantized models)

Post-training quantization methods

There are two ways to quantize a TensorFlow Lite model:

"hybrid" post-training quantization and post-training integer quantization

The "hybrid" post-training quantization approach reduces model size and latency in many cases, but it has the limitation of requiring floating-point computation, which may not be available in all hardware accelerators (e.g. Edge TPUs).

Post-training integer quantization enables users to take an already-trained floating-point model and fully quantize it to use only 8-bit signed integers (i.e. `int8`). With this quantization scheme, we get reasonable accuracy across many models without retraining with quantization-aware training. Models remain 4x smaller, but see even greater CPU speed-ups, and fixed-point hardware accelerators such as Edge TPUs can also run these models.

How quantization works

1] Per-axis (i.e. per-channel) or per-tensor weights are quantized to int8 fixed-point with the representable range [-127, 127], and the zero-point is fixed at quantized value 0.
2] Per-tensor activations/inputs are quantized to int8 fixed-point with the representable range [-128, 127], and the zero-point is computed within [-128, 127] from the formula below.

Quantization parameters:
S (scale) = (Rmax - Rmin) / (Qmax - Qmin): how much real value one quantized unit represents
Z (zero-point) = Qmax - Rmax / S: the quantized value that represents real zero
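
As a quick check of these formulas, here is a small worked example (the numbers are illustrative, not taken from any real model):

Rmin, Rmax = -1.0, 2.0   # observed real-value range of an activation
Qmin, Qmax = -128, 127   # int8 range used for activations

S = (Rmax - Rmin) / (Qmax - Qmin)   # scale = 3.0 / 255 ≈ 0.01176
Z = round(Qmax - Rmax / S)          # zero-point = 127 - 170 = -43

With these parameters, real 0.0 quantizes to -43, and dequantizing -43 gives S * (-43 - Z) = 0.0 exactly.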

Note that weights and activations are quantized differently:
1. For weights, real zero is represented by quantized value 0, while for activations real zero is computed as Z = Qmax - Rmax / S.
2. The fixed-point ranges also differ: weights use [-127, 127], activations use [-128, 127].

For computing the quantization thresholds, a simpler method is adopted.
For weights, the actual min and max values determine the quantization parameters.
For activations, a moving average of the min and max values across batches determines the quantization parameters (see the sketch below).
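
The sketch below contrasts the two schemes in NumPy. It is an illustration of the math above, not the converter's actual implementation, and the helper names are made up:

import numpy as np

def quantize_weights(w):
    # Symmetric scheme: real zero maps to quantized 0, range [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale   # dequantize with q * scale

def quantize_activations(x, r_min, r_max):
    # Asymmetric scheme: range [-128, 127]; in practice r_min/r_max come
    # from a moving average of per-batch min/max values.
    scale = (r_max - r_min) / 255.0
    zero_point = int(round(127 - r_max / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point   # dequantize with (q - zero_point) * scale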

Generating a float tflite model / hybrid quantization and integer quantization

The simple examples below show how to generate a float tflite model (no quantization), a hybrid post-training quantized model, and a post-training integer quantized model.

Building a simple mnist model

TensorFlow version 2.0.0 is used.

import tensorflow as tf
import numpy as np

print (tf.__version__)

(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data()
# normalize to [0, 1] and cast to float32 (the TFLite interpreter expects float32 inputs)
x_train, x_test = (x_train / 255.0).astype(np.float32), (x_test / 255.0).astype(np.float32)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=1)
model.evaluate(x_test, y_test)

Saving in saved_model and h5 formats

model.save('saved_model')
model.save('keras_model.h5')
Interestingly, the same save function handles both; the format is inferred from the argument: a plain directory path produces the SavedModel format, while a .h5 filename produces a single HDF5 file.

Creating a tf.lite.TFLiteConverter

Before generating the quantized models, here is how to create a TFLiteConverter.

The Python API for converting TensorFlow models to TensorFlow Lite is tf.lite.TFLiteConverter. TFLiteConverter provides the following classmethods to convert a model based on the original model format:

  • tf.lite.TFLiteConverter.from_saved_model(): converts a SavedModel directory
  • tf.lite.TFLiteConverter.from_keras_model(): converts a tf.keras model instance
  • tf.lite.TFLiteConverter.from_concrete_functions(): converts concrete functions


A difference from 1.x is that from_keras_model_file("xxx.h5") is gone.

Since that function no longer exists in 2.0, to convert from an h5 file, first load the Keras model and then create the converter from it:

keras_model = tf.keras.models.load_model('keras_model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)

Generating the float model

converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_model = converter.convert()
open("float_model.tflite", "wb").write(float_model)

Generating the hybrid quantized model

converter_hybrid = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter_hybrid.optimizations = [tf.lite.Optimize.DEFAULT]
hybrid_model = converter_hybrid.convert()
open("hybrid_model.tflite", "wb").write(hybrid_model)
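
Generating the integer quantized model

Post-training integer quantization additionally requires a representative dataset so the converter can calibrate the activation ranges. A minimal sketch, reusing the x_train array from the training script above (the 100-sample calibration count is an illustrative choice):

def representative_dataset_gen():
    # Yield single-image batches for activation-range calibration.
    for i in range(100):
        yield [x_train[i:i+1]]

converter_int = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter_int.optimizations = [tf.lite.Optimize.DEFAULT]
converter_int.representative_dataset = representative_dataset_gen
# Optionally require full int8 kernels; conversion fails if an op cannot be quantized.
converter_int.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
integer_model = converter_int.convert()
open("integer_model.tflite", "wb").write(integer_model)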
