Models are normally trained and stored in float32; converting the weights to float16 for storage cuts the model size roughly in half.
To do this, run the trained model through the TensorFlow Lite converter with float16 as the target type:
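The size halving follows directly from the element width: float16 uses 2 bytes per value instead of 4. A minimal sketch with NumPy (the tensor here is a stand-in for real model weights):

```python
import numpy as np

# Hypothetical weight tensor: one million parameters stored as float32.
weights_fp32 = np.random.rand(1_000_000).astype(np.float32)

# Re-storing the same values as float16 halves the byte count.
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # 4000000 bytes
print(weights_fp16.nbytes)  # 2000000 bytes -- exactly half
```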
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()
Note that on CPUs the float16 weights are actually upcast to float32 before computation, because much current hardware lacks native float16 arithmetic.
TensorFlow Lite's GPU delegate, however, has been enhanced to consume and run 16-bit precision parameters directly:
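The upcast itself is lossless: every float16 value is exactly representable in float32, so the only accuracy loss is the rounding that happened when the weights were stored as float16 in the first place. A small NumPy sketch illustrating this:

```python
import numpy as np

# Storing 0.1 as float16 rounds it to the nearest representable value.
w_fp16 = np.float16(0.1)

# Upcasting to float32 for CPU computation is exact -- no further loss.
w_fp32 = np.float32(w_fp16)

# The error comes from the original fp16 rounding, bounded by fp16 precision.
err = abs(float(w_fp32) - 0.1)
print(err)  # small, on the order of 1e-5
```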
// Prepare GPU delegate.
const TfLiteGpuDelegateOptions options = {
  .metadata = NULL,
  .compile_options = {
    .precision_loss_allowed = 1,  // FP16
    .preferred_gl_object_type = TFLITE_GL_OBJECT_TYPE_FASTEST,
    .dynamic_batch_enabled = 0,   // Not fully functional yet
  },
};
Official guide:
https://www.tensorflow.org/lite/performance/post_training_quantization
Colab notebook for the float16 MNIST example:
https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/performance/post_training_float16_quant.ipynb