TFLite Quantization

kuzma_zhang

Article link: https://blog.csdn.net/kuzma_zhang/article/details/131360166

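TFLite offers four post-training quantization methods. This post covers three of them: float16 quantization, dynamic range quantization, and full integer quantization.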

float16 quantization:

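import tensorflow as tf

# keras_model (the model to convert) and f (the output .tflite path) are assumed to be defined already
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter.target_spec.supported_types = [tf.float16]  # store weights as float16
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
with open(f, "wb") as fh:
    fh.write(tflite_model)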

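# Steps: create a converter, set the supported ops and types, enable the default optimization, convert, and write the model to a file.
# float16 quantization
# Under fp16 quantization the model's input and output stay float32. On CPU, the float16 weights and biases are dequantized to float32 before computation. On GPU this step is skipped, because the TFLite GPU delegate can operate on float16 data directly.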
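The fp16 behavior described above can be imitated in NumPy. This sketch does not call TFLite, and `fp16_roundtrip` is a hypothetical helper. It shows two effects. Storing weights as float16 halves their size. Converting them back to float32, as the CPU path does, costs only a small rounding error.

```python
import numpy as np

def fp16_roundtrip(weights: np.ndarray):
    """Store float32 weights as float16 (as in an fp16 .tflite file) and
    dequantize them back to float32 (what the CPU path does before computing)."""
    stored = weights.astype(np.float16)     # half the bytes of float32
    restored = stored.astype(np.float32)    # CPU kernels then compute in float32
    return stored, restored

rng = np.random.default_rng(0)
# example weight tensor shaped like a 3x3 conv kernel, 64 -> 128 channels
w = rng.normal(0.0, 0.05, size=(3, 3, 64, 128)).astype(np.float32)
stored, restored = fp16_roundtrip(w)
print(w.nbytes, stored.nbytes)              # 294912 147456
print(float(np.max(np.abs(w - restored))))  # small rounding error only
```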

Dynamic range quantization:

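import tensorflow as tf

# keras_model (the model to convert) and f (the output .tflite path) are assumed to be defined already
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
# no supported_types = [tf.float16] here: leaving it out selects dynamic range (int8 weight) quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
with open(f, "wb") as fh:
    fh.write(tflite_model)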

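# Same steps: create a converter, set the supported ops, enable the default optimization, convert, and write the model to a file.
# The only difference from float16 quantization: do not set supported_types.
# Export command (YOLOv5): python export.py --weights yolov5s.pt --include tflite --imgsz 640
# Under dynamic range quantization the weights are stored as int8, while the model's input and output stay float32. At inference, operators that support it quantize their activations to 8 bits on the fly and run integer kernels. The remaining operators compute in float32, so integer and floating-point execution are mixed.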
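The steps work as follows. Weights are quantized to int8 once, at conversion time. Each input batch is quantized at run time from its own range. The product is accumulated in int32 and rescaled back to float32. The NumPy sketch below simulates this. The helper names are hypothetical. The symmetric per-tensor scheme is a simplification: TFLite's real hybrid kernels may differ in detail, for example by using per-channel weight scales.

```python
import numpy as np

def quantize_symmetric_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: x ~= q * scale (zero point is 0)."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def hybrid_dense(x_float: np.ndarray, w_q: np.ndarray, w_scale: float) -> np.ndarray:
    """Dense layer with int8 weights and activations quantized on the fly; returns float32."""
    x_q, x_scale = quantize_symmetric_int8(x_float)          # "dynamic": computed per call
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)        # integer matmul, int32 accumulator
    return acc.astype(np.float32) * (x_scale * w_scale)      # rescale back to float32

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32)).astype(np.float32)
w_q, w_scale = quantize_symmetric_int8(w)   # done once, at conversion time
x = rng.normal(size=(4, 64)).astype(np.float32)
y = hybrid_dense(x, w_q, w_scale)
print(y.dtype, float(np.max(np.abs(y - x @ w))))
```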

Full integer quantization:

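import tensorflow as tf

# keras_model, data, imgsz, ncalib and f come from YOLOv5's export.py;
# LoadImages, check_dataset and representative_dataset_gen are YOLOv5 helpers
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # needed to enable quantization
dataset = LoadImages(check_dataset(data)['train'], img_size=imgsz, auto=False)
converter.representative_dataset = lambda: representative_dataset_gen(dataset, ncalib)  # calibration samples
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]  # integer-only kernels
converter.target_spec.supported_types = []
converter.inference_input_type = tf.uint8  # or tf.int8
converter.inference_output_type = tf.uint8  # or tf.int8
converter.experimental_new_quantizer = False
tflite_model = converter.convert()
with open(f, "wb") as fh:
    fh.write(tflite_model)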

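# "Full integer" means that the model's input and output are integers too, not just its weights and activations.

# Export command (YOLOv5): python3 export.py --weights yolov5s.pt --include tflite --imgsz 640 --int8

# The export needs a small batch of representative data to calibrate the quantization parameters (scale/zero-point) of the activations, including the input and output.

# Under full integer quantization the input and output are 8-bit integers: uint8 with the settings above, or int8 if inference_input_type/inference_output_type are set to tf.int8.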
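The NumPy sketch below makes the calibration step concrete. It is not TFLite's actual calibrator. `representative_dataset` is a hypothetical name. It has the form the converter expects from `converter.representative_dataset`: a generator that yields, for each sample, a list with one array per model input. The [0, 1] scaling assumes YOLOv5-style preprocessing. `calibrate_uint8` then applies the textbook min/max rule for asymmetric uint8 parameters. TFLite's own calibration may choose ranges differently.

```python
import numpy as np

def representative_dataset(images, num_samples=100):
    """Calibration generator: each yielded item is a list holding one array per model input."""
    for img in images[:num_samples]:
        x = img.astype(np.float32) / 255.0   # assumed preprocessing: uint8 pixels -> [0, 1]
        yield [x[np.newaxis, ...]]           # add the batch dimension: (1, H, W, 3)

def calibrate_uint8(samples):
    """Asymmetric uint8 (scale, zero_point) from the min/max seen during calibration."""
    lo = min(0.0, min(float(s.min()) for s in samples))   # the range must include 0
    hi = max(0.0, max(float(s.max()) for s in samples))
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(np.clip(round(-lo / scale), 0, 255))
    return scale, zero_point

def quantize(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# hypothetical stand-in for a few resized training images
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(8)]
# with TensorFlow: converter.representative_dataset = lambda: representative_dataset(images, 8)
samples = [batch[0] for batch in representative_dataset(images, 8)]
scale, zero_point = calibrate_uint8(samples)
print(scale, zero_point)
```

The scale is the width of the observed range divided by the 255 steps of uint8. The zero point is the integer that represents 0.0, so zero stays exactly representable.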

Using the exported .tflite model:

Running inference:

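# Run the inference pipeline
import tensorflow as tf

# yolov5s holds the path to the exported .tflite file;
# test_image must be a NumPy array with the input tensor's shape and dtype
interpreter = tf.lite.Interpreter(model_path=yolov5s)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
interpreter.set_tensor(input_index, test_image)
interpreter.invoke()
predictions = interpreter.get_tensor(output_index)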

# Create the interpreter, allocate tensors, look up the input/output indices, feed the input, run inference, and read the output.
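The inference snippet leaves `test_image` undefined. The minimal way to build it shown below rests on several assumptions. The model is a YOLOv5-style float model. Its input is one RGB image that has already been resized to `imgsz`, in NHWC layout and scaled to [0, 1]. `to_model_input` is a hypothetical helper, and the 640x640 size is only an example.

```python
import numpy as np

def to_model_input(image_hwc: np.ndarray, expected_shape) -> np.ndarray:
    """Turn one resized RGB uint8 image (H, W, 3) into a float32 NHWC batch in [0, 1]."""
    x = image_hwc.astype(np.float32) / 255.0
    x = x[np.newaxis, ...]                               # (1, H, W, 3)
    if tuple(x.shape) != tuple(int(d) for d in expected_shape):
        raise ValueError(f"input shape {x.shape} does not match model shape {tuple(expected_shape)}")
    return x

# expected_shape would normally be interpreter.get_input_details()[0]["shape"]
img = np.zeros((640, 640, 3), dtype=np.uint8)
test_image = to_model_input(img, np.array([1, 640, 640, 3]))
print(test_image.shape, test_image.dtype)
```

Checking the shape before `set_tensor` gives a clearer error than the interpreter's own shape-mismatch message.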

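# In the dicts returned by get_input_details() (and get_output_details()), the "quantization" entry is a (scale, zero_point) pair. For a quantized model, use the input's pair to quantize the image before set_tensor, q = x / scale + zero_point. Use the output's pair to dequantize the result, x = (q - zero_point) * scale.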
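The sketch below applies these (scale, zero_point) pairs. It is written against the details dicts returned by get_input_details()/get_output_details() and uses only their "dtype" and "quantization" keys. A pair of (0.0, 0) marks a tensor that is not quantized, so float models pass through unchanged. `prepare_input` and `read_output` are hypothetical names. The commented lines show where they would plug into the interpreter code. The fake dicts at the end let the functions run without TensorFlow.

```python
import numpy as np

def prepare_input(x: np.ndarray, detail: dict) -> np.ndarray:
    """Quantize a float input with the (scale, zero_point) from get_input_details()."""
    scale, zero_point = detail["quantization"]
    dtype = np.dtype(detail["dtype"])
    if scale == 0:                        # (0.0, 0): tensor is not quantized
        return x.astype(dtype)
    info = np.iinfo(dtype)
    q = np.round(x / scale + zero_point)
    return np.clip(q, info.min, info.max).astype(dtype)

def read_output(y: np.ndarray, detail: dict) -> np.ndarray:
    """Dequantize a raw output tensor with the (scale, zero_point) from get_output_details()."""
    scale, zero_point = detail["quantization"]
    if scale == 0:
        return y.astype(np.float32)
    return (y.astype(np.float32) - zero_point) * scale

# Typical use (needs TensorFlow and a converted model):
# in_d = interpreter.get_input_details()[0]
# out_d = interpreter.get_output_details()[0]
# interpreter.set_tensor(in_d["index"], prepare_input(test_image, in_d))
# interpreter.invoke()
# predictions = read_output(interpreter.get_tensor(out_d["index"]), out_d)

# fake details dicts, for illustration only
fake_in = {"dtype": np.uint8, "quantization": (1 / 255, 0)}
fake_out = {"dtype": np.uint8, "quantization": (0.5, 10)}
print(prepare_input(np.array([0.0, 1.0, 2.0]), fake_in))
print(read_output(np.array([10, 12], dtype=np.uint8), fake_out))
```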