TensorFlow Quantization-Aware Training: The Full Process
You can train a quantized model either by restoring a previously trained floating-point model or from scratch. In either case, you first have to create a quantization training graph.
tf.contrib.quantize.create_training_graph(quant_delay=DELAY_STEP)
DELAY_STEP is the number of training steps after which weights and activations start being quantized. Put the call above right after you build your normal training graph (that is, the graph without the optimization operation). If you train on multiple GPUs, you have to create the quantization training graph on every GPU, as in the following code:
import tensorflow as tf  # TensorFlow 1.x (tf.contrib is not available in 2.x)

tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(len(GPU_NUM_ID)):
        with tf.device('/gpu:%d' % GPU_NUM_ID[i]):
            with tf.name_scope('%s_%d' % ('cnn_mg', i)) as scope:
                images, labels = load_batch_images()
                logits, out_data = net.inference(images, reuse=tf.AUTO_REUSE, num_classes=LABEL_NUM)
                # Insert the fake-quantization ops for this tower.
                with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
                    tf.contrib.quantize.create_training_graph(quant_delay=DELAY_STEP)
                loss = compute_loss(labels, logits)
                tf.get_variable_scope().reuse_variables()
                # Collect this tower's (gradient, variable) pairs.
                grads = optimizer.compute_gradients(loss)
                tower_grads.append(grads)
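The loop above only collects the per-tower gradients in tower_grads. A common next step in multi-GPU training, which this post does not show, is to average them per variable before applying the update. The following is a minimal pure-Python sketch of that averaging, with plain floats standing in for gradient tensors and strings standing in for variables; the name `average_gradients` and the sample values are hypothetical, and in a real graph the averaging would be done with TensorFlow ops instead of Python arithmetic.

```python
def average_gradients(tower_grads):
    """Average gradients across towers.

    tower_grads is a list with one entry per GPU; each entry is a list of
    (gradient, variable) pairs in the same variable order.
    """
    averaged = []
    # zip(*tower_grads) groups the pairs that belong to the same variable.
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        var = grads_and_vars[0][1]  # the variable is shared by all towers
        averaged.append((sum(grads) / len(grads), var))
    return averaged


# Two towers, two variables (illustrative numbers only).
example_tower_grads = [
    [(0.25, 'conv1/weights'), (-1.0, 'fc/weights')],
    [(0.75, 'conv1/weights'), (-3.0, 'fc/weights')],
]
averaged = average_gradients(example_tower_grads)
print(averaged)
```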
One thing I have to mention is that quantization-aware training is "fake" training. Fake training means that during the forward pass the training graph only simulates the integer multiplication by using the corresponding floating-point multiplication. "Corresponding" means that the simulated floating-point weights are the dequantized values of the corresponding fixed-point integers. As a result, the forward output during training may differ slightly from the result actually computed by the quantized model.
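To make the idea concrete, here is a minimal pure-Python sketch of the quantize-then-dequantize round trip, assuming an asymmetric 8-bit affine scheme. The helper names (`choose_params`, `quantize`, `dequantize`, `fake_quant`) and the sample weights are hypothetical and are not part of the TensorFlow API; the ops that TensorFlow inserts are more elaborate, so treat this only as an illustration of why the simulated weights differ slightly from the original floats.

```python
def choose_params(w_min, w_max, num_bits=8):
    """Pick a scale and zero point mapping [w_min, w_max] onto [0, 2**num_bits - 1].

    Assumes w_min < w_max, so the scale is never zero.
    """
    q_max = 2 ** num_bits - 1
    # Widen the range to contain 0.0 so that zero is exactly representable.
    w_min, w_max = min(w_min, 0.0), max(w_max, 0.0)
    scale = (w_max - w_min) / q_max
    zero_point = int(round(-w_min / scale))
    return scale, zero_point


def quantize(w, scale, zero_point, num_bits=8):
    """Map a float to its integer code, clamped to the representable range."""
    q = int(round(w / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to the float it represents."""
    return scale * (q - zero_point)


def fake_quant(w, scale, zero_point):
    """Float in, float out, but only values on the integer grid survive."""
    return dequantize(quantize(w, scale, zero_point), scale, zero_point)


weights = [-0.5, -0.123, 0.0, 0.3, 1.0]
scale, zero_point = choose_params(min(weights), max(weights))
simulated = [fake_quant(w, scale, zero_point) for w in weights]
for w, s in zip(weights, simulated):
    print("float %+.4f -> simulated %+.4f (error %.5f)" % (w, s, abs(w - s)))
```

In this sketch the rounding error of each weight is at most half of the scale, which is the kind of small gap between the training-time forward pass and the real integer computation.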
Save, Freeze, Convert and Test
Save
Next, save your trained quantized model. Before saving it, you have to create a quantized evaluation graph with the following code:
g = tf.get_default_graph()
tf.contrib.quantize.create_eval_graph(input_graph=g)
Then write the graph out to a file:
with open('./your_quantized_graph.pb', 'w') as f:
    # str() of a GraphDef gives its text-format protobuf.
    f.write(str(g.as_graph_def()))
Freeze
To make your model more compact, you can freeze it. Freezing a model converts its variables into constants (using the values stored in a checkpoint) and removes the operations that are not needed to compute the output nodes. To freeze your graph, you can use the standard freeze_graph tool:
bazel build tensorflow/python/tools:freeze_graph && \
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=some_graph_def.pb \
--input_checkpoint=model.ckpt-8361242 \
--output_graph=/tmp/frozen_graph.pb \
--output_node_names=softmax
Convert
The next step is to convert your frozen graph to the TFLite format for deployment.
path_to_frozen_graphdef_pb = './your_frozen_graph.pb'
input_shapes = {'validate_input/imgs': [1, 320, 320, 3]}
# TF > 1.11:
converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
    path_to_frozen_graphdef_pb, ['validate_input/imgs'], ['output_node'], input_shapes)
# TF <= 1.11: call tf.contrib.lite.TocoConverter.from_frozen_graph(...) with the same arguments instead.
converter.inference_type = tf.contrib.lite.constants.QUANTIZED_UINT8
# (mean, std_dev) of each input tensor
converter.quantized_input_stats = {'validate_input/imgs': (0., 1.)}
converter.allow_custom_ops = True
# Fallback (min, max) range for tensors that have no recorded min/max values
converter.default_ranges_stats = (0, 255)
converter.post_training_quantize = True
tflite_model = converter.convert()
with open('your.tflite', 'wb') as f:
    f.write(tflite_model)
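The pair given to quantized_input_stats is (mean, std_dev). The converter documentation relates it to the float input as real_value = (quantized_value - mean) / std_dev. The following sketch shows how such a pair could be derived from the float range the model expects, assuming uint8 inputs; the helper names `input_stats_for_range` and `to_real` are hypothetical and not part of any TensorFlow API.

```python
def input_stats_for_range(float_min, float_max, q_min=0, q_max=255):
    """Return (mean, std_dev) so that q_min maps to float_min and q_max to float_max."""
    std_dev = (q_max - q_min) / float(float_max - float_min)
    mean = q_min - float_min * std_dev
    return mean, std_dev


def to_real(q, mean, std_dev):
    """The relation used by the converter: real = (quantized - mean) / std_dev."""
    return (q - mean) / std_dev


# Model fed raw pixel values in [0, 255].
raw_pixel_stats = input_stats_for_range(0.0, 255.0)
# Model fed inputs normalized to [-1, 1].
normalized_stats = input_stats_for_range(-1.0, 1.0)
print(raw_pixel_stats, normalized_stats)
```

Under this relation, the (0., 1.) used in the conversion code corresponds to a model whose float input is the raw pixel value in [0, 255]; a model trained on inputs normalized to [-1, 1] would need different values.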
Test
Finally, you can test the converted TFLite model with the following code:
interpreter = tf.contrib.lite.Interpreter(model_path="your.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.set_tensor(input_details[0]['index'], batch_validate_img)
interpreter.invoke()
score = interpreter.get_tensor(output_details[0]['index'])
score = score[0][0]
zero_point = xxx  # zero point of the output layer
scale = xxx  # scale of the output layer
reverse_score = scale * (score - zero_point)
Note that the final score you get is a fixed-point integer value. You have to convert it to the corresponding floating-point value. To do that, look up the zero point and scale of the output layer and use them to descale the raw output value.
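The descaling step can be sketched as a small helper. It assumes that each details dictionary returned by get_output_details() carries a 'quantization' entry holding (scale, zero_point); the helper name `dequantize_output`, the stand-in dictionary and the sample numbers are hypothetical, so check the values against your own model.

```python
def dequantize_output(raw_values, output_detail):
    """Convert raw integer outputs to floats using the tensor's scale and zero point."""
    scale, zero_point = output_detail['quantization']
    # int(v) avoids unsigned overflow when v is a uint8 value below zero_point.
    return [scale * (int(v) - zero_point) for v in raw_values]


# Stand-in for output_details[0]; a real one comes from the interpreter.
fake_output_detail = {'index': 0, 'quantization': (0.00390625, 0)}
scores = dequantize_output([0, 128, 255], fake_output_detail)
print(scores)

# A non-zero zero point shifts the integer codes before scaling.
shifted = dequantize_output([126, 128, 130], {'quantization': (0.5, 128)})
print(shifted)
```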