AIMET TensorFlow Quantization SIM API
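Top-level API: the QuantizationSimModel class
- The API centers on a single class: aimet_tensorflow.quantsim.QuantizationSimModel
- Purpose: inserts quantization simulation ops into a given model to create a QuantSim model.
- It has two main benefits:
  - It simulates, on a floating-point device, the inference accuracy and performance the model would have on the real target hardware.
  - It allows the model to be fine-tuned (quantization-aware training, QAT), which recovers part of the accuracy lost to quantization.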
- Class definition:
class aimet_tensorflow.quantsim.QuantizationSimModel(session: tf.compat.v1.Session, starting_op_names: List[str], output_op_names: List[str], quant_scheme: Union[str, QuantScheme] = 'tf_enhanced', rounding_mode: str = 'nearest', default_output_bw: int = 8, default_param_bw: int = 8, use_cuda: bool = True, config_file: str = None, default_data_type: QuantizationDataType = QuantizationDataType.int)
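- Parameters:
  - session (tf.compat.v1.Session): the session that holds the model; the quantization ops are inserted into this session.
  - starting_op_names (List[str]): names of the first ops in the model's computation graph.
  - output_op_names (List[str]): names of the output ops in the model's computation graph.
  - quant_scheme (Union[str, QuantScheme]): quantization scheme. The currently supported schemes are post_training_tf and post_training_tf_enhanced; the default is post_training_tf_enhanced.
    - QuantScheme: an enum; see class aimet_common.defs.QuantScheme
  - rounding_mode (str): rounding mode, either 'nearest' or 'stochastic'; the default is 'nearest'.
  - default_output_bw (int): bitwidth used for activations; defaults to 8.
  - default_param_bw (int): bitwidth used for parameter tensors; defaults to 8.
  - use_cuda (bool): whether to run the quantization ops on the GPU.
  - config_file (str): path to a configuration file, which specifies the rules that govern the quantization ops in the model.
  - default_data_type (QuantizationDataType): default data type used to quantize all layer parameters. The supported values are QuantizationDataType.int and QuantizationDataType.float.
    - Note: default_data_type=QuantizationDataType.float is supported only together with default_output_bw=16 and default_param_bw=16.
- Returns: an object that can be used to perform quantization on a TensorFlow graph.
- Raises: ValueError if an error occurs while processing one of the input parameters.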
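The two rounding_mode values can be illustrated without AIMET. The sketch below is plain Python and shows the general idea of nearest versus stochastic rounding; the function names are hypothetical, and it is not a description of how AIMET implements rounding internally.

```python
import math
import random


def round_nearest(x: float) -> int:
    """Round to the closest integer (halves round up)."""
    return math.floor(x + 0.5)


def round_stochastic(x: float, rng: random.Random) -> int:
    """Round up with probability equal to the fractional part, else round down.

    The result is unbiased on average: E[round_stochastic(x)] == x.
    """
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [round_stochastic(2.3, rng) for _ in range(10000)]
    print("nearest(2.3)    =", round_nearest(2.3))
    print("stochastic mean =", sum(samples) / len(samples))
```

Nearest rounding is deterministic, while stochastic rounding trades a little noise per value for an unbiased average.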
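Quantization schemes
- AIMET supports several quantization schemes:
  - Post-training quantization (PTQ): the model's encodings are computed with the TF or TF-Enhanced scheme.
    - "Encodings" here means the min, max, scale, and zero point that quantization needs; see the Quantization Encoding Specification for details.
  - Quantization-aware training (QAT): min and max are learned during training.
    - Range Learning with TF Initialization: the encodings are initialized with the TF scheme and then fine-tuned during training to improve the model's accuracy.
    - Range Learning with TF-Enhanced Initialization: the encodings are initialized with the TF-Enhanced scheme and then fine-tuned during training to improve the model's accuracy.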
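To make the four encoding fields concrete, here is a minimal sketch of how scale and zero point can be derived from an observed min/max and used to quantize and dequantize a value. It uses a common asymmetric affine formulation in plain Python; the names Encoding, compute_encoding, and quantize_dequantize are hypothetical, and AIMET's own sign and offset conventions may differ.

```python
from dataclasses import dataclass


@dataclass
class Encoding:
    min: float
    max: float
    scale: float
    zero_point: int


def compute_encoding(observed_min: float, observed_max: float, bitwidth: int = 8) -> Encoding:
    """Derive scale and zero point from an observed value range."""
    # Extend the range so that 0.0 is always representable.
    lo = min(observed_min, 0.0)
    hi = max(observed_max, 0.0)
    levels = 2 ** bitwidth - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return Encoding(lo, hi, scale, zero_point)


def quantize_dequantize(x: float, enc: Encoding, bitwidth: int = 8) -> float:
    """Simulate quantization: map x to the integer grid, clamp, and map back to float."""
    levels = 2 ** bitwidth - 1
    q = round(x / enc.scale) + enc.zero_point
    q = max(0, min(levels, q))
    return (q - enc.zero_point) * enc.scale


if __name__ == "__main__":
    enc = compute_encoding(-1.0, 3.0)
    print(enc)
    print(quantize_dequantize(0.5, enc))
```

Values inside the range come back with an error of at most half a quantization step, while values outside the range are clamped to the nearest representable end.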
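API for computing the model's encodings
- QuantizationSimModel class method
  - QuantizationSimModel.compute_encodings(self, forward_pass_callback: Callable[[tf.compat.v1.Session, Any], None], forward_pass_callback_args)
  - Computes encodings for all quantization simulation nodes in the model.
  - It can also be used to set the initial encoding values for Range Learning.
  - Parameters:
    - forward_pass_callback (Callable[[tf.compat.v1.Session, Any], None]): a callback function that runs forward passes on the session.
      - The callback should run forward passes on representative data, so that the computed encodings work for all data samples in the dataset.
      - The callback decides internally how many data samples to use for computing the encodings.
    - forward_pass_callback_args: passed unchanged to forward_pass_callback. The type of this argument is up to the user: it can be a single integer giving the number of data samples to use, a tuple of arguments, or a more complex object.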
  - Returns: None
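The callback contract described above can be sketched without TensorFlow. In the example below, FakeSession and compute_encodings_stub are hypothetical stand-ins for tf.compat.v1.Session and for the way compute_encodings is described as invoking the callback; the only point is that the callback receives the session plus forward_pass_callback_args and decides for itself how many batches to run.

```python
from typing import Any, Callable, Dict, List


class FakeSession:
    """Hypothetical stand-in for tf.compat.v1.Session; it only records run() calls."""

    def __init__(self) -> None:
        self.feeds: List[Dict[str, Any]] = []

    def run(self, fetches: Any, feed_dict: Dict[str, Any] = None) -> list:
        self.feeds.append(feed_dict)
        return []


def forward_pass_callback(session: FakeSession, num_batches: int) -> None:
    """Run forward passes on (made-up) calibration batches, stopping after num_batches."""
    batches = ([float(i)] * 4 for i in range(100))  # placeholder calibration data
    for count, batch in enumerate(batches):
        if count >= num_batches:
            break
        session.run("output", feed_dict={"input": batch})


def compute_encodings_stub(session: FakeSession,
                           callback: Callable[[FakeSession, Any], None],
                           callback_args: Any) -> None:
    """Mimics only the calling convention: the args are handed to the callback unchanged."""
    callback(session, callback_args)


if __name__ == "__main__":
    sess = FakeSession()
    compute_encodings_stub(sess, forward_pass_callback, 16)
    print("batches seen:", len(sess.feeds))
```

Here forward_pass_callback_args is a plain integer, but a tuple or a custom object would be passed through in the same way.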
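API for exporting the quantized model to the target
- QuantizationSimModel class method
  - export(self, path: str, filename_prefix: str, orig_sess: tf.compat.v1.Session = None)
  - Exports the quantization simulation model so that it can later be run on the target hardware.
  - What is saved:
    - The model is exported as a regular TensorFlow meta/checkpoint. Note: the exported model contains no quantization simulation ops.
    - The quantization encodings are exported to a separate JSON file, which can then be imported by the target runtime if needed, that is, when deploying to the corresponding hardware.
  - Parameters:
    - path (str): path where the model files and the encodings file are stored.
    - filename_prefix (str): filename prefix shared by the model and encodings files.
    - orig_sess (tf.compat.v1.Session, optional): the original session, without quantization nodes, to use for the export.
  - Returns: None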
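Because the encodings end up in a JSON file, they can be inspected with the standard library after export. The helper below is a small sketch; load_encodings is a hypothetical name, and the ".encodings" suffix is an assumption about the exported filename, so adjust it to whatever file actually appears in the export directory. No particular JSON schema is assumed.

```python
import json
import os


def load_encodings(path: str, filename_prefix: str, suffix: str = ".encodings") -> dict:
    """Load the exported encodings JSON as a dictionary.

    The suffix is an assumption; pass the real one if your export differs.
    """
    file_path = os.path.join(path, filename_prefix + suffix)
    with open(file_path, "r", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical export location and prefix.
    export_dir, prefix = "./export", "quantized_mnist"
    if os.path.exists(os.path.join(export_dir, prefix + ".encodings")):
        encodings = load_encodings(export_dir, prefix)
        print("top-level keys:", list(encodings.keys()))
```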
## Example code: using post-training quantization and QAT
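- Imports

      import os

      import tensorflow as tf

      # Import the TensorFlow quantsim
      from aimet_tensorflow import quantsim
      from aimet_tensorflow.common import graph_eval
      from aimet_tensorflow.utils import graph_saver
      from aimet_common.defs import QuantScheme

      # tf_gen and input_data are used by the range-learning example but were not
      # imported in the original listing; the two import paths below are assumptions.
      from aimet_tensorflow.common import tfrecord_generator as tf_gen
      from tensorflow.examples.tutorials.mnist import input_data  # TF 1.x only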
- Passing in calibration data

      def pass_calibration_data(session: tf.compat.v1.Session, _):
          """
          The User of the QuantizationSimModel API is expected to write this function based on their data set.
          This is not a working function and is provided only as a guideline.

          :param session: Model's session
          :param _: forward_pass_callback_args passed through by compute_encodings (unused here)
          :return:
          """

          # User action required
          # The following line of code is an example of how to use the ImageNet data's validation data loader.
          # Replace the following line with your own dataset's validation data loader.
          data_loader = None  # Your Dataset's data loader

          # User action required
          # For computing the activation encodings, around 1000 unlabelled data samples are required.
          # Edit the following 2 lines based on your dataloader's batch size.
          # batch_size * max_batch_counter should be 1024
          batch_size = 64
          max_batch_counter = 16

          input_tensor = None  # input tensor in session
          train_tensor = None  # train tensor in session
          current_batch_counter = 0
          for input_data, _ in data_loader:
              feed_dict = {input_tensor: input_data,
                           train_tensor: False}

              # [] is a placeholder: fetch the model's output tensor(s) here so that
              # the forward pass actually runs.
              session.run([], feed_dict=feed_dict)

              current_batch_counter += 1
              if current_batch_counter == max_batch_counter:
                  break
- Post-training quantization and fine-tuning (QAT)

      def quantize_model():
          """
          Create the Quantization Simulation and finetune the model.
          :return:
          """
          tf.compat.v1.reset_default_graph()

          # load graph
          sess = graph_saver.load_model_from_meta('models/mnist_save.meta', 'models/mnist_save')

          # Create quantsim model to quantize the network using the default 8 bit params/activations
          sim = quantsim.QuantizationSimModel(sess, starting_op_names=['reshape_input'],
                                              output_op_names=['dense_1/BiasAdd'],
                                              quant_scheme=QuantScheme.post_training_tf_enhanced,
                                              config_file='../../../TrainingExtensions/common/src/python/aimet_common/'
                                                          'quantsim_config/default_config.json')

          # Compute encodings
          sim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)

          # Do some finetuning

          # User action required
          # The following line of code illustrates that the model is getting finetuned.
          # Replace the following train() function with your pipeline's train() function.
          train(sim)
- Quantizing and fine-tuning a trained model to learn the encodings (range learning)

      def quantization_aware_training_range_learning():
          """
          Running Quantize Range Learning Test
          """
          tf.compat.v1.reset_default_graph()

          # Allocate the generator you wish to use to provide the network with data
          parser2 = tf_gen.MnistParser(batch_size=100, data_inputs=['reshape_input'])
          generator = tf_gen.TfRecordGenerator(tfrecords=[os.path.join('data', 'mnist', 'validation.tfrecords')],
                                               parser=parser2)

          sess = graph_saver.load_model_from_meta('models/mnist_save.meta', 'models/mnist_save')

          # Create quantsim model to quantize the network using the default 8 bit params/activations
          # quant scheme set to range learning
          sim = quantsim.QuantizationSimModel(sess, ['reshape_input'], ['dense_1/BiasAdd'],
                                              quant_scheme=QuantScheme.training_range_learning_with_tf_init)

          # Initialize the model with encodings
          sim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)

          # Train the model to fine-tune the encodings
          g = sim.session.graph
          sess = sim.session

          with g.as_default():
              parser2 = tf_gen.MnistParser(batch_size=100, data_inputs=['reshape_input'])
              generator2 = tf_gen.TfRecordGenerator(tfrecords=['data/mnist/validation.tfrecords'], parser=parser2)
              cross_entropy = g.get_operation_by_name('xent')
              train_step = g.get_operation_by_name("Adam")

              # do training: learn weights and architecture simultaneously
              x = sim.session.graph.get_tensor_by_name("reshape_input:0")
              y = g.get_tensor_by_name("labels:0")
              fc1_w = g.get_tensor_by_name("dense_1/MatMul/ReadVariableOp:0")

              # Accuracy of the quantized model before fine-tuning
              perf = graph_eval.evaluate_graph(sess, generator2, ['accuracy'], graph_eval.default_eval_func, 1)
              print('Quantized performance: ' + str(perf * 100))

              ce = g.get_tensor_by_name("xent:0")
              train_step = tf.compat.v1.train.AdamOptimizer(1e-3, name="TempAdam").minimize(ce)

              graph_eval.initialize_uninitialized_vars(sess)
              mnist = input_data.read_data_sets('./data', one_hot=True)

              for i in range(100):
                  batch = mnist.train.next_batch(50)
                  sess.run([train_step, fc1_w], feed_dict={x: batch[0], y: batch[1]})
                  if i % 10 == 0:
                      perf = graph_eval.evaluate_graph(sess, generator2, ['accuracy'],
                                                       graph_eval.default_eval_func, 1)
                      print('Quantized performance: ' + str(perf * 100))

          # close session
          sess.close()