[Model Quantization] AIMET Documentation - AIMET TensorFlow Quantization SIM API

月满星沉

Last modified: 2022-09-03 00:00:11

First published: 2022-09-02 23:57:45

Article link: https://blog.csdn.net/Hunter_Murphy/article/details/126672207

AIMET TensorFlow Quantization SIM API

Top-level API: the QuantizationSimModel quantization simulation class

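The API centers on a single class: aimet_tensorflow.quantsim.QuantizationSimModel
Purpose: adds quantization simulation ops to a given model, producing a QuantSim model.
- This has two main benefits:
  - The inference accuracy the model would reach on the real target hardware can be simulated on a floating-point device.
  - The model can be fine-tuned (that is, quantization-aware training, QAT) to partly recover the accuracy lost to quantization.
The class is defined as: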
- class aimet_tensorflow.quantsim.QuantizationSimModel(session: tf.compat.v1.Session, starting_op_names: List[str], output_op_names: List[str], quant_scheme: Union[str, QuantScheme] = 'tf_enhanced', rounding_mode: str = 'nearest', default_output_bw: int = 8, default_param_bw: int = 8, use_cuda: bool = True, config_file: str = None, default_data_type: QuantizationDataType = QuantizationDataType.int)
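- Parameters:
  - session (tf.compat.v1.Session): the session holding the model; the quantization simulation ops are inserted into it
  - starting_op_names (List[str]): names of the first ops in the model's graph
  - output_op_names (List[str]): names of the output ops in the model's graph
  - quant_scheme (Union[str, QuantScheme]): the quantization scheme. The currently supported schemes are post_training_tf and post_training_tf_enhanced; the default is post_training_tf_enhanced (written 'tf_enhanced' in the signature above)
    - QuantScheme: an enum, see class aimet_common.defs.QuantScheme
  - rounding_mode (str): the rounding mode, either 'nearest' or 'stochastic'; the default is 'nearest'
  - default_output_bw (int): default bitwidth for activations, 8 by default
  - default_param_bw (int): default bitwidth for parameter tensors, 8 by default
  - use_cuda (bool): whether to run the quantization ops on the GPU, True by default
  - config_file (str): path to a configuration file that specifies the rules for placing quantization ops in the model
  - default_data_type (QuantizationDataType): default data type used to quantize all layer parameters. The supported values are QuantizationDataType.int and QuantizationDataType.float.
    - Note: default_data_type=QuantizationDataType.float is only supported together with default_output_bw=16 and default_param_bw=16
- Returns: an object that can be used to perform quantization on a TensorFlow graph
- Raises: ValueError if an error occurs while processing one of the input parameters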
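
To make the rounding_mode parameter more concrete, here is a minimal pure-Python sketch of the two rounding behaviors. It is not AIMET's implementation: the helper names round_nearest and round_stochastic are hypothetical, and the sketch only illustrates the general idea that stochastic rounding rounds up with a probability equal to the fractional part, so it is unbiased on average.

```python
import math
import random


def round_nearest(x: float) -> int:
    """Round to the nearest integer (ties go up); illustrative only."""
    return math.floor(x + 0.5)


def round_stochastic(x: float, rng: random.Random) -> int:
    """Round up with probability equal to the fractional part of x."""
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [round_stochastic(2.3, rng) for _ in range(10_000)]
    print(round_nearest(2.3))            # deterministic result
    print(sum(samples) / len(samples))   # averages out close to 2.3
```

Nearest rounding is deterministic, which is the usual choice for inference simulation; stochastic rounding trades per-sample noise for an unbiased expected value.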

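Quantization schemes

AIMET supports several quantization schemes:
- Post-training quantization (PTQ): the model's encodings are computed with the TF or TF-Enhanced scheme
  - Here an encoding means the min, max, scale and zero point needed for quantization; see the Quantization Encoding Specification for details
- Quantization-aware training (QAT): the min and max of the encodings are learned during training
  - Range Learning with TF Initialization: the encodings are initialized with the TF scheme and then fine-tuned during training to improve the model's accuracy
  - Range Learning with TF-Enhanced Initialization: the encodings are initialized with the TF-Enhanced scheme and then fine-tuned during training to improve the model's accuracy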
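
As a rough illustration of what an encoding (min, max, scale, zero point) is, the sketch below derives one from the plain observed min/max of some values and then simulates quantization by rounding onto the integer grid and mapping back to float. This is an assumption-laden toy, not AIMET code: the Encoding class and both function names are hypothetical, it uses a simple asymmetric scheme, and it does not reproduce how TF-Enhanced picks its range.

```python
from dataclasses import dataclass


@dataclass
class Encoding:
    """Hypothetical container for the four values the text calls an encoding."""
    min_val: float
    max_val: float
    scale: float
    zero_point: int


def compute_encoding(values, bitwidth: int = 8) -> Encoding:
    """Derive an asymmetric encoding from the observed min/max of `values`."""
    # Make sure the range contains 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    num_steps = 2 ** bitwidth - 1
    scale = (hi - lo) / num_steps or 1.0   # avoid a zero scale for all-zero input
    zero_point = round(-lo / scale)
    return Encoding(lo, hi, scale, zero_point)


def quantize_dequantize(x: float, enc: Encoding, bitwidth: int = 8) -> float:
    """Simulate quantization: snap x to the integer grid, then map back to float."""
    q = round(x / enc.scale) + enc.zero_point
    q = max(0, min(2 ** bitwidth - 1, q))   # clamp to the representable range
    return (q - enc.zero_point) * enc.scale


enc = compute_encoding([-1.0, 0.5, 3.0])
print(enc)
print(quantize_dequantize(0.5, enc))   # close to 0.5, off by at most scale / 2
```

Values outside [min, max] are clamped, which is why a representative calibration set matters when the encodings are computed.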

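API for computing the model's encodings

QuantizationSimModel method
QuantizationSimModel.compute_encodings(self, forward_pass_callback: Callable[[tf.compat.v1.Session, Any], None], forward_pass_callback_args)
- Computes encodings for all quantization simulation nodes in the model.
- It is also used to set the initial encoding values for Range Learning.
- Parameters:
  - forward_pass_callback (Callable[[tf.compat.v1.Session, Any], None]): a callback that runs forward passes on the session.
    - The callback should run the forward passes on representative data, so that the computed encodings work for all samples in the dataset.
    - The callback decides internally how many data samples to use for computing the encodings.
  - forward_pass_callback_args: passed as-is to forward_pass_callback. The type is up to the user; for example, it can be an integer giving the number of data samples to use, a tuple of arguments, or a more complex object.
- Returns: None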
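
The calling convention described above can be sketched without TensorFlow. Everything in this block is a hypothetical stand-in: FakeSession only counts run() calls, and compute_encodings_sketch mimics only the fact that the callback receives the session and the user-supplied args object; the real method additionally derives encodings from the statistics gathered during those forward passes.

```python
from typing import Any, Callable


class FakeSession:
    """Hypothetical stand-in for tf.compat.v1.Session that only counts run() calls."""

    def __init__(self) -> None:
        self.num_runs = 0

    def run(self, fetches, feed_dict=None):
        self.num_runs += 1


def compute_encodings_sketch(session: FakeSession,
                             forward_pass_callback: Callable[[Any, Any], None],
                             forward_pass_callback_args: Any) -> None:
    """Mimic only the calling convention: callback(session, args)."""
    forward_pass_callback(session, forward_pass_callback_args)


def run_calibration(session, num_batches: int) -> None:
    """User-written callback; here the args object is simply a batch count."""
    for _ in range(num_batches):
        session.run([], feed_dict={})


session = FakeSession()
compute_encodings_sketch(session, run_calibration, forward_pass_callback_args=16)
print(session.num_runs)  # 16
```

Because the args object is passed through untouched, the same callback can be reused with a different batch count, a tuple, or a small config object without changing the calling code.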

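API for exporting the quantized model to the target

QuantizationSimModel method
export(self, path: str, filename_prefix: str, orig_sess: tf.compat.v1.Session = None)
- Exports the quantization simulation model so that it can later be run on the target hardware.
- What gets saved:
  - 1. The sim model's session, saved as checkpoint (ckpt) and meta files. Note: the saved model contains no quantization simulation ops.
  - 2. The quantization encodings, exported to a separate JSON-formatted file that can then be imported by the target runtime (that is, on the hardware the model is deployed to) if needed.
- Parameters:
  - path (str): directory in which to store the model files and the encodings file
  - filename_prefix (str): filename prefix shared by the model and encodings files
  - orig_sess (tf.compat.v1.Session, optional): the original session without quantization nodes, to be used for the export
- Returns: None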

Example code: post-training quantization and QAT usage

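Imports

import os

import tensorflow as tf
# Assumed import path for the TF1 MNIST helper used as `input_data` in the range-learning example below
from tensorflow.examples.tutorials.mnist import input_data

# Import the TensorFlow quantsim module
from aimet_tensorflow import quantsim
from aimet_tensorflow.common import graph_eval
# Assumed import path for the `tf_gen` alias used in the range-learning example below
from aimet_tensorflow.common import tfrecord_generator as tf_gen
from aimet_tensorflow.utils import graph_saver
from aimet_common.defs import QuantScheme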

Passing calibration data

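def pass_calibration_data(session: tf.compat.v1.Session, _forward_pass_callback_args=None):
    """
    The user of the QuantizationSimModel API is expected to write this function based on their dataset.
    This is not a working function and is provided only as a guideline.

    :param session: Model's session
    :param _forward_pass_callback_args: Passed through unchanged by compute_encodings; unused here
    :return: None
    """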

    # User action required
    # The following line of code is an example of how to use the ImageNet data's validation data loader.
    # Replace the following line with your own dataset's validation data loader.
    data_loader = None  # Your Dataset's data loader

    # User action required
    # For computing the activation encodings, around 1000 unlabelled data samples are required.
    # Edit the following 2 lines based on your dataloader's batch size.
    # batch_size * max_batch_counter should be 1024
    batch_size = 64
    max_batch_counter = 16

    input_tensor = None  # input tensor in session
    train_tensor = None  # train tensor in session

    current_batch_counter = 0
    for input_data, _ in data_loader:
        feed_dict = {input_tensor: input_data,
                     train_tensor: False}

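        # The empty fetch list is a placeholder: fetch your model's output tensor(s) here
        # so that the forward pass actually runs through the quantization ops.
        session.run([], feed_dict=feed_dict)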

        current_batch_counter += 1
        if current_batch_counter == max_batch_counter:
            break
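
The batch-counting loop above can also be written with itertools.islice. The helper below is a hypothetical convenience, not part of AIMET: it assumes the data loader is any iterable of batches and simply stops after enough batches to cover roughly the requested number of samples.

```python
from itertools import islice


def calibration_batches(data_loader, batch_size: int, num_samples: int = 1024):
    """Yield just enough batches from data_loader to cover about num_samples samples."""
    max_batches = -(-num_samples // batch_size)   # ceiling division
    yield from islice(data_loader, max_batches)


# Toy loader: 100 (inputs, labels) batches of size 64; the values are placeholders.
toy_loader = (([0.0] * 64, [0] * 64) for _ in range(100))
batches = list(calibration_batches(toy_loader, batch_size=64))
print(len(batches))  # 16
```

With this helper the calibration loop reduces to iterating over calibration_batches(data_loader, batch_size) and calling session.run on each batch, with no manual counter.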

Post-training quantization followed by fine-tuning (QAT)

def quantize_model():
    """
    Create the Quantization Simulation and finetune the model.
    :return:
    """
    tf.compat.v1.reset_default_graph()

    # load graph
    sess = graph_saver.load_model_from_meta('models/mnist_save.meta', 'models/mnist_save')

    # Create quantsim model to quantize the network using the default 8 bit params/activations
    sim = quantsim.QuantizationSimModel(sess, starting_op_names=['reshape_input'], output_op_names=['dense_1/BiasAdd'],
                                        quant_scheme=QuantScheme.post_training_tf_enhanced,
                                        config_file='../../../TrainingExtensions/common/src/python/aimet_common/'
                                                    'quantsim_config/default_config.json')

    # Compute encodings
    sim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)

    # Do some finetuning

    # User action required
    # The following line of code illustrates that the model is getting finetuned.
    # Replace the following train() function with your pipeline's train() function.
    train(sim)

Quantizing and fine-tuning a trained model to learn the encodings (range learning)

def quantization_aware_training_range_learning():
    """
    Running Quantize Range Learning Test
    """
    tf.compat.v1.reset_default_graph()

    # Allocate the generator you wish to use to provide the network with data
    parser2 = tf_gen.MnistParser(batch_size=100, data_inputs=['reshape_input'])
    generator = tf_gen.TfRecordGenerator(tfrecords=[os.path.join('data', 'mnist', 'validation.tfrecords')],
                                         parser=parser2)

    sess = graph_saver.load_model_from_meta('models/mnist_save.meta', 'models/mnist_save')

    # Create quantsim model to quantize the network using the default 8 bit params/activations
    # quant scheme set to range learning
    sim = quantsim.QuantizationSimModel(sess, ['reshape_input'], ['dense_1/BiasAdd'],
                                        quant_scheme=QuantScheme.training_range_learning_with_tf_init)

    # Initialize the model with encodings
    sim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)

    # Train the model to fine-tune the encodings
    g = sim.session.graph
    sess = sim.session

    with g.as_default():

        parser2 = tf_gen.MnistParser(batch_size=100, data_inputs=['reshape_input'])
        generator2 = tf_gen.TfRecordGenerator(tfrecords=['data/mnist/validation.tfrecords'], parser=parser2)
        cross_entropy = g.get_operation_by_name('xent')
        train_step = g.get_operation_by_name("Adam")

        # do training: learn weights and quantization encodings simultaneously
        x = sim.session.graph.get_tensor_by_name("reshape_input:0")
        y = g.get_tensor_by_name("labels:0")
        fc1_w = g.get_tensor_by_name("dense_1/MatMul/ReadVariableOp:0")

        perf = graph_eval.evaluate_graph(sess, generator2, ['accuracy'], graph_eval.default_eval_func, 1)
        print('Quantized performance: ' + str(perf * 100))

        ce = g.get_tensor_by_name("xent:0")
        train_step = tf.compat.v1.train.AdamOptimizer(1e-3, name="TempAdam").minimize(ce)
        graph_eval.initialize_uninitialized_vars(sess)
        mnist = input_data.read_data_sets('./data', one_hot=True)

        for i in range(100):
            batch = mnist.train.next_batch(50)
            sess.run([train_step, fc1_w], feed_dict={x: batch[0], y: batch[1]})
            if i % 10 == 0:
                perf = graph_eval.evaluate_graph(sess, generator2, ['accuracy'], graph_eval.default_eval_func, 1)
                print('Quantized performance: ' + str(perf * 100))

    # close session
    sess.close()