[OpenVINO 4] The NNCF Module: Quantization-Aware Training

hjxu2016

Last modified 2022-10-25 15:52:22

First published 2022-10-25 10:01:07

Article link: https://blog.csdn.net/hjxu2016/article/details/127494019

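NNCF stands for Neural Network Compression Framework.

NNCF implements compression-aware training by plugging into an ordinary training pipeline. This design keeps the required code changes small.

The tutorial example comes from the official documentation; please refer to the original pages: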
https://docs.openvino.ai/latest/notebooks/302-pytorch-quantization-aware-training-with-output.html

https://github.com/openvinotoolkit/nncf/blob/develop/docs/Usage.md
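The complete workflow consists of the following four steps:

1. Convert the original FP32 model to an INT8 model, which serves to compare accuracy and speed.
2. Fine-tune the quantized model with NNCF to improve its accuracy.
3. Export the fine-tuned quantized model to ONNX, then convert it to an OpenVINO IR model.
4. Evaluate and compare performance.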

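1. NNCF Quantization Workflow

The INT8 conversion serves purely to compare inference time and accuracy.
Converting a model to INT8 with POT was already covered in "[OpenVINO 3] POT Quantization Workflow"; the steps are briefly described once more here, this time using NNCF.

1.1 Configure NNCF parameters to specify the compression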

from nncf import NNCFConfig

# image_size and OUTPUT_DIR are assumed to be defined earlier in the training script.
nncf_config_dict = {
    "input_info": {"sample_size": [1, 3, image_size, image_size]},
    "log_dir": str(OUTPUT_DIR),  # The log directory for NNCF-specific logging outputs.
    "compression": {
        "algorithm": "quantization",  # Specify the algorithm here.
    },
}
nncf_config = NNCFConfig.from_dict(nncf_config_dict)
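
The same settings can also be kept in a JSON file, which is how NNCF's sample configurations are usually stored. The sketch below uses only the standard library to write such a file and read it back. The values `image_size = 64` and `output`, and the filename `nncf_config.json`, are placeholders chosen for illustration and are not taken from this post.

```python
import json
import tempfile
from pathlib import Path

image_size = 64               # assumption: placeholder input resolution
OUTPUT_DIR = Path("output")   # assumption: placeholder log directory

nncf_config_dict = {
    "input_info": {"sample_size": [1, 3, image_size, image_size]},
    "log_dir": str(OUTPUT_DIR),
    "compression": {"algorithm": "quantization"},
}

# Write the configuration to disk as JSON and read it back.
config_path = Path(tempfile.mkdtemp()) / "nncf_config.json"
config_path.write_text(json.dumps(nncf_config_dict, indent=4))
loaded = json.loads(config_path.read_text())

print(loaded["compression"]["algorithm"])
# With NNCF installed, the file could then be loaded with
# NNCFConfig.from_json(str(config_path)) instead of NNCFConfig.from_dict(...).
```

Keeping the configuration in a file makes it easy to switch compression settings without touching the training script.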

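1.2 Provide a DataLoader (optional)

Provide a data loader so that NNCF can initialize the quantization ranges and, from statistics collected on the given samples, determine which activations should be signed and which unsigned.
For some algorithms, quantization among them, it is strongly recommended to initialize the algorithm by passing training data through nncf_config before compression fine-tuning starts.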

from nncf.torch import register_default_init_args
nncf_config = register_default_init_args(nncf_config, train_loader)
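
To make the idea of range initialization concrete, here is a toy sketch in plain Python. It is not NNCF's implementation: the helper `init_range` and the sample values are hypothetical, and a real initializer works on tensors gathered from `train_loader`. It only illustrates the principle that the observed minimum and maximum decide whether a quantizer needs a signed range and how large its step should be.

```python
def init_range(samples, num_bits=8):
    """Pick signedness and a step size from observed activation values (toy example)."""
    lo, hi = min(samples), max(samples)
    signed = lo < 0                        # negative values seen -> signed range needed
    if signed:
        max_abs = max(abs(lo), abs(hi))
        levels = 2 ** (num_bits - 1) - 1   # 127 positive steps for 8 bits
        scale = max_abs / levels
    else:
        levels = 2 ** num_bits - 1         # 255 steps for 8 bits
        scale = hi / levels
    return {"signed": signed, "min": lo, "max": hi, "scale": scale}


relu_like = [0.0, 0.5, 2.55, 1.2]      # hypothetical post-ReLU activations
conv_like = [-1.27, 0.3, 0.9, -0.4]    # hypothetical pre-activation values
relu_stats = init_range(relu_like)
conv_stats = init_range(conv_like)
print(relu_stats["signed"], conv_stats["signed"])
```

An unsigned range spends all 8 bits on non-negative values, so detecting activations that are never negative gives finer resolution for them.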

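1.3 Create a model object for use with NNCF

Create a wrapped model object from a pre-trained FP32 model and the configuration.
The function returns two objects:

compression_ctrl is a controller object. During compression training it can be used to adjust certain parameters of the compression algorithm (such as its schedule) or to collect statistics about the algorithm, for example the model's current sparsity level.
model is the model after the initial quantization has been applied.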

from nncf.torch import create_compressed_model
compression_ctrl, model = create_compressed_model(model, nncf_config)

Optional: for multi-GPU training, the model can be wrapped with DataParallel or DistributedDataParallel.
When DistributedDataParallel is used, add the following call afterwards:

compression_ctrl.distributed()

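1.4 Evaluate the accuracy of the model after default quantization

Evaluate the accuracy of the model right after the default quantization initialization, before any fine-tuning: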

acc1 = validate(val_loader, model, criterion)
print(f"Accuracy of initialized INT8 model: {acc1:.3f}")

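1.5 Fine-tune the compressed model

In this step, an ordinary fine-tuning loop is used to further improve the accuracy of the quantized model.
A lower learning rate is normally used, and a few epochs are enough.
Here train_loader is the usual torch DataLoader, and model is the model created by NNCF in step 1.3.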

compression_lr = init_lr / 10
optimizer = torch.optim.Adam(model.parameters(), lr=compression_lr)

# Train for one epoch with NNCF.
train(train_loader, model, criterion, optimizer, epoch=0)

# Evaluate on validation set after Quantization-Aware Training (QAT case).
acc1_int8 = validate(val_loader, model, criterion)

print(f"Accuracy of tuned INT8 model: {acc1_int8:.3f}")
print(f"Accuracy drop of tuned INT8 model over pre-trained FP32 model: {acc1_fp32 - acc1_int8:.3f}")

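1.6 Export an ONNX model

The ONNX model exported here is a "fake" INT8 model: quantization is simulated by FakeQuantize operators rather than performed with real 8-bit arithmetic.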

import warnings
from torch.jit import TracerWarning

if not int8_onnx_path.exists():
    warnings.filterwarnings("ignore", category=TracerWarning)
    warnings.filterwarnings("ignore", category=UserWarning)
    # Export INT8 model to ONNX that is supported by OpenVINO™ Toolkit
    compression_ctrl.export_model(int8_onnx_path)
    print(f"INT8 ONNX model exported to {int8_onnx_path}.")

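Opening the exported model in netron shows that many FakeQuantize operators have been added.
For netron usage, see the post "模型结构可视化神器netron" (netron, a handy tool for visualizing model structure).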
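
What a FakeQuantize operator computes can be sketched in a few lines of plain Python. The function below follows the element-wise formula from the OpenVINO FakeQuantize operation specification as I understand it; the name `fake_quantize` and the sample range are assumptions made for illustration. The output remains a float, but it is snapped to one of `levels` evenly spaced values, which is why the exported ONNX model only simulates INT8.

```python
def fake_quantize(x, input_low, input_high, output_low, output_high, levels=256):
    """Quantize-dequantize one float value onto `levels` evenly spaced points."""
    if x <= min(input_low, input_high):
        return output_low            # clamp values below the input range
    if x > max(input_low, input_high):
        return output_high           # clamp values above the input range
    # Map x to the nearest integer step, then back to the output range.
    step_index = round((x - input_low) / (input_high - input_low) * (levels - 1))
    return step_index / (levels - 1) * (output_high - output_low) + output_low


values = [-0.5, 0.0, 0.1234, 1.0, 2.0]
quantized = [fake_quantize(v, 0.0, 1.0, 0.0, 1.0) for v in values]
print(quantized)
```

Values outside the range are clamped, and values inside it lose precision to the nearest step. Fine-tuning with these operators in place lets the weights adapt to that rounding.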

1.7 Export a PyTorch model

The compressed model can also be saved as a PyTorch checkpoint and loaded again later.

import torch
from nncf.torch import create_compressed_model, load_state

# Save part
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)
checkpoint = {
    'state_dict': compressed_model.state_dict(),
    'compression_state': compression_ctrl.get_compression_state(),
    # ... (further entries are elided in the original)
}
torch.save(checkpoint, path)

# Load part
resuming_checkpoint = torch.load(path)
compression_state = resuming_checkpoint['compression_state']
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config, compression_state=compression_state)
state_dict = resuming_checkpoint['state_dict']

# Load the weights in the preferred way
load_state(compressed_model, state_dict, is_resume=True)
# Alternatively, when the execution mode on loading is the same as on saving
# (save and load both in single-GPU mode, or both in (Distributed)DataParallel mode, not mixed):
# compressed_model.load_state_dict(state_dict)

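1.8 Export an OpenVINO IR model

Use the mo tool with the FP16 data type setting to convert the ONNX model to the IR format supported by OpenVINO. The converted model is saved to the specified folder.
The conversion takes some time; if it succeeds, the last line of the output shows [ SUCCESS ].

Convert the FP32 ONNX model to an IR model, which is used to compare inference time and accuracy: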

if not fp32_ir_path.exists():
    !mo --input_model $fp32_onnx_path --input_shape "[1,3, $image_size, $image_size]" --mean_values "[123.675, 116.28 , 103.53]" --scale_values "[58.395, 57.12 , 57.375]" --data_type FP16 --ou

Convert the INT8-quantized ONNX model to an IR model:

if not int8_ir_path.exists():
    !mo --input_model $int8_onnx_path --input_shape "[1,3, $image_size, $image_size]" --mean_values "[123.675, 116.28 , 103.53]" --scale_values "[58.395, 57.12 , 57.375]" --data_type FP16 --out
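
The `--mean_values` and `--scale_values` arguments ask `mo` to embed the input normalization, `(pixel - mean) / scale` per channel, into the converted model. The numbers passed above match the commonly used ImageNet statistics expressed on a 0-255 pixel scale, which the following sketch checks. The helper name `normalize_pixel` is hypothetical.

```python
import math

imagenet_mean = [0.485, 0.456, 0.406]   # per-channel mean on a 0-1 scale
imagenet_std = [0.229, 0.224, 0.225]    # per-channel std on a 0-1 scale

# Rescale to the 0-255 pixel range used by the mo arguments.
mean_values = [m * 255 for m in imagenet_mean]
scale_values = [s * 255 for s in imagenet_std]


def normalize_pixel(rgb, mean, scale):
    """Apply (value - mean) / scale to each channel of one RGB pixel."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, scale)]


print([round(v, 3) for v in mean_values])
print([round(v, 3) for v in scale_values])
print(normalize_pixel([123.675, 116.28, 103.53], mean_values, scale_values))
```

With the normalization built into the IR, the inference code can feed raw 0-255 images without repeating the preprocessing.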

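1.9 Evaluate the models

Finally, evaluate the inference performance of the FP32 and INT8 models with the Benchmark Tool provided by OpenVINO.
By default, the Benchmark Tool runs inference on the CPU in asynchronous mode for 60 seconds and then reports the per-image latency and the throughput (FPS). The commands below pass -t 15, which limits each run to 15 seconds.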

def parse_benchmark_output(benchmark_output):
    parsed_output = [line for line in benchmark_output if not (line.startswith(r"[") or line.startswith("  ") or line == "")]
    print(*parsed_output, sep='\n')


print('Benchmark FP32 model (IR)')
benchmark_output = ! benchmark_app -m $fp32_ir_path -d CPU -api async -t 15
parse_benchmark_output(benchmark_output)

print('Benchmark INT8 model (IR)')
benchmark_output = ! benchmark_app -m $int8_ir_path -d CPU -api async -t 15
parse_benchmark_output(benchmark_output)
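
To see which lines `parse_benchmark_output` keeps, the same filter can be tried on a few made-up strings. The lines below are synthetic placeholders written for this illustration, not real `benchmark_app` output. The variant `filter_benchmark_lines` is a hypothetical helper that returns the list instead of printing it, so the result can be inspected.

```python
def filter_benchmark_lines(lines):
    """Drop lines that start with '[' or two spaces, and empty lines."""
    return [
        line for line in lines
        if not (line.startswith("[") or line.startswith("  ") or line == "")
    ]


# Synthetic placeholder lines, NOT real benchmark_app output.
synthetic_lines = [
    "[Step 1/3] placeholder log line",
    "  indented placeholder detail",
    "",
    "placeholder summary line A",
    "placeholder summary line B",
]
kept = filter_benchmark_lines(synthetic_lines)
print(*kept, sep="\n")
```

Only the unindented, non-bracketed lines survive, which is how the notebook trims the tool's verbose log down to its summary.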

Print the CPU information

from openvino.runtime import Core

ie = Core()
ie.get_property("CPU", "FULL_DEVICE_NAME")