Benchmarking YOLOv3 with the OpenVINO Inference Engine on Tiger Lake

Blog content overview:
  • Brief introduction to the software environment and hardware
  • Downloading, converting, and quantizing the model with OpenVINO's bundled downloader tools:
  1. Model conversion: convert the .pb model to the OpenVINO-supported .xml and .bin files;
  2. Model quantization: generate FP32 and FP16 model files;
  • Quantizing the model with the OpenVINO Post-Training Optimization Tool (POT):
  1. FP32 model -> INT8 model
  2. FP16 model -> INT8 model
  • Testing the inference performance of the quantized models on CPU, GPU, and CPU+GPU;
sources:
  1. OpenVINO documentation: https://docs.openvinotoolkit.org/latest/index.html
  2. POT overview: https://docs.openvinotoolkit.org/latest/pot_README.html
  3. Benchmark data: https://docs.openvinotoolkit.org/latest/openvino_docs_performance_benchmarks_openvino.html
Brief introduction to the software environment and hardware
  • Software environment:
    OS: Ubuntu 18.04, OpenVINO version: 2021.3, Python version: 3.6.9
  • Tiger Lake hardware:
    CPU: i7-1165G7, SSD: 256 GB, RAM: 4 GB DDR4 x 2
  • Installing OpenVINO:
    This post does not include the installation steps; please follow the installation guide on the official website. The following sections assume that OpenVINO 2021.3 is already installed on your machine;
    ps: official installation steps link
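
After installation, remember to initialize the OpenVINO environment in each new shell before running the scripts below. A minimal sketch, assuming the user-local install path ~/intel/openvino_2021 that appears in the logs later in this post:

source ~/intel/openvino_2021/bin/setupvars.sh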
Downloading, converting, and quantizing the yolo-v3-tf model with OpenVINO's downloader tools (all scripts in this section are run from openvino_2021/deployment_tools/open_model_zoo/tools/downloader)
  • Downloading the model (the download script is located in the following directory):
openvino_2021/deployment_tools/open_model_zoo/tools/downloader
  • Download the model with the script:
python3 downloader.py --name yolo-v3-tf
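
To list every model the downloader knows about, or to download into a custom directory instead of the default public/ folder, the standard downloader.py options can be used; a sketch (check python3 downloader.py -h for the full list):

python3 downloader.py --print_all
python3 downloader.py --name yolo-v3-tf -o ~/models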

The download progress is printed to the console. Once the download completes, the model is saved to the default download path;

Default storage path after download: openvino_2021/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf

  • Model conversion
    Use the converter.py script to convert the model and generate both FP32 and FP16 IR files; the command is as follows:
    (notice: the model in this post was downloaded to the default path, so the conversion also uses the default paths and needs no extra arguments; if you use your own paths, set the corresponding arguments; for details run: python3 converter.py -h)
python3 converter.py --name yolo-v3-tf
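
If the model was downloaded somewhere other than the default public/ directory, the download and output locations can be passed explicitly; a sketch assuming the standard converter.py options (verify with python3 converter.py -h):

python3 converter.py --name yolo-v3-tf --download_dir ~/models --output_dir ~/models/ir --precisions FP16,FP32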

After a successful conversion, the log output is as follows:

intel@intel:~/intel/openvino_2021/deployment_tools/open_model_zoo/tools/downloader$ python3 converter.py --name yolo-v3-tf
========== Converting yolo-v3-tf to IR (FP16)
Conversion command: /usr/bin/python3 -- /home/intel/intel/openvino_2021/deployment_tools/model_optimizer/mo.py --framework=tf --data_type=FP16 --output_dir=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16 --model_name=yolo-v3-tf '--input_shape=[1,416,416,3]' --input=input_1 '--scale_values=input_1[255]' --reverse_input_channels --transformations_config=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/yolo-v3.json --input_model=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/yolo-v3.pb

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/yolo-v3.pb
	- Path for generated IR: 	/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16
	- IR output name: 	yolo-v3-tf
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	input_1
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	[1,416,416,3]
	- Mean values: 	Not specified
	- Scale values: 	input_1[255]
	- Scale factor: 	Not specified
	- Precision of IR: 	FP16
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	None
	- Reverse input channels: 	True
TensorFlow specific parameters:
	- Input model in text protobuf format: 	False
	- Path to model dump for TensorBoard: 	None
	- List of shared libraries with TensorFlow custom layers implementation: 	None
	- Update the configuration file with input/output node names: 	None
	- Use configuration file used to generate the model with Object Detection API: 	None
	- Use the config file: 	None
	- Inference Engine found in: 	/home/intel/intel/openvino_2021/python/python3.6/openvino
Inference Engine version: 	2.1.2021.3.0-2787-60059f2c755-releases/2021/3
Model Optimizer version: 	    2021.3.0-2787-60059f2c755-releases/2021/3
[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16/yolo-v3-tf.xml
[ SUCCESS ] BIN file: /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16/yolo-v3-tf.bin
[ SUCCESS ] Total execution time: 31.89 seconds. 
[ SUCCESS ] Memory consumed: 1700 MB. 

========== Converting yolo-v3-tf to IR (FP32)
Conversion command: /usr/bin/python3 -- /home/intel/intel/openvino_2021/deployment_tools/model_optimizer/mo.py --framework=tf --data_type=FP32 --output_dir=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32 --model_name=yolo-v3-tf '--input_shape=[1,416,416,3]' --input=input_1 '--scale_values=input_1[255]' --reverse_input_channels --transformations_config=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/yolo-v3.json --input_model=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/yolo-v3.pb

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/yolo-v3.pb
	- Path for generated IR: 	/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32
	- IR output name: 	yolo-v3-tf
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	input_1
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	[1,416,416,3]
	- Mean values: 	Not specified
	- Scale values: 	input_1[255]
	- Scale factor: 	Not specified
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	None
	- Reverse input channels: 	True
TensorFlow specific parameters:
	- Input model in text protobuf format: 	False
	- Path to model dump for TensorBoard: 	None
	- List of shared libraries with TensorFlow custom layers implementation: 	None
	- Update the configuration file with input/output node names: 	None
	- Use configuration file used to generate the model with Object Detection API: 	None
	- Use the config file: 	None
	- Inference Engine found in: 	/home/intel/intel/openvino_2021/python/python3.6/openvino
Inference Engine version: 	2.1.2021.3.0-2787-60059f2c755-releases/2021/3
Model Optimizer version: 	    2021.3.0-2787-60059f2c755-releases/2021/3
[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32/yolo-v3-tf.xml
[ SUCCESS ] BIN file: /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32/yolo-v3-tf.bin
[ SUCCESS ] Total execution time: 31.12 seconds. 
[ SUCCESS ] Memory consumed: 1727 MB. 

The model directory now contains FP16 and FP32 subdirectories, each holding the IR model files (.xml and .bin) of the corresponding precision.
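
To confirm that the generated IR loads correctly, here is a minimal sketch (assuming the default FP32 output path above and the OpenVINO 2021.x Inference Engine Python API; sample.jpg is a hypothetical test image) that reads the network and runs a single inference:

import cv2
import numpy as np
from openvino.inference_engine import IECore

model_xml = "public/yolo-v3-tf/FP32/yolo-v3-tf.xml"
model_bin = "public/yolo-v3-tf/FP32/yolo-v3-tf.bin"

ie = IECore()
net = ie.read_network(model=model_xml, weights=model_bin)
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))
n, c, h, w = net.input_info[input_name].input_data.shape   # 1, 3, 416, 416 per the conversion log

image = cv2.imread("sample.jpg")                                        # hypothetical test image
blob = cv2.resize(image, (w, h)).transpose(2, 0, 1)[np.newaxis, :]      # HWC -> NCHW

outputs = exec_net.infer({input_name: blob})    # dict: output layer name -> ndarray
for name, data in outputs.items():
    print(name, data.shape)                     # the three YOLOv3 output branches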

INT8 model quantization with the OpenVINO Post-Training Optimization Tool, POT (script: quantizer.py)
  • Quantizing the model to INT8
  1. Recent OpenVINO releases ship the quantizer.py script, which makes INT8 quantization of a model considerably easier; under the hood it simply drives the POT tool;
    The script contents are as follows:
#!/usr/bin/env python3

# Copyright (c) 2020 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import json
import os
import sys
import tempfile

from pathlib import Path

import yaml

import common

DEFAULT_POT_CONFIG_BASE = {
    'compression': {
        'algorithms': [
            {
                'name': 'DefaultQuantization',
                'params': {
                    'preset': 'performance',
                    'stat_subset_size': 300,
                },
            },
        ],
    },
}

DATASET_DEFINITIONS_PATH = common.OMZ_ROOT / 'tools/accuracy_checker/dataset_definitions.yml'


def quantize(reporter, model, precision, args, output_dir, pot_path, pot_env):
    input_precision = common.KNOWN_QUANTIZED_PRECISIONS[precision]

    pot_config_base_path = common.MODEL_ROOT / model.subdirectory / 'quantization.yml'

    try:
        with pot_config_base_path.open('rb') as pot_config_base_file:
            pot_config_base = yaml.safe_load(pot_config_base_file)
    except FileNotFoundError:
        pot_config_base = DEFAULT_POT_CONFIG_BASE

    pot_config_paths = {
        'engine': {
            # "type": "simplified"
            'config': str(common.MODEL_ROOT / model.subdirectory / 'accuracy-check.yml'),
        },
        'model': {
            'model': str(args.model_dir / model.subdirectory / input_precision / (model.name + '.xml')),
            'weights': str(args.model_dir / model.subdirectory / input_precision / (model.name + '.bin')),
            'model_name': model.name,
        }
    }

    pot_config = {**pot_config_base, **pot_config_paths}

    if args.target_device:
        pot_config['compression']['target_device'] = args.target_device

    reporter.print_section_heading('{}Quantizing {} from {} to {}',
                                   '(DRY RUN) ' if args.dry_run else '', model.name, input_precision, precision)

    model_output_dir = output_dir / model.subdirectory / precision
    pot_config_path = model_output_dir / 'pot-config.json'

    reporter.print('Creating {}...', pot_config_path)
    pot_config_path.parent.mkdir(parents=True, exist_ok=True)
    with pot_config_path.open('w') as pot_config_file:
        json.dump(pot_config, pot_config_file, indent=4)
        pot_config_file.write('\n')

    pot_output_dir = model_output_dir / 'pot-output'
    pot_output_dir.mkdir(parents=True, exist_ok=True)

    pot_cmd = [str(args.python), '--', str(pot_path),
               '--config={}'.format(pot_config_path),
               '--direct-dump',
               '--output-dir={}'.format(pot_output_dir),
               ]

    reporter.print('Quantization command: {}', common.command_string(pot_cmd))
    reporter.print('Quantization environment: {}',
                   ' '.join('{}={}'.format(k, common.quote_arg(v))
                            for k, v in sorted(pot_env.items())))

    success = True

    if not args.dry_run:
        reporter.print(flush=True)

        success = reporter.job_context.subprocess(pot_cmd, env={**os.environ, **pot_env})

    reporter.print()
    if not success: return False

    if not args.dry_run:
        reporter.print('Moving quantized model to {}...', model_output_dir)
        for ext in ['.xml', '.bin']:
            (pot_output_dir / 'optimized' / (model.name + ext)).replace(
                model_output_dir / (model.name + ext))
        reporter.print()

    return True



def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_dir', type=Path, metavar='DIR',
                        default=Path.cwd(), help='root of the directory tree with the full precision model files')
    parser.add_argument('--dataset_dir', type=Path, help='root of the dataset directory tree')
    parser.add_argument('-o', '--output_dir', type=Path, metavar='DIR',
                        help='root of the directory tree to place quantized model files into')
    parser.add_argument('--name', metavar='PAT[,PAT...]',
                        help='quantize only models whose names match at least one of the specified patterns')
    parser.add_argument('--list', type=Path, metavar='FILE.LST',
                        help='quantize only models whose names match at least one of the patterns in the specified file')
    parser.add_argument('--all', action='store_true', help='quantize all available models')
    parser.add_argument('--print_all', action='store_true', help='print all available models')
    parser.add_argument('-p', '--python', type=Path, metavar='PYTHON', default=sys.executable,
                        help='Python executable to run Post-Training Optimization Toolkit with')
    parser.add_argument('--pot', type=Path, help='Post-Training Optimization Toolkit entry point script')
    parser.add_argument('--dry_run', action='store_true',
                        help='print the quantization commands without running them')
    parser.add_argument('--precisions', metavar='PREC[,PREC...]',
                        help='quantize only to the specified precisions')
    parser.add_argument('--target_device', help='target device for the quantized model')
    args = parser.parse_args()

    pot_path = args.pot
    if pot_path is None:
        try:
            pot_path = Path(
                os.environ['INTEL_OPENVINO_DIR']) / 'deployment_tools/tools/post_training_optimization_toolkit/main.py'
        except KeyError:
            sys.exit('Unable to locate Post-Training Optimization Toolkit. '
                     + 'Use --pot or run setupvars.sh/setupvars.bat from the OpenVINO toolkit.')

    models = common.load_models_from_args(parser, args)

    # We can't mark it as required, because it's not required when --print_all is specified.
    # So we have to check it manually.
    if not args.dataset_dir:
        sys.exit('--dataset_dir must be specified.')

    if args.precisions is None:
        requested_precisions = common.KNOWN_QUANTIZED_PRECISIONS.keys()
    else:
        requested_precisions = set(args.precisions.split(','))
        unknown_precisions = requested_precisions - common.KNOWN_QUANTIZED_PRECISIONS.keys()
        if unknown_precisions:
            sys.exit('Unknown precisions specified: {}.'.format(', '.join(sorted(unknown_precisions))))

    reporter = common.Reporter(common.DirectOutputContext())

    output_dir = args.output_dir or args.model_dir

    failed_models = []

    with tempfile.TemporaryDirectory() as temp_dir:
        annotation_dir = Path(temp_dir) / 'annotations'
        annotation_dir.mkdir()

        pot_env = {
            'ANNOTATIONS_DIR': str(annotation_dir),
            'DATA_DIR': str(args.dataset_dir),
            'DEFINITIONS_FILE': str(DATASET_DEFINITIONS_PATH),
        }

        for model in models:
            if not model.quantizable:
                reporter.print_section_heading('Skipping {} (quantization not supported)', model.name)
                reporter.print()
                continue

            for precision in sorted(requested_precisions):
                if not quantize(reporter, model, precision, args, output_dir, pot_path, pot_env):
                    failed_models.append(model.name)
                    break

    if failed_models:
        reporter.print('FAILED:')
        for failed_model_name in failed_models:
            reporter.print(failed_model_name)
        sys.exit(1)


if __name__ == '__main__':
    main()
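
To make the script's behavior concrete, below is a rough sketch of the pot-config.json it writes for yolo-v3-tf before invoking POT (paths are abbreviated and assume the default locations used in this post; the real file is generated automatically, so there is no need to write it by hand):

{
    "compression": {
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300
                }
            }
        ]
    },
    "engine": {
        "config": ".../open_model_zoo/models/public/yolo-v3-tf/accuracy-check.yml"
    },
    "model": {
        "model": ".../downloader/public/yolo-v3-tf/FP16/yolo-v3-tf.xml",
        "weights": ".../downloader/public/yolo-v3-tf/FP16/yolo-v3-tf.bin",
        "model_name": "yolo-v3-tf"
    }
}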
  2. The quantization process needs part of a dataset for the per-layer accuracy checks, so this post uses the COCO val2017 dataset and its label file, the same data used by the official demos (the val2017 folder contains the image files; a download sketch follows the file structure below);
    The file structure is as follows:
    • val2017
      • val2017
      • instances_val2017.json
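
If the dataset is not already on disk, it can be fetched roughly as follows (a sketch; the URLs are the standard COCO 2017 download links and are an assumption of this post, not part of the OpenVINO tooling):

wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip val2017.zip -d ~/Desktop/val2017
unzip annotations_trainval2017.zip -d /tmp/coco_ann
cp /tmp/coco_ann/annotations/instances_val2017.json ~/Desktop/val2017/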
  3. Once the dataset and other preparation are ready, use the quantizer.py script to perform the INT8 quantization
    Run the following command (when it finishes, two model folders are generated: an FP32->INT8 model and an FP16->INT8 model):
python3 quantizer.py --name yolo-v3-tf --dataset_dir ~/Desktop/val2017

Log output of a successful run:

========== Quantizing yolo-v3-tf from FP16 to FP16-INT8
Creating /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8/pot-config.json...
Quantization command: /usr/bin/python3 -- /home/intel/intel/openvino_2021/deployment_tools/tools/post_training_optimization_toolkit/main.py --config=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8/pot-config.json --direct-dump --output-dir=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8/pot-output
Quantization environment: ANNOTATIONS_DIR=/tmp/tmp7f1r19ed/annotations DATA_DIR=/home/intel/Desktop/val2017 DEFINITIONS_FILE=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/accuracy_checker/dataset_definitions.yml

16:51:43 accuracy_checker WARNING: /home/intel/intel/openvino_2021.3.394/deployment_tools/tools/post_training_optimization_toolkit/compression/algorithms/quantization/optimization/algorithm.py:42: UserWarning: Nevergrad package could not be imported. If you are planning to useany hyperparameter optimization algo, consider installing itusing pip. This implies advanced usage of the tool.Note that nevergrad is compatible only with Python 3.6+
  'Nevergrad package could not be imported. If you are planning to use'

INFO:app.run:Output log dir: /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8/pot-output
INFO:app.run:Creating pipeline:
 Algorithm: DefaultQuantization
 Parameters:
	preset                     : performance
	stat_subset_size           : 300
	target_device              : ANY
	model_type                 : None
	dump_intermediate_model    : False
	exec_log_dir               : /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8/pot-output
 ===========================================================================
IE version: 2.1.2021.3.0-2787-60059f2c755-releases/2021/3
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.2021.3.0-2787-60059f2c755-releases/2021/3
Annotation conversion for ms_coco_detection_80_class_without_background dataset has been started
Parameters to be used for conversion:
converter: mscoco_detection
annotation_file: /home/intel/Desktop/val2017/instances_val2017.json
has_background: False
sort_annotations: True
use_full_label_map: False
Total annotations size: 5000
100 / 5000 processed in 0.419s
200 / 5000 processed in 0.424s
300 / 5000 processed in 0.423s
400 / 5000 processed in 0.422s
500 / 5000 processed in 0.436s
600 / 5000 processed in 0.428s
700 / 5000 processed in 0.427s
800 / 5000 processed in 0.427s
900 / 5000 processed in 0.427s
1000 / 5000 processed in 0.440s
1100 / 5000 processed in 0.436s
1200 / 5000 processed in 0.474s
1300 / 5000 processed in 0.438s
1400 / 5000 processed in 0.466s
1500 / 5000 processed in 0.439s
1600 / 5000 processed in 0.457s
1700 / 5000 processed in 0.445s
1800 / 5000 processed in 0.439s
1900 / 5000 processed in 0.446s
2000 / 5000 processed in 0.459s
2100 / 5000 processed in 0.445s
2200 / 5000 processed in 0.451s
2300 / 5000 processed in 0.447s
2400 / 5000 processed in 0.483s
2500 / 5000 processed in 0.619s
2600 / 5000 processed in 0.568s
2700 / 5000 processed in 0.561s
2800 / 5000 processed in 0.465s
2900 / 5000 processed in 0.471s
3000 / 5000 processed in 0.481s
3100 / 5000 processed in 0.473s
3200 / 5000 processed in 0.498s
3300 / 5000 processed in 0.448s
3400 / 5000 processed in 0.453s
3500 / 5000 processed in 0.493s
3600 / 5000 processed in 0.442s
3700 / 5000 processed in 0.496s
3800 / 5000 processed in 0.456s
3900 / 5000 processed in 0.488s
4000 / 5000 processed in 0.507s
4100 / 5000 processed in 0.474s
4200 / 5000 processed in 0.466s
4300 / 5000 processed in 0.434s
4400 / 5000 processed in 0.436s
4500 / 5000 processed in 0.483s
4600 / 5000 processed in 0.428s
4700 / 5000 processed in 0.430s
4800 / 5000 processed in 0.423s
4900 / 5000 processed in 0.430s
5000 / 5000 processed in 0.436s
5000 objects processed in 22.957 seconds
Annotation conversion for ms_coco_detection_80_class_without_background dataset has been finished
ms_coco_detection_80_class_without_background dataset metadata will be saved to /tmp/tmp7f1r19ed/annotations/mscoco_det_80.json
Converted annotation for ms_coco_detection_80_class_without_background dataset will be saved to /tmp/tmp7f1r19ed/annotations/mscoco_det_80.pickle
INFO:compression.statistics.collector:Start computing statistics for algorithms : DefaultQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: DefaultQuantization
INFO:compression.algorithms.quantization.default.algorithm:Start computing statistics for algorithm : ActivationChannelAlignment
INFO:compression.algorithms.quantization.default.algorithm:Computing statistics finished
INFO:compression.algorithms.quantization.default.algorithm:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
INFO:compression.algorithms.quantization.default.algorithm:Computing statistics finished
INFO:compression.pipeline.pipeline:Finished: DefaultQuantization
 ===========================================================================

Moving quantized model to /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8...

========== Quantizing yolo-v3-tf from FP32 to FP32-INT8
Creating /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32-INT8/pot-config.json...
Quantization command: /usr/bin/python3 -- /home/intel/intel/openvino_2021/deployment_tools/tools/post_training_optimization_toolkit/main.py --config=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32-INT8/pot-config.json --direct-dump --output-dir=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32-INT8/pot-output
Quantization environment: ANNOTATIONS_DIR=/tmp/tmp7f1r19ed/annotations DATA_DIR=/home/intel/Desktop/val2017 DEFINITIONS_FILE=/home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/accuracy_checker/dataset_definitions.yml

16:56:13 accuracy_checker WARNING: /home/intel/intel/openvino_2021.3.394/deployment_tools/tools/post_training_optimization_toolkit/compression/algorithms/quantization/optimization/algorithm.py:42: UserWarning: Nevergrad package could not be imported. If you are planning to useany hyperparameter optimization algo, consider installing itusing pip. This implies advanced usage of the tool.Note that nevergrad is compatible only with Python 3.6+
  'Nevergrad package could not be imported. If you are planning to use'

INFO:app.run:Output log dir: /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32-INT8/pot-output
INFO:app.run:Creating pipeline:
 Algorithm: DefaultQuantization
 Parameters:
	preset                     : performance
	stat_subset_size           : 300
	target_device              : ANY
	model_type                 : None
	dump_intermediate_model    : False
	exec_log_dir               : /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32-INT8/pot-output
 ===========================================================================
IE version: 2.1.2021.3.0-2787-60059f2c755-releases/2021/3
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.2021.3.0-2787-60059f2c755-releases/2021/3
Annotation for ms_coco_detection_80_class_without_background dataset will be loaded from /tmp/tmp7f1r19ed/annotations/mscoco_det_80.pickle
Loaded dataset info:
	Dataset name: ms_coco_detection_80_class_without_background
	Accuracy Checker version 0.8.6
	Dataset size 5000
	Conversion parameters:
		converter: mscoco_detection
		annotation_file: instances_val2017.json
		has_background: False
		sort_annotations: True
		use_full_label_map: False
ms_coco_detection_80_class_without_background dataset metadata will be loaded from /tmp/tmp7f1r19ed/annotations/mscoco_det_80.json
INFO:compression.statistics.collector:Start computing statistics for algorithms : DefaultQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: DefaultQuantization
INFO:compression.algorithms.quantization.default.algorithm:Start computing statistics for algorithm : ActivationChannelAlignment
INFO:compression.algorithms.quantization.default.algorithm:Computing statistics finished
INFO:compression.algorithms.quantization.default.algorithm:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
INFO:compression.algorithms.quantization.default.algorithm:Computing statistics finished
INFO:compression.pipeline.pipeline:Finished: DefaultQuantization
 ===========================================================================

Moving quantized model to /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32-INT8...

ps: how to tell whether quantization succeeded: the simplest check is the size of the model file (the .bin file); alternatively, open the .xml file and search for keywords such as i8 or u8;
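
For example, a quick command-line sanity check (a sketch, assuming the FP16-INT8 output path shown in the log above):

grep -ci "i8\|u8" public/yolo-v3-tf/FP16-INT8/yolo-v3-tf.xml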
After a successful run, the generated FP16-INT8 and FP32-INT8 folders contain the quantized .xml and .bin files.
When you see these files in the folders, the model download and quantization work is complete; next we move on to benchmarking the model on the device.

Inference performance tests of the model on CPU, GPU, and CPU+GPU
  • These tests use the benchmark script bundled with OpenVINO, located at:
/openvino_2021/deployment_tools/tools/benchmark_tool
  • Next, change into that directory and benchmark the model (INT8 model)
  1. CPU inference performance
    Run the following command:
python3 benchmark_app.py -m /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8/yolo-v3-tf.xml -i /home/intel/Downloads/sample-videos-master/people-detection.mp4 -d CPU

Test results:

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2021.3.0-2787-60059f2c755-releases/2021/3
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2021.3.0-2787-60059f2c755-releases/2021/3

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Read network took 56.83 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 637.67 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input_1' precision U8, dimensions (NCHW): 1 3 416 416
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
[ INFO ] First inference took 107.97 ms
[Step 11/11] Dumping statistics report
Count:      1196 iterations
Duration:   60204.83 ms
Latency:    200.71 ms
Throughput: 19.87 FPS
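
The -nstreams warning in the log means the stream count was picked automatically; if you want reproducible settings it can be pinned explicitly, for example (a sketch using standard benchmark_app.py options; adjust the values to your machine):

python3 benchmark_app.py -m /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8/yolo-v3-tf.xml -d CPU -nstreams 4 -nireq 4 -t 60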
  2. GPU inference performance
    Run the following command:
python3 benchmark_app.py -m /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8/yolo-v3-tf.xml -i /home/intel/Downloads/sample-videos-master/people-detection.mp4 -d GPU

Test results:

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2021.3.0-2787-60059f2c755-releases/2021/3
[ INFO ] Device info
         GPU
         clDNNPlugin............. version 2.1
         Build................... 2021.3.0-2787-60059f2c755-releases/2021/3

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for GPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Read network took 52.69 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 778.57 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input_1' precision U8, dimensions (NCHW): 1 3 416 416
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests using 2 streams for GPU, limits: 60000 ms duration)
[ INFO ] First inference took 21.45 ms
[Step 11/11] Dumping statistics report
Count:      3540 iterations
Duration:   60101.89 ms
Latency:    67.88 ms
Throughput: 58.90 FPS
  3. CPU+GPU inference performance
    Run the following command:
python3 benchmark_app.py -m /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP16-INT8/yolo-v3-tf.xml -i /home/intel/Downloads/sample-videos-master/people-detection.mp4 -d MULTI:CPU,GPU

Test results:

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.2021.3.0-2787-60059f2c755-releases/2021/3
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 2021.3.0-2787-60059f2c755-releases/2021/3
         GPU
         clDNNPlugin............. version 2.1
         Build................... 2021.3.0-2787-60059f2c755-releases/2021/3
         MULTI
         MultiDevicePlugin....... version 2.1
         Build................... 2021.3.0-2787-60059f2c755-releases/2021/3

[Step 3/11] Setting device configuration
[ WARNING ] Turn off threads pinning for CPUdevice since multi-scenario with GPU device is used.
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[ WARNING ] -nstreams default value is determined automatically for GPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[ WARNING ] Turn on GPU trottling. Multi-device execution with the CPU + GPU performs best with GPU trottling hint, which releases another CPU thread (that is otherwise used by the GPU driver for active polling)
[Step 4/11] Reading network files
[ INFO ] Read network took 52.11 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 62460.94 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input_1' precision U8, dimensions (NCHW): 1 3 416 416
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 4 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 5 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 6 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[ INFO ] Infer Request 7 filling
[ INFO ] Fill input 'input_1' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests using 4 streams for CPU, 2 streams for GPU, limits: 60000 ms duration)
[ INFO ] First inference took 207.95 ms
[Step 11/11] Dumping statistics report
Count:      4616 iterations
Duration:   60165.40 ms
Throughput: 76.72 FPS
  • Section summary
  1. The tests above only cover the INT8 model; the same workflow can be used to benchmark the FP16 and FP32 models (see the example command after this list);
  2. The test video is in MP4 format and can be downloaded from the sample-videos repository on GitHub (the path used in the commands above);
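
For example, benchmarking the FP32 model on the CPU only requires changing the -m path (a sketch based on the commands above):

python3 benchmark_app.py -m /home/intel/intel/openvino_2021.3.394/deployment_tools/open_model_zoo/tools/downloader/public/yolo-v3-tf/FP32/yolo-v3-tf.xml -i /home/intel/Downloads/sample-videos-master/people-detection.mp4 -d CPU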
Summary
  • This post walked through using OpenVINO, covering model download, conversion, quantization, and benchmark testing;
  • If anything in the post falls short, feel free to leave a comment so we can learn from each other, thanks~
  • I hope this post is helpful; thanks for your likes and follows~~