OpenVINO Series 17: The OpenVINO Preprocessing API, with a Comparison Against OpenCV Preprocessing

In this article, we take a detailed look at the OpenVINO Preprocessing API and compare its results against OpenCV-based preprocessing. The example covers:

  • Loading an ONNX model obtained via transfer learning
  • Converting the ONNX model to the OpenVINO IR (Intermediate Representation) format
  • Loading images with OpenCV, preprocessing them, and running model inference; measuring the model's FPS and accuracy
  • Checking the IR model's performance with the benchmark_app command
  • Integrating the preprocessing steps into the IR with the OpenVINO Preprocessing API and saving the result
  • Comparing OpenVINO preprocessing against OpenCV preprocessing in both speed and accuracy
  • Checking the performance of the preprocessing-integrated IR model with benchmark_app

Environment:

  • Test machine: Windows 10, 10th-gen Intel Core i5 laptop
  • IDE: VSCode
  • OpenVINO version: 2022.1
  • Code link: 12-preprocessing


1 Introduction to the OpenVINO Preprocessing API

Before OpenVINO 2022.1, the OpenVINO Runtime provided no native API functions for data preprocessing; developers had to rely on third-party libraries such as OpenCV. The preprocessing API introduced in OpenVINO 2022.1 integrates all preprocessing steps into the execution graph, so preprocessing can run on an iGPU, CPU, VPU, or future Intel discrete GPUs, which greatly improves efficiency. By contrast, OpenCV-based preprocessing can only run on the CPU, as shown below:

(Figure: with the OpenVINO Preprocessing API, preprocessing runs inside the execution graph on the CPU, iGPU, or VPU; OpenCV preprocessing runs only on the CPU.)

2 Loading the ONNX Model and Converting It to an IR Model

The model used here, mobileNetV2-lego-minifigures.onnx, was obtained by transfer learning from MobileNetV2; the pretrained MobileNetV2 weights come from torch hub. For the transfer-training details, see the blog post MobileNetV2 PyTorch Lightning LEGO Minifigures image classification example.

All of these pretrained models expect input images prepared the same way: 3-channel RGB images of shape (3 x H x W), where H and W are 224. Pixel values must first be scaled to the range [0, 1] and then normalized with mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
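
Numerically, this is equivalent to the following NumPy sketch (illustrative only; img stands for an H x W x 3 uint8 RGB array, which is an assumption here):

import numpy as np

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
x = img.astype(np.float32) / 255.0   # scale pixels to [0, 1]
x = (x - mean) / std                 # per-channel normalization (broadcast over H, W)
x = x.transpose(2, 0, 1)[None]       # HWC -> NCHW, shape (1, 3, H, W)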

First, we need to convert the ONNX file into an IR model with OpenVINO's mo command: mo --framework=onnx --input_model "model\mobileNetV2-lego-minifigures.onnx" --input_shape "[1,3,224,224]" --data_type FP32 --output_dir "model"

The mo command actually accepts quite a few more options. Although we do not use them in this example, they are worth a brief introduction:

Half-precision model (FP16): during conversion we can produce an FP16-precision IR model via the --data_type option, e.g.: mo --input_model INPUT_MODEL --data_type FP16. A half-precision model should be about half the size of the full-precision model; it may lose some accuracy, although for most models the drop is negligible.

Setting the layout: the layout describes the order of the model's shape dimensions. You can specify both the layout of the input model and the layout of the IR produced by conversion, e.g.: mo --input_model tf_nasnet_large.onnx --layout "nhwc->nchw", or define just a single layout: mo --input_model tf_nasnet_large.onnx --layout nhwc

Setting mean and scale: neural network models are usually trained on normalized input data, meaning input values are mapped into a specific range such as [0, 1] or [-1, 1]. Sometimes, as part of preprocessing, the mean is also subtracted from the input values. Input preprocessing can be implemented in two ways:

  • The preprocessing operations are part of the model. In this case the application does not preprocess the input data as a separate step: everything is embedded in the model itself.
  • The preprocessing operations are not part of the model, and preprocessing is performed by the application that feeds input data to the model (the situation in this example).

In the first case, the Model Optimizer generates an IR that already contains the required preprocessing operations, and no mean/scale parameters are needed. In the second case, the mean and scale values should be passed to the Model Optimizer so they get embedded in the generated IR. The following command-line parameters are available:

  • --mean_values
  • --scale_values
  • --scale

An example: mo --input_model unet.pdmodel --mean_values [123,117,104] --scale 255

Reversing input channels: sometimes your application's input images are in RGB (or BGR) format while the model was trained on BGR (or RGB) images, i.e., with the opposite channel order. In that case it is important to preprocess the input images by restoring the expected channel order before inference. To embed this step into the IR, the Model Optimizer provides the --reverse_input_channels command-line parameter, as shown below.
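
For example: mo --input_model INPUT_MODEL --reverse_input_channels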

In this example, we did not embed normalization at model-conversion time; instead, we apply the corresponding preprocessing after loading each image.

The code is as follows:

import os
from pathlib import Path
from IPython.display import Markdown, display

MODEL_DIR_IR = 'model/FP16'
MODEL_DIR_ONNX = 'model'
MODEL_NAME = 'mobileNetV2-lego-minifigures'

print("0 Load the original ONNX model and convert it into IR format.")
os.makedirs(MODEL_DIR_IR, exist_ok=True)
onnx_model_path = Path(MODEL_DIR_ONNX) / '{}.onnx'.format(MODEL_NAME)
ir_model_path = Path(MODEL_DIR_IR) / '{}.xml'.format(MODEL_NAME)
print("- Load ONNX model {}".format(onnx_model_path))
ir_path = Path(ir_model_path).with_suffix(".xml")

# Convert this model into the OpenVINO IR using the Model Optimizer:
mo_command = f"""mo
                 --framework=onnx
                 --input_model "{onnx_model_path}"
                 --input_shape "[1,3,224,224]"
                 --data_type FP16 
                 --output_dir "{MODEL_DIR_IR}"
                 """
mo_command = " ".join(mo_command.split())
print("- Model Optimizer command to convert ONNX to OpenVINO:")
display(Markdown(f"`{mo_command}`"))

# Run Model Optimizer if the IR model file does not exist
if not ir_path.exists():
    print("Exporting ONNX model to IR... This may take a few minutes.")
    ! $mo_command
else:
    print(f"- IR model {ir_path} converted. Already exists.")

We can convert the ONNX model into either an FP16 or an FP32 IR model; the converted models are saved under ./model/FP16/ and ./model/FP32/, respectively.

Terminal output:

0 Load the original ONNX model and convert it into IR format.
- Load ONNX model model\mobileNetV2-lego-minifigures.onnx
- Model Optimizer command to convert ONNX to OpenVINO:
mo --framework=onnx --input_model "model\mobileNetV2-lego-minifigures.onnx" --input_shape "[1,3,224,224]" --data_type FP16 --output_dir "model/FP16"

- IR model model\FP16\mobileNetV2-lego-minifigures.xml converted. Already exists.

3 Loading the IR Model

Next, we load the IR model. The code is as follows:

print("1 Load the IR model.")
ie = Core()
model = ie.read_model(model=ir_path)
compiled_model = ie.compile_model(model=model, device_name="CPU")
input_layer_ir = compiled_model.input(0)
output_layer_ir = compiled_model.output(0)
print("- Input layer info: {}".format(input_layer_ir))
print("- Output layer info: {}".format(output_layer_ir))

Terminal output:

1 Load the IR model.
- Input layer info: <ConstOutput: names[input.1] shape{1,3,224,224} type: f32>
- Output layer info: <ConstOutput: names[466] shape{1,37} type: f32>

Note that this example has 37 labels: ['SPIDER-MAN', 'VENOM', 'AUNT MAY', 'GHOST SPIDER', 'YODA', 'LUKE SKYWALKER', 'R2-D2', 'MACE WINDU', 'GENERAL GRIEVOUS', 'KYLO REN', 'THE MANDALORIAN', 'CARA DUNE', 'KLATOOINIAN RAIDER 1', 'KLATOOINIAN RAIDER 2', 'MYSTERIO', 'FIREFIGHTER', 'SPIDER-MAN', 'HARRY POTTER', 'RON WEASLEY', 'BLACK WIDOW', 'YELENA BELOVA', 'TASKMASTER', 'CAPTAIN AMERICA', 'OUTRIDER 1', 'OUTRIDER 2', 'OWEN GRADY', 'TRACKER TRAQUEUR RASTREADOR', 'IRON MAN MK 1', 'IRON MAN MK 5', 'IRON MAN MK 41', 'IRON MAN MK 50', 'JANNAH', 'HAN SOLO', 'DARTH VADER', 'ANAKIN SKYWALKER', 'EMPEROR PALPATINE', 'OBI-WAN KENOBI'].

4 Running Inference on a Single Image

First, we run IR-model inference on a single image, to explain each step:

  • Load the image (cv2.imread(BASE_DIR + img_name))
  • Preprocess the image with OpenCV:
    • Convert BGR to RGB (cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    • Resize the image to match the IR model's input height and width (cv2.resize(image, (W, H)))
    • Normalize the image with the mean and std (torchvision.transforms.Compose)
    • Transpose the dimensions so the result exactly matches the IR model's input shape
  • Run inference (compiled_model([input_image])[output_layer_ir])
  • Pick the most likely class (np.argmax(boxes))
  • Compare against the ground truth

The code is as follows:

print("First let's test one image with IR model.")
# The base dataset directory
BASE_DIR = './archive/'
img_name = 'test/001.jpg'
df_metadata = pd.read_csv(os.path.join(BASE_DIR, 'metadata.csv'), index_col=0)
df_groundtruth = pd.read_csv(os.path.join(BASE_DIR, 'test.csv'), index_col=0)
df_groundtruth_result = df_groundtruth[df_groundtruth.index == img_name]
df_groundtruth_imageNames = df_groundtruth.index.to_list()
N_CLASSES = df_metadata.shape[0]
print('- Number of classes: ', N_CLASSES)

# N,C,H,W = batch size, number of channels, height, width
N, C, H, W = input_layer_ir.shape
print("2 Load the image, and reshape to the same size as model input.")
# OpenCV loads images in BGR format
image = cv2.imread(BASE_DIR+img_name)
print("- Image original shape: {0}".format(image.shape))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
print("- First convert from BGR to RGB.")
# Resize image to meet network expected input sizes
image = cv2.resize(image, (W, H))
print("- Second convert image size to {}".format(image.shape))
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
# ToTensor() takes a PIL image (or np.uint8 NumPy array) with shape (n_rows, n_cols, n_channels) as input and returns a PyTorch tensor with floats between 0 and 1 and shape (n_channels, n_rows, n_cols)
# Normalize() subtracts the mean and divides by the standard deviation of the floating point values in the range [0, 1].
normalized_img = transform(image)
normalized_img = normalized_img.numpy()
print("- Third, image shape after torchvision normalization is: {}".format(normalized_img.shape))
normalized_img = normalized_img.transpose(1, 2, 0)
print("- image shape is transposed into {}".format(normalized_img.shape))
# Reshape to network input shape
input_image = np.expand_dims(normalized_img.transpose(2, 0, 1), 0)
#plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
print("- image shape finally: {0}".format(input_image.shape))

print("3 Inference.")
# Run inference; the output is a (1, 37) array of class scores
boxes = compiled_model([input_image])[output_layer_ir]
result_index = np.argmax(boxes)
print("- Shape of inference result: {0}".format(boxes.shape))
labels_names = df_metadata['minifigure_name'].tolist()
labels_names_idx = df_metadata.index
label_groundtruth = df_groundtruth_result['class_id'][0]
print("- Final Predict classification result: {0}".format(labels_names_idx[result_index]))
print("- Final Ground Truth classification result: {0}".format(label_groundtruth))

Terminal output:

First let's test one image with IR model.
- Number of classes:  37
2 Load the image, and reshape to the same size as model input.
- Image original shape: (512, 512, 3)
- First convert from BGR to RGB.
- Second convert image size to (224, 224, 3)
- Third, image shape after torchvision normalization is: (3, 224, 224)
- image shape is transposed into (224, 224, 3)
- image shape finally: (1, 3, 224, 224)
3 Inference.
- Shape of inference result: (1, 37)
- Final Predict classification result: 32
- Final Ground Truth classification result: 32

5 Measuring the Model's Accuracy and FPS

Next, we run inference over the entire test set and measure the model's accuracy and FPS. Note that, unlike benchmark_app below, this measurement includes image loading and OpenCV preprocessing and uses synchronous single-image requests, so the two FPS figures are not directly comparable.

The code is as follows:

import time
# The base dataset directory
BASE_DIR = './archive/'
df_metadata = pd.read_csv(os.path.join(BASE_DIR, 'metadata.csv'), index_col=0)
df_groundtruth = pd.read_csv(os.path.join(BASE_DIR, 'test.csv'), index_col=0)
df_groundtruth_imageNames = df_groundtruth.index.to_list()
labels_names = df_metadata['minifigure_name'].tolist()
labels_names_idx = df_metadata.index
num_iter = len(df_groundtruth_imageNames)
N_CLASSES = df_metadata.shape[0]
# N,C,H,W = batch size, number of channels, height, width
N, C, H, W = input_layer_ir.shape

results = []

st = time.time()
for img_name in df_groundtruth_imageNames:
    # OpenCV loads images in BGR format
    image = cv2.imread(BASE_DIR+img_name)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Resize image to meet network expected input sizes
    image = cv2.resize(image, (W, H))
    transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
        ),
    ])
    # ToTensor() takes a PIL image (or np.uint8 NumPy array) with shape (n_rows, n_cols, n_channels) as input and returns a PyTorch tensor with floats between 0 and 1 and shape (n_channels, n_rows, n_cols)
    # Normalize() subtracts the mean and divides by the standard deviation of the floating point values in the range [0, 1].
    normalized_img = transform(image)
    normalized_img = normalized_img.numpy()
    normalized_img = normalized_img.transpose(1, 2, 0)
    # Reshape to network input shape
    input_image = np.expand_dims(normalized_img.transpose(2, 0, 1), 0)
    # Run inference and take the highest-scoring class
    boxes = compiled_model([input_image])[output_layer_ir]
    result_index = np.argmax(boxes)
    label_ = labels_names_idx[result_index]
    df_groundtruth_result = df_groundtruth[df_groundtruth.index == img_name]
    label_groundtruth = df_groundtruth_result['class_id'][0]
    if label_ != label_groundtruth:
        results.append(1)
    else:
        results.append(0)

et = time.time()
elapsed_time = et - st
print('Execution time: {} seconds for {} images'.format(elapsed_time, num_iter))
print('FPS: {}'.format(int(num_iter/elapsed_time)))
print('Model accuracy: {}'.format(1-sum(results)/len(results)))

6 Testing IR Model Performance with benchmark_app

To measure the inference performance of the FP16 IR model, we use OpenVINO's Benchmark Tool, which can be run in a notebook with !benchmark_app or %sx benchmark_app.

Note: for the most accurate performance estimate, we recommend running benchmark_app in a terminal/command prompt after closing other applications. Run benchmark_app --help to see all command-line options.

!benchmark_app -m $ir_path -d CPU -api async -t 15 -b 1
# -t sets the run duration in seconds, -b the batch size

Terminal output:

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to THROUGHPUT.
[ INFO ] OpenVINO:
         API version............. 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Read model took 56.00 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'input.1' precision u8, dimensions ([N,C,H,W]): 1 3 224 224
[ INFO ] Model output '466' precision f32, dimensions ([...]): 1 37
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 125.00 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 8)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  , 
[ INFO ]   NUM_STREAMS  , 4
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , False
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 4 infer requests took 0.00 ms
[ WARNING ] No input files were given for input 'input.1'!. This input will be filled with random values!
[ INFO ] Fill input 'input.1' with random values 
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests using 4 streams for CPU, inference only: True, limits: 15000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 7.26 ms
[Step 11/11] Dumping statistics report
Count:          4072 iterations
Duration:       15013.77 ms
Latency:
    Median:     14.21 ms
    AVG:        14.65 ms
    MIN:        9.65 ms
    MAX:        52.41 ms
Throughput: 271.22 FPS

7 Integrating and Saving Preprocessing into the IR with the OpenVINO Preprocessing API

For many applications it is also important to minimize the model read/load time, so re-applying the preprocessing integration on every application start, after ov::runtime::Core::read_model, may be inconvenient. In such cases it is useful to store the new execution model, with the pre- and post-processing steps added, as an Intermediate Representation (IR, .xml format).

7.1 Precision adjustment

Two sides need to be declared: the input image is represented as an array of unsigned 8-bit integer values, while the model accepts a floating-point tensor. (The preprocessor dump later in this section also shows an implicit final conversion to f32, because the model itself expects f32 input.)

# The input image is an array of unsigned 8-bit integers
ppp.input().tensor().set_element_type(Type.u8)
# Execution-graph preprocessing: convert to a floating-point tensor
ppp.input().preprocess().convert_element_type(Type.f16)

7.2 Layout adjustment

Two sides: the input image layout is [1,512,512,3], i.e. [NHWC], while the model's input shape is [1,3,224,224], i.e. [NCHW]. So both need to be declared.

# Declare the input image layout as 'NHWC'
ppp.input().tensor().set_layout(Layout('NHWC'))
# The model's input layout is 'NCHW'
ppp.input().model().set_layout(Layout('NCHW'))
# Execution-graph preprocessing: convert 'NHWC' to 'NCHW'
ppp.input().preprocess().convert_layout([0, 3, 1, 2])

7.3 Height and width resizing

Two sides: in this example all input images have the same size, [512,512,3], while the model's input shape is [1,3,224,224]. So the image height and width need to be resized.

# Declare the input image shape, in 'NHWC' order
ppp.input().tensor().set_shape([1,512,512,3])
# If the exact input image size is not known in advance, use set_spatial_dynamic_shape
# ppp.input().tensor().set_spatial_dynamic_shape()
# Execution-graph preprocessing: resize the input image
ppp.input().preprocess().resize(ResizeAlgorithm.RESIZE_LINEAR, 224, 224)

7.4 Color adjustment

Two sides: images loaded with OpenCV are in BGR format, while the model expects RGB images.

# Declare the color format of the input image
ppp.input().tensor().set_color_format(ColorFormat.BGR)
# Execution-graph preprocessing: convert the input image from BGR to RGB
ppp.input().preprocess().convert_color(ColorFormat.RGB)
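
7.5 Mean and scale adjustment

The full code below also applies mean subtraction and scaling. The torchvision pipeline normalizes pixels after scaling them to [0, 1], whereas the PPP pipeline operates on the original [0, 255] values, so the torchvision mean and std must both be multiplied by 255. A minimal arithmetic check (a sketch; the resulting values are exactly those passed to .mean() and .scale() below):

# torchvision constants, defined for pixels scaled to [0, 1]
mean_01 = [0.485, 0.456, 0.406]
std_01 = [0.229, 0.224, 0.225]
# PPP sees raw [0, 255] pixels, so multiply both by 255
print([round(255 * m, 3) for m in mean_01])  # [123.675, 116.28, 103.53]
print([round(255 * s, 3) for s in std_01])   # [58.395, 57.12, 57.375]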

The full code is as follows:

from openvino.preprocess import PrePostProcessor, ColorFormat, ResizeAlgorithm
from openvino.runtime import Core, Layout, Type, set_batch
from openvino.runtime.passes import Manager

# ========  Step 0: read original model =========
core = Core()
model = core.read_model(model=onnx_model_path)

# ======== Step 1: Preprocessing ================
ppp = PrePostProcessor(model)
# Declare section of desired application's input format
# This declares the application's input image format: images loaded with OpenCV have 0-255 pixel values, [height, width, channel] layout, and BGR channel order.
ppp.input().tensor() \
    .set_element_type(Type.u8) \
    .set_shape([1, 512, 512, 3]) \
    .set_layout(Layout('NHWC')) \
    .set_color_format(ColorFormat.BGR)
# use set_spatial_dynamic_shape if we are not sure the exact size of input image.

# Specify actual model layout
ppp.input().model().set_layout(Layout('NCHW'))

# Explicit preprocessing steps. Layout conversion will be done automatically as last step
ppp.input().preprocess() \
    .convert_element_type(Type.f16) \
    .convert_color(ColorFormat.RGB) \
    .convert_layout([0, 3, 1, 2]) \
    .resize(ResizeAlgorithm.RESIZE_LINEAR, 224, 224) \
    .mean([123.675, 116.28, 103.53]) \
    .scale([58.395, 57.12, 57.375])

# Dump preprocessor
print(f'Dump preprocessor: {ppp}')
model = ppp.build()

# ======== Step 2: Change batch size ================
# Optionally change the batch size (kept at 1 in this example)
set_batch(model, 1)

# ======== Step 3: Save the model ================
pass_manager = Manager()
MODEL_DIR_IR_PP = 'model/FP16PP'
MODEL_NAME = 'mobileNetV2-lego-minifigures'
os.makedirs(MODEL_DIR_IR_PP, exist_ok=True)
irpp_model_path_xml = '{}/{}.xml'.format(MODEL_DIR_IR_PP,MODEL_NAME)
irpp_model_path_bin = '{}/{}.bin'.format(MODEL_DIR_IR_PP,MODEL_NAME)
pass_manager.register_pass(pass_name="Serialize",
                           xml_path=irpp_model_path_xml,
                           bin_path=irpp_model_path_bin)
pass_manager.run_passes(model)
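
Note: as an alternative to the Manager/Serialize pass, OpenVINO 2022.1 also exposes a one-line helper that should be equivalent (worth verifying against your installed version):

from openvino.runtime import serialize
serialize(model, irpp_model_path_xml, irpp_model_path_bin)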

Terminal output:

Dump preprocessor: Input "input.1" (color BGR):
User's input tensor: {1,512,512,3}, [N,H,W,C], u8
Model's expected tensor: {1,3,224,224}, [N,C,H,W], f32
Pre-processing steps (6):
    convert type (f16): ({1,512,512,3}, [N,H,W,C], u8, BGR) -> ({1,512,512,3}, [N,H,W,C], f16, BGR)
    convert color (RGB): ({1,512,512,3}, [N,H,W,C], f16, BGR) -> ({1,512,512,3}, [N,H,W,C], f16, RGB)
    convert layout (0,3,1,2): ({1,512,512,3}, [N,H,W,C], f16, RGB) -> ({1,3,512,512}, [N,C,H,W], f16, RGB)
    resize to (224, 224): ({1,3,512,512}, [N,C,H,W], f16, RGB) -> ({1,3,224,224}, [N,C,H,W], f16, RGB)
    mean (123.675,116.28,103.53): ({1,3,224,224}, [N,C,H,W], f16, RGB) -> ({1,3,224,224}, [N,C,H,W], f16, RGB)
    scale (58.395,57.12,57.375): ({1,3,224,224}, [N,C,H,W], f16, RGB) -> ({1,3,224,224}, [N,C,H,W], f16, RGB)
Implicit pre-processing steps (1):
    convert type (f32): ({1,3,224,224}, [N,C,H,W], f16, RGB) -> ({1,3,224,224}, [N,C,H,W], f32, RGB)

8 Loading the IR Model with Integrated OpenVINO Preprocessing

print("Load the IR model after OpenVINO PreProcessing API Integration.")
ie = Core()
modelpp = ie.read_model(model=irpp_model_path_xml)
compiled_modelpp = ie.compile_model(model=modelpp, device_name="CPU")
input_layer_irpp = compiled_modelpp.input(0)
output_layer_irpp = compiled_modelpp.output(0)
print("- Input layer info: {}".format(input_layer_irpp))
print("- Output layer info: {}".format(output_layer_irpp))

Terminal output:

Load the IR model after OpenVINO PreProcessing API Integration.
- Input layer info: <ConstOutput: names[input.1] shape{1,512,512,3} type: u8>
- Output layer info: <ConstOutput: names[466] shape{1,37} type: f32>
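
With preprocessing embedded in the graph, running the model on a raw image no longer requires any OpenCV preprocessing. As a quick sanity check on the test image from section 4 (a minimal sketch; the predicted index should again be 32):

# The PP model consumes a raw BGR uint8 image of shape (1, 512, 512, 3) directly
image = cv2.imread('./archive/test/001.jpg')    # (512, 512, 3), BGR, uint8
input_image = np.expand_dims(image, 0)          # add the batch dimension
scores = compiled_modelpp([input_image])[output_layer_irpp]
print(np.argmax(scores))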

9 Measuring Accuracy and FPS (Execution-Graph Preprocessing)

Next, we again run the entire test set and measure accuracy and FPS, this time comparing OpenCV preprocessing against OpenVINO execution-graph preprocessing (PP).

Overall, benchmark_app reports 271.22 FPS for the IR model without PP and 183.83 FPS with PP. End to end, however, the OpenCV preprocessing path costs more time (in one run: CV execution time 1.0087604522705078 seconds for 76 images, 75 FPS; PP execution time 0.7720448970794678 seconds for 76 images, 98 FPS; the run shown below gives 77 vs. 86 FPS). Roughly, adding PP to the IR model saves about 3 ms per frame. The exact saving depends on the CPU or GPU and on the specific workload: on a CPU the difference is modest, but on a GPU it is pronounced.

import time
# The base dataset directory
BASE_DIR = './archive/'
df_metadata = pd.read_csv(os.path.join(BASE_DIR, 'metadata.csv'), index_col=0)
df_groundtruth = pd.read_csv(os.path.join(BASE_DIR, 'test.csv'), index_col=0)
df_groundtruth_imageNames = df_groundtruth.index.to_list()
labels_names = df_metadata['minifigure_name'].tolist()
labels_names_idx = df_metadata.index
num_iter = len(df_groundtruth_imageNames)
N_CLASSES = df_metadata.shape[0]
# N,C,H,W = batch size, number of channels, height, width
N, C, H, W = input_layer_ir.shape

'''
Preprocessing with OpenCV
'''
results = []
st = time.time()
for img_name in df_groundtruth_imageNames:
    # OpenCV loads images in BGR format
    image = cv2.imread(BASE_DIR+img_name)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Resize image to meet network expected input sizes
    image = cv2.resize(image, (W, H))
    transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
        ),
    ])
    # ToTensor() takes a PIL image (or np.uint8 NumPy array) with shape (n_rows, n_cols, n_channels) as input and returns a PyTorch tensor with floats between 0 and 1 and shape (n_channels, n_rows, n_cols)
    # Normalize() subtracts the mean and divides by the standard deviation of the floating point values in the range [0, 1].
    normalized_img = transform(image)
    normalized_img = normalized_img.numpy()
    normalized_img = normalized_img.transpose(1, 2, 0)
    # Reshape to network input shape
    input_image = np.expand_dims(normalized_img.transpose(2, 0, 1), 0)
    # Run inference and take the highest-scoring class
    boxes = compiled_model([input_image])[output_layer_ir]
    result_index = np.argmax(boxes)
    label_ = labels_names_idx[result_index]
    df_groundtruth_result = df_groundtruth[df_groundtruth.index == img_name]
    label_groundtruth = df_groundtruth_result['class_id'][0]
    if label_ != label_groundtruth:
        results.append(1)
    else:
        results.append(0)

et = time.time()
elapsed_time = et - st
print('CV Execution time: {} seconds for {} images'.format(elapsed_time, num_iter))
print('CV FPS: {}'.format(int(num_iter/elapsed_time)))
print('CV Model accuracy: {}'.format(1-sum(results)/len(results)))


'''
Preprocessing with OpenVINO execution-graph preprocessing
'''

results = []
st = time.time()
for img_name in df_groundtruth_imageNames:
    # The PP model consumes the raw BGR uint8 image directly
    image = cv2.imread(BASE_DIR+img_name)
    image = np.expand_dims(image, 0)
    # Run inference directly on the raw image; preprocessing happens inside the graph
    boxes = compiled_modelpp([image])[output_layer_irpp]
    result_index = np.argmax(boxes)
    label_ = labels_names_idx[result_index]
    df_groundtruth_result = df_groundtruth[df_groundtruth.index == img_name]
    label_groundtruth = df_groundtruth_result['class_id'][0]
    if label_ != label_groundtruth:
        results.append(1)
    else:
        results.append(0)

et = time.time()
elapsed_time = et - st
print('PP Execution time: {} seconds for {} images'.format(elapsed_time, num_iter))
print('PP FPS: {}'.format(int(num_iter/elapsed_time)))
print('PP Model accuracy: {}'.format(1-sum(results)/len(results)))

Terminal output:

CV Execution time: 0.9832742214202881 seconds for 76 images
CV FPS: 77
CV Model accuracy: 0.8289473684210527
PP Execution time: 0.882225751876831 seconds for 76 images
PP FPS: 86
PP Model accuracy: 0.8289473684210527

10 Testing the Performance of the Preprocessing-Integrated IR Model with benchmark_app

!benchmark_app -m $irpp_model_path_xml -d CPU -api async -t 15 -b 1

Terminal output:

[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README. 
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to THROUGHPUT.
[ INFO ] OpenVINO:
         API version............. 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1

[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for CPU device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Read model took 68.01 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'input.1' precision u8, dimensions ([N,H,W,C]): 1 512 512 3
[ INFO ] Model output '466' precision f32, dimensions ([...]): 1 37
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 168.00 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 8)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  , 
[ INFO ]   NUM_STREAMS  , 4
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , False
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 4 infer requests took 0.98 ms
[ WARNING ] No input files were given for input 'input.1'!. This input will be filled with random values!
[ INFO ] Fill input 'input.1' with random values 
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests using 4 streams for CPU, inference only: True, limits: 15000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 9.71 ms
[Step 11/11] Dumping statistics report
Count:          2764 iterations
Duration:       15035.84 ms
Latency:
    Median:     21.30 ms
    AVG:        21.64 ms
    MIN:        16.49 ms
    MAX:        35.65 ms
Throughput: 183.83 FPS