In the previous article, "C#使用onnxruntime进行预测" (using ONNX Runtime for inference in C#), I showed how to load a ResNet50 ONNX model in C# and use it for image-classification inference. When we apply that model in a real project, however, one problem remains: sometimes we need to feed several images into the model at once, but the model only returns a result for the first image and ignores the rest. The reason is that the model's input batch size is 1, so it can only classify one image per run. To recognize several images at once, we need to modify the model's input.
The code below modifies the input. resnet50-v2-7.onnx is the model downloaded from the ONNX Model Zoo, and its default batch size is 1. When batch_size is a number (or a string that converts to a number), the model's batch dimension is set to that value; when batch_size is a string that cannot be converted to a number, the batch dimension is set to the symbolic "N", i.e. a variable batch size.
import onnx

def change_input_dim(model):
    batch_size = "16"
    # The following code changes the first dimension of every input to batch_size.
    # Modify as appropriate... note that this requires all inputs to
    # have the same batch_size.
    inputs = model.graph.input
    for input in inputs:
        # Checks omitted. This assumes that all inputs are tensors and have a
        # shape with a first dim. Add checks as needed.
        dim1 = input.type.tensor_type.shape.dim[0]
        # Check for a numeric batch size first, so that a digit string like "16"
        # becomes a fixed dim_value rather than a symbolic dim_param.
        if (isinstance(batch_size, str) and batch_size.isdigit()) or isinstance(batch_size, int):
            # set the given fixed batch size
            dim1.dim_value = int(batch_size)
        elif isinstance(batch_size, str):
            # a non-numeric string becomes a symbolic (dynamic) batch dimension
            dim1.dim_param = batch_size
        else:
            # fall back to a batch size of 1
            dim1.dim_value = 1

def apply(transform, infile, outfile):
    model = onnx.load(infile)
    transform(model)
    onnx.save(model, outfile)

apply(change_input_dim, 'resnet50-v2-7.onnx', 'resnet50-v2-16input.onnx')
As a test, we set the model's batch size to 1, 16, and N in turn, run the same recognition task with each, and compare the processing speed. The code for inference and timing is as follows:
import numpy as np               # numpy to process input and output data
import onnxruntime               # ONNX Runtime to run inference on ONNX models
import onnx
from onnx import numpy_helper
import json
import time
import os

# display images in notebook
import matplotlib.pyplot as plt
from PIL import Image

test_data_dir = 'resnet50v2/test_data_set'
test_data_num = 3

# Load inputs
inputs = []
for i in range(test_data_num):
    input_file = os.path.join(test_data_dir + '_{}'.format(i), 'input_0.pb')
    tensor = onnx.TensorProto()
    with open(input_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    inputs.append(numpy_helper.to_array(tensor))
print('Loaded {} inputs successfully.'.format(test_data_num))

# Load reference outputs
ref_outputs = []
for i in range(test_data_num):
    output_file = os.path.join(test_data_dir + '_{}'.format(i), 'output_0.pb')
    tensor = onnx.TensorProto()
    with open(output_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    ref_outputs.append(numpy_helper.to_array(tensor))
print('Loaded {} reference outputs successfully.'.format(test_data_num))

# Run the model on the backend
session = onnxruntime.InferenceSession('resnet50-v2-16input.onnx', None)
input_batch_size = 16

# get the name of the first input of the model
input_name = session.get_inputs()[0].name
print('Input Name:', input_name)

# Note: these batch-1 test inputs only match the model when its batch
# dimension is 1 or the symbolic "N"; skip this check for a fixed batch of 16.
outputs = [session.run([], {input_name: inputs[i]})[0] for i in range(test_data_num)]
print('Predicted {} results.'.format(len(outputs)))

# Compare the results with reference outputs up to 4 decimal places
for ref_o, o in zip(ref_outputs, outputs):
    np.testing.assert_almost_equal(ref_o, o, 4)
print('ONNX Runtime outputs are similar to reference outputs!')

def load_labels(path):
    with open(path) as f:
        data = json.load(f)
    return np.asarray(data)

def preprocess(input_data):
    # convert the input data into float32
    img_data = input_data.astype('float32')
    # normalize with the ImageNet per-channel mean and stddev
    mean_vec = np.array([0.485, 0.456, 0.406])
    stddev_vec = np.array([0.229, 0.224, 0.225])
    norm_img_data = np.zeros(img_data.shape).astype('float32')
    for i in range(img_data.shape[0]):
        norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - mean_vec[i]) / stddev_vec[i]
    # add batch channel
    norm_img_data = norm_img_data.reshape(1, 3, 224, 224).astype('float32')
    return norm_img_data

def softmax(x):
    x = x.reshape(-1)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def postprocess(result):
    return softmax(np.array(result)).tolist()

labels = load_labels('imagenet-simple-labels.json')
image = Image.open('dog.png')
# image = Image.open('images/plane.jpg')
print("Image size: ", image.size)
plt.axis('off')
display_image = plt.imshow(image)

image_data = np.array(image).transpose(2, 0, 1)
input_data = preprocess(image_data)

# duplicate the single preprocessed image to fill a batch of input_batch_size
input_batch = []
for _ in range(input_batch_size):
    input_batch.append(input_data)
input_batch = np.reshape(input_batch, (input_batch_size, 3, 224, 224))

# warm up with the batched input before timing
for _ in range(10):
    raw_result = session.run([], {input_name: input_batch})

# time 100 batched runs and report the average per run
start = time.time()
for _ in range(100):
    raw_result = session.run([], {input_name: input_batch})
end = time.time()
inference_time = np.round((end - start) * 1000 / 100, 2)

# take the first image's logits out of the batched result
first_result = raw_result[0][0, :].reshape((1, 1000))
res = postprocess(first_result)

idx = np.argmax(res)
print('========================================')
print('Final top prediction is: ' + labels[idx])
print('========================================')
print('Inference time: ' + str(inference_time) + ' ms')
print('========================================')

sort_idx = np.flip(np.squeeze(np.argsort(res)))
print('============ Top 5 labels are: ============================')
print(labels[sort_idx[:5]])
print('===========================================================')

plt.axis('off')
display_image = plt.imshow(image)
plt.show()
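The script above fills the batch by copying one preprocessed image 16 times; with real data you would stack N distinct preprocessed images along the batch axis instead. A minimal numpy-only sketch (the zero arrays here are placeholders standing in for real `preprocess()` outputs):

```python
import numpy as np

def stack_batch(images):
    """Stack preprocessed images of shape (1, 3, 224, 224) into (N, 3, 224, 224)."""
    return np.concatenate(images, axis=0)

# 16 placeholder "images"; in practice each comes from preprocess()
batch = stack_batch([np.zeros((1, 3, 224, 224), np.float32) for _ in range(16)])
print(batch.shape)  # (16, 3, 224, 224)
```

The resulting array can be fed directly as `{input_name: batch}` to `session.run`, and row `i` of the output then holds the logits for image `i`.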
The recognition result and the inference time are shown in the figure. Next, compare the inference time across the different input sizes:
With both the model batch and the actual input batch set to 16, each batched run takes 45.2 ms, which works out to close to 3 ms per image.
With both the model batch and the actual input batch set to 1, each run takes 5.3 ms.
With the model batch set to N and an actual input batch of 1, the time is almost identical to the fixed batch-1 model.
With the model batch set to N and an actual input batch of 16, the time is almost identical to the fixed batch-16 model.
The conclusion: as long as the inference framework supports it, a variable (dynamic) batch input is a very good choice.