- Converting pb to UFF
There are three ways to do the conversion:
- Python method 1 (via GraphSurgeon):
graphdef = gs.DynamicGraph(frozen_file)
uff.from_tensorflow(graphdef.as_graph_def(), output_nodes=[], preprocessor=None, **kwargs)
- Python method 2 (directly from a frozen .pb file):
uff.from_tensorflow_frozen_model(frozen_file, output_nodes=[], preprocessor=None, **kwargs)
- Method 3, using the convert-to-uff command-line tool:
convert-to-uff frozen_inference_graph.pb -O output_nodes -p config.py
- Loading the network with TensorRT
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
# Serialize the graph to UFF; the second argument lists the output node names.
uff_model = uff.from_tensorflow(dynamic_graph.as_graph_def(), model.output_name, output_filename='tmp.uff')
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.UffParser()
# The input is registered in CHW order; the batch dimension is implicit.
parser.register_input("Input", [3, h, w])
parser.register_output("net_outnode")
parser.parse('tmp.uff', network)
engine = builder.build_cuda_engine(network)
- Loading data and running inference
Enumerate the engine's bindings and allocate host (page-locked) and device memory for each:
bindings = []
host_inputs, cuda_inputs = [], []
host_outputs, cuda_outputs = [], []
for binding in engine:
    # Element count of this binding, times the maximum batch size.
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    # Page-locked host memory plus a matching device allocation.
    host_mem = cuda.pagelocked_empty(size, np.float32)
    cuda_mem = cuda.mem_alloc(host_mem.nbytes)
    bindings.append(int(cuda_mem))
    if engine.binding_is_input(binding):
        host_inputs.append(host_mem)
        cuda_inputs.append(cuda_mem)
    else:
        host_outputs.append(host_mem)
        cuda_outputs.append(cuda_mem)
context = engine.create_execution_context()
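`trt.volume` in the loop above is just the product of a binding's dimensions. A pure-Python equivalent (a hypothetical helper, shown only to make the buffer-size arithmetic concrete):

```python
from functools import reduce
import operator

def volume(shape):
    """Product of all dimensions, i.e. the element count of a binding."""
    return reduce(operator.mul, shape, 1)

# For an input registered as [3, h, w] with h = w = 300 and
# max_batch_size = 1, the host buffer holds 3*300*300 floats.
assert volume([3, 300, 300]) * 1 == 270000
```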
Copy the input image into the host buffer and execute asynchronously:
stream = cuda.Stream()
# The image must already be a float32 array in NCHW order.
np.copyto(host_inputs[0], image.ravel())
start_time = time.time()
cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
context.execute_async(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
# Wait for both copies and the inference to finish.
stream.synchronize()
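The `image` passed to `np.copyto` above must already be laid out in NCHW order, while decoded images (OpenCV, PIL) typically arrive as HWC. A NumPy sketch of that preprocessing step (assumed normalization, not part of the original notes):

```python
import numpy as np

h, w = 4, 5
# A typical decoded image: HWC (height, width, channel), uint8.
hwc = np.random.randint(0, 256, size=(h, w, 3), dtype=np.uint8)

# HWC -> CHW, scale to [0, 1], then flatten so it matches the
# page-locked input buffer registered as [3, h, w].
chw = hwc.transpose(2, 0, 1).astype(np.float32) / 255.0
image = np.ascontiguousarray(chw).ravel()

assert image.shape == (3 * h * w,)
```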
- Channel-order caveat for TensorRT inference
TensorRT only accepts NCHW input, so all input data must be converted to NCHW. After the network runs, however, TensorRT converts the output back to NHWC automatically. If operations such as sum or div near the end of the graph were written assuming NCHW axes, they will now be fed NHWC data, so the reduction axis of the sum must be changed to match the new layout; otherwise the results are wrong.
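The axis fix can be illustrated with plain NumPy (a standalone sketch, not TensorRT API): a channel sum written for NCHW reduces axis 1, but once the tensor is NHWC the channel dimension is axis 3.

```python
import numpy as np

# The same tensor in both layouts: batch=1, C=2, H=3, W=4.
x_nchw = np.arange(24, dtype=np.float32).reshape(1, 2, 3, 4)
x_nhwc = x_nchw.transpose(0, 2, 3, 1)  # NCHW -> NHWC

# A "sum over channels" written for NCHW reduces axis 1;
# after the layout change the channel dimension is axis 3.
sum_nchw = x_nchw.sum(axis=1)
sum_nhwc = x_nhwc.sum(axis=3)

# Identical results once the reduction axis follows the layout.
assert np.allclose(sum_nchw, sum_nhwc)
```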
- Two other routes from TensorFlow to TensorRT:
- Convert TensorFlow to ONNX, validate the ONNX model with ONNX Runtime, then parse it with TensorRT's ONNX parser;
- Convert TensorFlow to a PyTorch .pth model, then convert that to ONNX;
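The first route is commonly driven by the tf2onnx converter. A command-line sketch (the file paths and node names are placeholders; requires tf2onnx to be installed):

```shell
# Convert a frozen GraphDef to ONNX; input/output tensor names
# must match the graph (placeholders shown here).
python -m tf2onnx.convert \
    --graphdef frozen_inference_graph.pb \
    --inputs Input:0 \
    --outputs net_outnode:0 \
    --output model.onnx
```

The resulting model.onnx can then be checked with ONNX Runtime before handing it to TensorRT's ONNX parser.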