TensorRT inference fails with condition: binding[x] != nullptr and the output is all zeros

炒鸡稀饭

Last modified 2023-08-07 17:28:54

Tags: object tracking, artificial intelligence, computer vision

First published 2023-08-07 17:28:23

Article link: https://blog.csdn.net/qq_73794703/article/details/132147879

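1. Background

My environment: Ubuntu 18.04, Python 3.6.9, numpy 1.19.3, pytorch 1.8.0, torchvision 0.9.0, onnx 1.9.0, onnxruntime 1.8.0, TensorRT 8.2.1, pycuda 2020.1

To speed up inference of a YOLO object detection model with TensorRT, I needed to convert the .pt model into an engine that TensorRT supports. I first used the export.py script that ships with YOLO to convert the .pt model to ONNX, and checked that the ONNX model was correct by feeding both models the same input and comparing their outputs. I then used trtexec, which ships with TensorRT, to convert the ONNX model into an engine. When I ran inference with the engine, the error condition: binding[x] != nullptr appeared and the output was all zeros.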
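The comparison step mentioned above can be sketched as follows. This is only a minimal illustration, assuming both models' outputs have already been converted to NumPy arrays; the function name `outputs_match` and the tolerance values are hypothetical choices, not something from the original workflow.

```python
import numpy as np

def outputs_match(reference, candidate, rtol=1e-3, atol=1e-5):
    """Compare two model outputs; return (is_close, max_abs_diff).

    rtol and atol are illustrative defaults, not values from the article.
    """
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    # Different shapes mean the outputs cannot be compared element-wise.
    if reference.shape != candidate.shape:
        return False, float("inf")
    if reference.size == 0:
        return True, 0.0
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    is_close = bool(np.allclose(reference, candidate, rtol=rtol, atol=atol))
    return is_close, max_abs_diff
```

The same check can be reused later to compare the engine output against the ONNX output; an all-zero engine output fails it whenever the reference is not itself close to zero.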

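The faulty code:

import numpy as np
import pycuda.autoinit  # assumed: initializes a CUDA context for pycuda
import pycuda.driver as cuda
import tensorrt as trt

# Create the engine and its execution context
with open("first.engine", "rb") as f:
    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
# Allocate page-locked host (CPU) memory
h_input = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(0)), dtype=np.float32)
h_output = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(4)), dtype=np.float32)
# Allocate device (GPU) memory
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

# Problem: only two device pointers are supplied, but the engine has more bindings
bindings = [int(d_input), int(d_output)]
stream = cuda.Stream()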

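2. Root cause analysis

The error condition: binding[x] != nullptr appears because host and device memory was not allocated correctly for the engine's bindings. Unlike the ONNX and .pt models as I had been using them, the engine has several outputs. If memory is allocated only for d_input and d_output, the bindings list has fewer entries than the engine has bindings, and d_output sits at index 1 instead of index 4. The engine therefore does not get valid memory on the GPU for its bindings, so in effect the input is all zeros and the output is all zeros as well.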
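To make the mismatch concrete, here is a small sketch of a check that could be run before inference. The helper `find_invalid_bindings` is hypothetical and not part of TensorRT; it assumes the binding count is taken from `engine.num_bindings` (the TensorRT 8.x attribute) and that `bindings` is the list of integer device pointers.

```python
def find_invalid_bindings(bindings, num_bindings):
    """Return the binding indices that have no usable device pointer.

    bindings: list of integer device addresses, one per binding index.
    num_bindings: number of bindings the engine expects.
    """
    invalid = []
    for index in range(num_bindings):
        # A binding is invalid if the list is too short or the pointer is 0/None.
        if index >= len(bindings) or not bindings[index]:
            invalid.append(index)
    return invalid

# Example with placeholder addresses: two pointers for a five-binding engine.
missing = find_invalid_bindings([0x1000, 0x2000], 5)
print("bindings without memory:", missing)
```

With the faulty code above, a call such as `find_invalid_bindings(bindings, engine.num_bindings)` would report indices 2, 3 and 4. It would not catch that d_output is in the wrong slot, which is why the fix places every pointer at its own binding index.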

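3. Solution

Allocating host and device memory for every binding solves the problem.

First, find out how many bindings the engine has. The code is as follows (the engine instance was created earlier):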

for binding in engine:
    dims = engine.get_binding_shape(binding)
    size = trt.volume(dims)
    print("The size of binding is", size)
    print("The dimension of binding is", dims)
    print(binding)
    print("input = ", engine.binding_is_input(binding))
    print("dtype =", trt.nptype(engine.get_binding_dtype(binding)))

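In my case there are 5 bindings: 1 input and 4 outputs, and the last output is the one I want. Knowing this, the memory allocation code from earlier can be fixed.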

# Allocate page-locked host (CPU) memory for every binding
h_input = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(0)), dtype=np.float32)
f_output = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(1)), dtype=np.float32)
s_output = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(2)), dtype=np.float32)
t_output = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(3)), dtype=np.float32)
h_output = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(4)), dtype=np.float32)
# Allocate device (GPU) memory for every binding
d_input = cuda.mem_alloc(h_input.nbytes)
f_d_output = cuda.mem_alloc(f_output.nbytes)
s_d_output = cuda.mem_alloc(s_output.nbytes)
t_d_output = cuda.mem_alloc(t_output.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
# Bind inputs and outputs, one device pointer per binding, in binding-index order
bindings = [int(d_input), int(f_d_output), int(s_d_output), int(t_d_output), int(d_output)]

stream = cuda.Stream()

After changing the memory allocation code, the error disappeared and the output matched that of the ONNX model. Problem solved.
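The fix above hard-codes five bindings and float32. A more general variant could loop over all bindings instead. The following is only a sketch under assumptions: `allocate_all_bindings` is a hypothetical helper, and it takes the imported `tensorrt` and `pycuda.driver` modules as the `trt` and `cuda` arguments so that it uses the same calls as the code above (plus `engine.num_bindings` and the dtype reported by the engine).

```python
def allocate_all_bindings(engine, context, trt, cuda):
    """Allocate host and device buffers for every binding of the engine.

    trt and cuda are the imported `tensorrt` and `pycuda.driver` modules.
    Returns (host_buffers, device_buffers, bindings), each indexed by
    binding index; `bindings` holds the integer device addresses.
    """
    host_buffers, device_buffers, bindings = [], [], []
    for index in range(engine.num_bindings):
        shape = context.get_binding_shape(index)
        # Use the dtype the engine reports instead of assuming float32.
        dtype = trt.nptype(engine.get_binding_dtype(index))
        host = cuda.pagelocked_empty(trt.volume(shape), dtype=dtype)
        device = cuda.mem_alloc(host.nbytes)
        host_buffers.append(host)
        device_buffers.append(device)
        bindings.append(int(device))
    return host_buffers, device_buffers, bindings

# Usage, assuming `engine` and `context` were created as shown earlier:
#   import tensorrt as trt
#   import pycuda.autoinit
#   import pycuda.driver as cuda
#   host_buffers, device_buffers, bindings = allocate_all_bindings(engine, context, trt, cuda)
#   h_input, h_output = host_buffers[0], host_buffers[4]
```

Keeping the device buffer objects in a list also keeps them referenced, so the memory is not freed while `bindings` is still in use.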