【python】Converting ONNX to a TensorRT engine under TensorRT 8

This post describes how to convert an ONNX model into a runnable TensorRT engine in Python with TensorRT 8. Along the way I hit a CUDA/TensorRT version conflict; after resolving it I tried generating the engine with Python onnx2trt code and ran into a GPU memory allocation problem. Adjusting how the code was run eventually produced the engine, though with warnings about a possible cuBLAS version mismatch.


Background

I recently tracked down why the Python runtime couldn't execute the trt file generated by trtexec.exe: it was a conflict between the CUDA bundled with the environment's PyTorch and TensorRT, and uninstalling PyTorch and reinstalling the CPU build fixed it. But in the course of debugging, the environment got somewhat wrecked, and afterwards engines generated by trtexec.exe simply gave up, outputting nothing but NaN. Fine then, I'll generate the engine from Python instead. The onnx2tensorrt code you find online, however, is mostly written for TensorRT 7 or earlier; the samples I tried earlier produced nothing usable. So today, following the official sample code, I'm recording how to convert ONNX to an engine in Python under TensorRT 8.
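As a quick sanity check for this kind of conflict, the snippet below (my addition, not part of the original workflow) prints the CUDA version that the installed PyTorch build was compiled against alongside the TensorRT version, so a mismatch is visible before building anything:

import tensorrt as trt

print("TensorRT:", trt.__version__)

try:
    import torch
    # torch.version.cuda is None for CPU-only builds of PyTorch
    print("PyTorch built against CUDA:", torch.version.cuda)
    print("CUDA available to PyTorch:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed in this environment")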

Reference

GitHub code

Simple workflow

In practice you don't need everything in the official sample; picking out the relevant part is enough:

import tensorrt as trt
import os

EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
TRT_LOGGER = trt.Logger()


def get_engine(onnx_file_path, engine_file_path=""):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""

    def build_engine():
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        with trt.Builder(TRT_LOGGER) as builder, builder.create_network(
            EXPLICIT_BATCH
        ) as network, builder.create_builder_config() as config, trt.OnnxParser(
            network, TRT_LOGGER
        ) as parser, trt.Runtime(
            TRT_LOGGER
        ) as runtime:
            # max_workspace_size is deprecated since TensorRT 8.4 (hence the
            # DeprecationWarning in the log below); see the note after the log.
            config.max_workspace_size = 1 << 32  # 4GB
            # Ignored for explicit-batch networks; deprecated in TensorRT 8.
            builder.max_batch_size = 1
            # Parse model file
            if not os.path.exists(onnx_file_path):
                print(
                    "ONNX file {} not found, please export your model to ONNX first.".format(onnx_file_path)
                )
                exit(1)
            print("Loading ONNX file from path {}...".format(onnx_file_path))
            with open(onnx_file_path, "rb") as model:
                print("Beginning ONNX file parsing")
                if not parser.parse(model.read()):
                    print("ERROR: Failed to parse the ONNX file.")
                    for error in range(parser.num_errors):
                        print(parser.get_error(error))
                    return None

            # # The actual yolov3.onnx is generated with batch size 64. Reshape input to batch size 1
            # network.get_input(0).shape = [1, 3, 608, 608]

            print("Completed parsing of ONNX file")
            print("Building an engine from file {}; this may take a while...".format(onnx_file_path))
            plan = builder.build_serialized_network(network, config)
            if plan is None:
                print("ERROR: Failed to build the engine.")
                return None
            engine = runtime.deserialize_cuda_engine(plan)
            print("Completed creating Engine")
            with open(engine_file_path, "wb") as f:
                f.write(plan)
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, use it instead of building an engine.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine()


def main():
    """Create a TensorRT engine for ONNX-based YOLOv3-608 and run inference."""

    # Paths to the ONNX model and the engine file to be created or loaded:
    onnx_file_path = "model.onnx"
    engine_file_path = "model.trt"

    get_engine(onnx_file_path, engine_file_path)


if __name__ == "__main__":
    main()

After copying this down I deleted a few pieces to match my own model: the reference model's ONNX apparently wasn't exported with batch size 1 (the sample's comment mentions 64), so it included a reshape-to-batch-1 step; my model file already is batch 1, so I removed it. I didn't need dynamic input sizes, so I never looked into how to add them (a rough sketch follows below for anyone who does). Running the script from PyCharm failed, seemingly with a GPU memory allocation problem; I'm not sure whether the 1 << 32 workspace size was a bit reckless. But typing the same command into PyCharm's terminal succeeded, which is rather baffling.
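For completeness, here is a minimal sketch of how dynamic input sizes are typically handled in TensorRT 8: an optimization profile is added to the builder config before building. This is my addition, not part of the original workflow; the input name "input" and the shape triples are placeholders that must match your own model. The snippet belongs inside build_engine(), after create_builder_config() and before build_serialized_network():

# Sketch only: assumes the network's first input has dynamic dims, e.g. (-1, 3, -1, -1).
profile = builder.create_optimization_profile()
profile.set_shape(
    "input",              # must match network.get_input(0).name
    (1, 3, 240, 320),     # min: smallest input shape the engine must support
    (1, 3, 240, 320),     # opt: shape TensorRT optimizes for
    (1, 3, 480, 640),     # max: largest input shape the engine must support
)
config.add_optimization_profile(profile)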

(mypytorch) PS F:\DeepStereo\AppleShow2> python onnx2trt.py      
onnx2trt.py:20: DeprecationWarning: Use set_memory_pool_limit instead.
  config.max_workspace_size = 1 << 32  # 4GB
Loading ONNX file from path G:\jupyter\Model_Zoo\resources_iter10_modify\crestereo_combined_iter10_240x320.onnx...
Beginning ONNX file parsing
[06/16/2022-16:59:16] [TRT] [W] onnx2trt_utils.cpp:365: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Completed parsing of ONNX file
Building an engine from file G:\jupyter\Model_Zoo\resources_iter10_modify\crestereo_combined_iter10_240x320.onnx; this may take a while...
[06/16/2022-17:03:52] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.3.1
[06/16/2022-17:07:42] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.3.1
[06/16/2022-17:07:43] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.3.1
Completed creating Engine
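About the DeprecationWarning at the top of the log: since TensorRT 8.4, the workspace size is set through the memory-pool API rather than max_workspace_size. A minimal sketch of the replacement, assuming the same 4GB limit as above:

# TensorRT >= 8.4: replaces config.max_workspace_size = 1 << 32
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 32)

As for the repeated cuBLAS lines, they say TensorRT was linked against cuBLAS 11.8.0 but loaded 11.3.1 at runtime; they are warnings rather than errors, and the engine was still created successfully here.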