TensorRT inference in Python

Thomas_Cai

Last modified 2024-02-27 10:34:46

Categories: Deep Learning, TensorRT, Python Technology. Tags: windows, python, tensorrt, artificial intelligence, inference acceleration, model optimization

First published 2024-01-25 19:02:09

Article link: https://blog.csdn.net/ThomasCai001/article/details/135851584

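I needed to do what the title says: speed up inference with TensorRT. I hit a lot of pitfalls along the way, and this post summarizes them. The conclusion first: I converted the ONNX model to an onnx-sim (simplified) version, checked that accuracy is almost unaffected, converted that to a TensorRT engine, and finally ran inference on the engine from a Python script.

I tried several approaches. The problems I ran into may be specific to my model, so pick whichever route suits you; if you want to follow the same steps I ended up using, start reading from Attempt 2.
Note: TensorRT must be installed first. See my other blog post on installing the Python version of TensorRT on Windows.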

Environment (my own notes, for reference only):
torch-1.13.0+cu117
torchvision 0.14.0+cu117
torchaudio-0.13.0+cu117

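Attempt 1: Use the .pth file exported by torch and call torch2trt for TensorRT inference

See the official GitHub repository: https://github.com/NVIDIA-AI-IOT/torch2trt
I reproduce its example here. It is simple: set up the environment, load the model, and a single line converts it for inference.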

1.1 Set up the environment

git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
python setup.py install

1.2 How to run TensorRT inference

import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet

# create some regular pytorch model...
model = alexnet(pretrained=True).eval().cuda()

# create example data
x = torch.ones((1, 3, 224, 224)).cuda()

# convert to TensorRT feeding sample data as input
model_trt = torch2trt(model, [x])

y = model(x)  # original PyTorch inference
y_trt = model_trt(x)  # TensorRT inference

# check the output against PyTorch
print(torch.max(torch.abs(y - y_trt)))

1.3 Problems encountered

Problem 1: an error while verifying torch2trt

OSError: [WinError 127] The specified procedure could not be found. Error loading "\path\to\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies.

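Solution:
Many posts online blame the PyTorch version, which is possible, but in my case the direct cause was the path: my directory is named Lib with an uppercase L, while the path in the error uses lowercase lib. Adding the directory to the PATH environment variable fixed it. The steps on Windows are as follows.
View the environment variables on Windows: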

set

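Set the environment variable on Windows (first find where your cublas64_11.dll actually lives). Appending %PATH% keeps your existing entries, which are separated by semicolons:

set PATH=\path\to\Lib\site-packages\torch\lib;%PATH%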

After that it worked.
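
To confirm that the change took effect, it can help to check which PATH entries actually contain the DLL. The following is a minimal sketch using only the standard library; the helper name `find_on_path` is made up for illustration and is not part of torch or torch2trt.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return every directory in a PATH-style string that contains `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        # skip empty entries left behind by stray separators
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


# an empty list means no PATH entry contains the DLL yet
print(find_on_path("cublas64_11.dll"))
```

If the list is empty after running `set PATH=...`, the directory given was probably not the one that holds the DLL.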

Problem 2: an error while converting the model with torch2trt, that is, while running the TensorRT inference script above

File "XXXXXX\lib\site-packages\torch2trt-0.4.0-py3.7.egg\torch2trt\torch2trt.py", line 300, in wrapper
    converter["converter"](ctx)
  File "XXXXXX\lib\site-packages\torch2trt-0.4.0-py3.7.egg\torch2trt\converters\getitem.py", line 30, in convert_tensor_getitem
    input_trt = input._trt
AttributeError: 'Tensor' object has no attribute '_trt'

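I never solved this one. From what I found online, it means some operator is not supported. To verify this, I tried converting the ONNX model to a TensorRT engine directly from the command line. Sure enough, that did not work either, and it reported the following: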

onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

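Fixing this directly is also a lot of work. I then found that the model can be converted to a simplified (sim) version with almost no effect on accuracy, so I converted the ONNX model to a sim ONNX model.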
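
One way to judge how much the INT64 message matters is to check whether the INT64 values in the model would even fit into INT32. The sketch below shows only that range check on plain Python integers; the function name is hypothetical, and extracting the values from the ONNX file is deliberately left out.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def out_of_int32_range(values):
    """Return the values that would not survive a cast from INT64 to INT32."""
    return [int(v) for v in values if not INT32_MIN <= int(v) <= INT32_MAX]


print(out_of_int32_range([0, -1, 2 ** 31 - 1]))    # []
print(out_of_int32_range([2 ** 31, 2 ** 63 - 1]))  # [2147483648, 9223372036854775807]
```

In practice the values would come from the model's INT64 initializers, for example via `onnx.numpy_helper.to_array`; that wiring is an assumption about how you would use it and is not shown here.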

Attempt 2: Convert the ONNX model to a sim ONNX model

2.1 Set up the onnxsim environment

To set up onnxsim, follow these steps:

Install onnxsim:
Install it with pip by running the following command:
```
pip install onnx-simplifier
```
Make sure the dependency is installed:
onnxsim depends on ONNX, so make sure ONNX is installed. You can install it with:
```
pip install onnx
```

2.2 Use onnxsim

Once installed, onnxsim can be run from the command line. For example:

python -m onnxsim input.onnx output.onnx

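Here input.onnx is the input ONNX model file and output.onnx is the simplified ONNX model file that gets written.
Adjust the input and output paths as needed.

Note that onnxsim simplifies and optimizes the model graph without changing what the model computes. If your main goal is to simplify an ONNX model to reduce file size and speed up loading, onnxsim is a useful tool.

That completes the onnx-sim conversion.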
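
The same conversion can also be done from a Python script instead of the command line. The following is a minimal sketch, assuming `onnx` and `onnxsim` are installed as described in 2.1; the function names and the `-sim` file naming are my own choices for illustration.

```python
import os


def default_sim_path(src_path):
    """Hypothetical naming helper: model.onnx -> model-sim.onnx."""
    stem, ext = os.path.splitext(src_path)
    return f"{stem}-sim{ext}"


def simplify_onnx(src_path, dst_path=None):
    """Simplify the ONNX file at src_path and save the result."""
    # imported here so the helpers above work even without onnx installed
    import onnx
    from onnxsim import simplify

    dst_path = dst_path or default_sim_path(src_path)
    model = onnx.load(src_path)
    # simplify() returns the simplified model and a validation flag
    model_sim, ok = simplify(model)
    if not ok:
        raise RuntimeError("onnxsim could not validate the simplified model")
    onnx.save(model_sim, dst_path)
    return dst_path
```

Calling `simplify_onnx("input.onnx", "output.onnx")` should correspond to the command shown above.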

Attempt 3: Convert the onnx-sim model to a TensorRT engine and run inference on it

3.1 Convert the onnx-sim model to a TensorRT engine (on Windows)

trtexec.exe --onnx=XXXXXX.onnx --saveEngine=XXXXXXX.engine --workspace=6000

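The conversion succeeded. If your trtexec rejects --workspace, note that newer TensorRT releases use --memPoolSize=workspace:6000 instead.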
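
If the conversion needs to be repeated for several models, the trtexec call can be wrapped in a small Python script. This is only a sketch: it assumes trtexec.exe is on PATH, uses the same three flags as the command above, and the function names are made up for illustration.

```python
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, workspace_mb=6000, exe="trtexec.exe"):
    """Assemble the same trtexec arguments as the command above."""
    return [
        exe,
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--workspace={workspace_mb}",
    ]


def convert(onnx_path, engine_path, **kwargs):
    """Run trtexec and fail loudly if the conversion does not succeed."""
    # check=True raises CalledProcessError when trtexec exits with a non-zero code
    subprocess.run(build_trtexec_cmd(onnx_path, engine_path, **kwargs), check=True)
```

Keeping the argument list in its own function makes it easy to print and inspect the exact command before running it.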

3.2 Run inference on the engine

See the official notebook: https://github.com/NVIDIA/TensorRT/blob/main/quickstart/IntroNotebooks/4.%20Using%20PyTorch%20through%20ONNX.ipynb

I record the steps here as well:

First install pycuda:
```
pip install pycuda
```

Load the engine

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

# read the serialized engine and close the file right away
with open("resnet_engine_pytorch.trt", "rb") as f:
    engine_bytes = f.read()
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
engine = runtime.deserialize_cuda_engine(engine_bytes)
context = engine.create_execution_context()

Allocate input and output memory

import numpy as np

# placeholder values (assumptions): set them to match your own engine
BATCH_SIZE = 1
h, w = 224, 224
# need to set input and output precisions to FP16 to fully enable it
target_dtype = np.float32  # use np.float16 for an FP16 engine

input_batch = np.ones([BATCH_SIZE, 3, h, w], dtype=target_dtype)
output = np.empty([BATCH_SIZE, 1000], dtype=target_dtype)

# allocate device memory
d_input = cuda.mem_alloc(1 * input_batch.nbytes)
d_output = cuda.mem_alloc(1 * output.nbytes)
bindings = [int(d_input), int(d_output)]
stream = cuda.Stream()
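
The sizes passed to `cuda.mem_alloc` are byte counts: the number of elements times the bytes per element, which is what NumPy reports as `nbytes`. The sketch below reproduces that arithmetic with the standard library only, assuming a 224x224 input and a 1000-class output as illustrative shapes; `buffer_nbytes` is a made-up helper name.

```python
import math


def buffer_nbytes(shape, itemsize):
    """Bytes needed for a dense tensor: product of dims times bytes per element."""
    return math.prod(shape) * itemsize


FLOAT32_BYTES = 4
FLOAT16_BYTES = 2

print(buffer_nbytes((1, 3, 224, 224), FLOAT32_BYTES))  # 602112
print(buffer_nbytes((1, 1000), FLOAT32_BYTES))         # 4000
print(buffer_nbytes((1, 1000), FLOAT16_BYTES))         # 2000
```

If the host array's dtype does not match the dtype the engine expects, the allocated buffers end up the wrong size, so it is worth checking these numbers against the engine's input and output shapes.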

Run inference

def predict(batch):  # result gets copied into output
    # transfer input data to device
    cuda.memcpy_htod_async(d_input, batch, stream)
    # execute model
    context.execute_async_v2(bindings, stream.handle, None)
    # transfer predictions back
    cuda.memcpy_dtoh_async(output, d_output, stream)
    # synchronize threads
    stream.synchronize()
    return output

pred = predict(preprocessed_images)

# release device memory once, after the last predict() call;
# freeing inside predict() would break any second call
d_input.free()
d_output.free()

That completes inference. I did run into one more problem here, recorded below.

3.3 Problems encountered

Problem 1

ValueError: ndarray is not contiguous

It turned out that preprocessed_images was the problem. The following check, run on the array passed to predict, confirms it:

print(batch.flags['C_CONTIGUOUS'])
print(batch.flags['F_CONTIGUOUS'])

Solution:
Make the array contiguous:

if not (preprocessed_images.flags['C_CONTIGUOUS'] or preprocessed_images.flags['F_CONTIGUOUS']):
    preprocessed_images = np.ascontiguousarray(preprocessed_images)
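
A common way to end up with a non-contiguous array is reordering image axes during preprocessing. The following sketch reproduces the situation with a made-up 224x224 image; the array names and shapes are assumptions for illustration and are not taken from my actual preprocessing code.

```python
import numpy as np

# hypothetical preprocessing: one image in HWC layout, reordered to CHW
hwc = np.zeros((224, 224, 3), dtype=np.float32)
chw = hwc.transpose(2, 0, 1)         # a view with permuted strides, no copy
batch = chw[np.newaxis, ...]         # shape (1, 3, 224, 224)

print(batch.flags['C_CONTIGUOUS'])   # False
print(batch.flags['F_CONTIGUOUS'])   # False

fixed = np.ascontiguousarray(batch)  # copies the data into C order
print(fixed.flags['C_CONTIGUOUS'])   # True
```

`transpose` only changes how the existing buffer is indexed, so the memory layout no longer matches the shape; `np.ascontiguousarray` makes a copy whose memory layout does.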