TensorRT installation, model conversion, and writing an inference demo

Latest recommended article published on 2024-06-27 16:38:54

未来人生爱美食


Article tags: deep learning, artificial intelligence

Article link: https://blog.csdn.net/qq_40105923/article/details/136744589


1. Installing TensorRT

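How to install TensorRT depends on what you need. If you only want to call the library from Python, pip install tensorrt is enough. If you want to use trtexec to convert an ONNX model into a .trt or .engine file, you need to install TensorRT manually.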

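First, choose the TensorRT version that matches your CUDA version. Taking CUDA 11.2 on Ubuntu as an example, go to the TensorRT page on the NVIDIA website and select the matching release. This article installs TensorRT 8.5.1.7.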

Download the package, copy it to a suitable location on the Ubuntu machine, and then run the following commands:

tar xzvf TensorRT-8.5.1.7.Linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz

vim ~/.bashrc

Add the TensorRT environment variables to the file and save it.

source ~/.bashrc    # reload the environment variables
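The exact lines added to ~/.bashrc are not shown here. As a rough sketch of what such a setup commonly looks like, the helper below prints two export lines. It assumes the archive was extracted to a hypothetical directory /opt/TensorRT-8.5.1.7 and that the package's bin and lib subdirectories belong on PATH and LD_LIBRARY_PATH. The function name bashrc_lines is illustrative and not part of TensorRT.

```python
import posixpath


def bashrc_lines(trt_root):
    """Return export lines that put TensorRT's bin/ and lib/ on the search paths.

    trt_root is the directory created by extracting the tar package
    (a hypothetical location; adjust it to your own install path).
    """
    bin_dir = posixpath.join(trt_root, "bin")
    lib_dir = posixpath.join(trt_root, "lib")
    return [
        f"export PATH={bin_dir}:$PATH",
        f"export LD_LIBRARY_PATH={lib_dir}:$LD_LIBRARY_PATH",
    ]


if __name__ == "__main__":
    # Assumed install directory, used only for illustration
    for line in bashrc_lines("/opt/TensorRT-8.5.1.7"):
        print(line)
```

The printed lines can be pasted into ~/.bashrc after replacing the directory with the real extraction path.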

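Then run trtexec. If the installation succeeded, it should print its usage information instead of a "command not found" error.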
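The same check can be done from Python before launching a conversion. This is a minimal sketch that only looks the executable up on PATH. The helper name find_tool is hypothetical.

```python
import shutil


def find_tool(name="trtexec"):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path:
        print("trtexec found at", path)
    else:
        print("trtexec not found on PATH; check the entries in ~/.bashrc")
```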

2. Model conversion

TensorRT is now ready to convert an ONNX model into a TensorRT engine. The command is:

trtexec --onnx=model.onnx --saveEngine=model.trt

This converts model.onnx and saves the serialized engine as model.trt.
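When the conversion has to be scripted, the same command can be assembled and launched from Python. The sketch below assumes trtexec is on PATH. The helper names are hypothetical, and the optional --fp16 switch is a trtexec flag that this article's command does not use.

```python
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, fp16=False):
    """Build the argument list for the trtexec conversion command."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        # Optional: allow trtexec to build the engine with FP16 precision
        cmd.append("--fp16")
    return cmd


def convert(onnx_path, engine_path, fp16=False):
    """Run trtexec; raises CalledProcessError if it exits with a non-zero code."""
    subprocess.run(build_trtexec_cmd(onnx_path, engine_path, fp16), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert(...) to actually run it
    print(" ".join(build_trtexec_cmd("model.onnx", "model.trt")))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.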

3. Inference in Python

Inference in Python requires two libraries: tensorrt and pycuda.

Installing PyCUDA

a. pip install pycuda

This kept failing inside Docker.

b. Install pycuda manually from source

$ tar xfz pycuda-VERSION.tar.gz

$ cd pycuda-VERSION # if you're not there already

$ python configure.py --cuda-root=/where/ever/you/installed/cuda

$ su -c "make install"

This also failed.

c. sudo apt-get install python3-pycuda

This worked.

Installing TensorRT

Use pip install tensorrt.
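Before running the inference script, it can help to confirm that both packages are visible to the interpreter in use, especially inside Docker. A minimal sketch follows. The helper name missing_modules is hypothetical, and the check only looks modules up without importing them, so it does not need a GPU.

```python
import importlib.util


def missing_modules(names=("tensorrt", "pycuda")):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("tensorrt and pycuda are both installed")
```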

The inference code is as follows:

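# Import the required dependencies
import os
import time

import cv2
import numpy as np
import pycuda.autoinit  # creates and activates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt
from tensorflow import keras

# Path to the serialized engine, i.e. the file passed to trtexec --saveEngine
model_path = "./model.engine"
verbose = True
IN_NAME = 'input_1'
OUT_NAME = 'conve_14'
IN_H = 512
IN_W = 512
BATCH_SIZE = 1

# Only needed when building a network from scratch; unused when loading an engine
EXPLICIT_BATCH = 1 << int(
    trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

# Load the TRT engine (alternative engine files)
# engine_file_path = 'model_fp16.trt'
# engine_file_path = 'model_tesfp16.trt'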
class OxfordPets1(keras.utils.Sequence):
    """Yields batches of input arrays loaded from .npz files."""

    # __init__ takes batch_size, img_size and input_img_paths
    def __init__(self, batch_size, img_size, input_img_paths):
        self.batch_size = batch_size  # batch size
        self.img_size = img_size  # image size as (height, width)
        self.input_img_paths = input_img_paths  # input file paths
        # self.target_img_paths = target_img_paths  # label image paths
        # self.on_epoch_end()

    def __len__(self):
        # Number of batches
        return len(self.input_img_paths) // self.batch_size

    def __getitem__(self, idx):
        """
        Return the batch with index idx.
        """
        i = idx * self.batch_size
        # Paths of the inputs in this batch
        batch_input_img_paths = self.input_img_paths[i: i + self.batch_size]
        # Label data
        # batch_target_img_paths = self.target_img_paths[i: i + self.batch_size]
        # Build the feature array: every sample is stored in x with 8 channels per pixel
        x = np.zeros((self.batch_size,) + self.img_size + (8,), dtype="float32")
        # x = np.zeros((batch_size,) + self.img_size + (1,), dtype="float32")
        for j, path in enumerate(batch_input_img_paths):
            # img = load_img(path, target_size=self.img_size)
            img = np.load(path)['arr_0']  # first unnamed array saved in the .npz file
            x[j] = np.array(img)
        return x

    def on_epoch_end(self):
        self.indexes = np.arange(len(self.input_img_paths))

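with open(model_path, 'rb') as f:
    engine_data = f.read()
# print(engine_data)
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')
runtime = trt.Runtime(TRT_LOGGER)

engine = runtime.deserialize_cuda_engine(engine_data)
if engine is None:
    raise RuntimeError("Failed to deserialize the engine: " + model_path)

# Create the execution context
context = engine.create_execution_context()

# Shapes of the input and output buffers; they must match the engine's bindings.
# Note: the loader above produces 8-channel arrays, so adjust these values to your model.
input_shape = (1, 512, 512, 3)  # input shape; for a three-channel NCHW model use (1, 3, 512, 512)
output_shape = (1, 512, 512, 3)  # output shape; for a three-channel NCHW model use (1, 3, 512, 512)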




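input_dir = "./data"
input_img_paths = sorted(
    [
        os.path.join(input_dir, fname)
        for fname in os.listdir(input_dir)
        if fname.endswith(".npz")
    ]
)
val_input_img_paths = input_img_paths[:]
data = OxfordPets1(1, (512, 512), val_input_img_paths)
# Load every batch into one array
data = np.array([data[i] for i in range(len(data))], dtype='float32')
print(data.shape)
# (num_samples, H, W, C); the sample count was hard-coded to 200 in the original script
data = data.reshape(-1, 512, 512, 8)
num_samples = data.shape[0]
out_position = './result'
os.makedirs(out_position, exist_ok=True)
T1 = time.time()
for index in range(num_samples):
    # Keep the loader's channel count (reshaping 8 channels to 3 would raise an error);
    # the resulting shape must match the engine's input binding
    input_data = np.ascontiguousarray(data[index:index + 1], dtype=np.float32)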

    output_data = np.empty(output_shape, dtype=np.float32)
    # Allocate device (GPU) memory for the input and the output
    d_input = cuda.mem_alloc(input_data.nbytes)
    d_output = cuda.mem_alloc(output_data.nbytes)
    # Create a CUDA stream
    stream = cuda.Stream()

    # Copy the input from host memory to GPU memory
    cuda.memcpy_htod_async(d_input, input_data, stream)

    # Run TensorRT inference
    # T1 = time.time()
    # Assumes binding 0 is the input and binding 1 is the output
    bindings = [int(d_input), int(d_output)]
    stream_handle = stream.handle
    context.execute_async_v2(bindings=bindings, stream_handle=stream_handle)

    # Copy the output from GPU memory back to host memory
    cuda.memcpy_dtoh_async(output_data, d_output, stream)

    # Wait for the copies and the inference to finish
    stream.synchronize()

    # Save the first output channel as a TIFF image
    cv2.imwrite(out_position + '/1_' + str(index + 1) + '.tiff', output_data[0, :, :, 0])

T2 = time.time()
print('Elapsed time: %s s' % (T2 - T1))
# Print the shape of the last output
print(output_data.shape)
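The script sizes its device buffers from the nbytes of the NumPy arrays, and a reshape only succeeds when the element counts on both sides agree. The following stdlib-only sketch can be used to sanity-check hard-coded shapes before running the loop. The helper names are hypothetical, and itemsize=4 assumes float32 data as in the script.

```python
import math


def num_elements(shape):
    """Number of elements in a dense array of the given shape."""
    return math.prod(shape)


def buffer_nbytes(shape, itemsize=4):
    """Bytes needed for a dense array of `shape`; itemsize=4 corresponds to float32."""
    return num_elements(shape) * itemsize


def can_reshape(src_shape, dst_shape):
    """True if an array of src_shape holds exactly as many elements as dst_shape."""
    return num_elements(src_shape) == num_elements(dst_shape)


if __name__ == "__main__":
    print(buffer_nbytes((1, 512, 512, 3)))               # bytes for one float32 buffer
    print(can_reshape((512, 512, 8), (1, 512, 512, 8)))  # same element count
    print(can_reshape((512, 512, 8), (1, 512, 512, 3)))  # element counts differ
```

For example, a float32 buffer of shape (1, 512, 512, 3) needs 3145728 bytes, and an array of shape (512, 512, 8) cannot be reshaped to (1, 512, 512, 3) because the element counts differ.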
