Deploying Any Deep Learning Model with TensorRT (From Installation to Inference in One Article)
The TensorRT deployment pipeline for deep networks
This article walks through accelerating inference for any deep learning model with TensorRT, covering the full pipeline from installing the environment dependencies to writing the inference program. All relevant code is given below and is also open-sourced on GitHub, link: . It is written for learning and exchange, to help readers who need to get started with deployment quickly. Corrections and feedback are welcome.
Configuring the environment dependencies
Installing CUDA, cuDNN, and TensorRT
Installing the NVIDIA driver
- Add the driver PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
- List the installable drivers
ubuntu-drivers devices
- Install automatically (recommended)
sudo ubuntu-drivers autoinstall
- Or install a specific version
sudo apt install nvidia-driver-XXX
Installing the CUDA Toolkit
- Downgrade gcc and g++ (the default versions on Ubuntu 20.04 are too new for CUDA)
sudo apt-get install gcc-7 g++-7
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 1
sudo update-alternatives --display gcc
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 9
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 1
sudo update-alternatives --display g++
- Download the matching CUDA Toolkit from the official archive (11.8 is used as the example)
https://developer.nvidia.com/cuda-toolkit-archive
Select the options for your platform on the page, then run:
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run
At the license prompt, type accept.
On the next screen, deselect the Driver component (it is already installed), then choose Install:
- Add the environment variables
Open the shell configuration file:
vim ~/.bashrc
Append the following lines:
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
Reload the environment:
source ~/.bashrc
- Verify the CUDA installation
Run:
nvcc -V
If nvcc prints its version information, the installation is correct.
Installing cuDNN
cuDNN is simpler to install than CUDA: download the archive for the matching version, copy the files into the CUDA directory, and grant read permissions.
- Download the archive from the official site and extract it
https://developer.nvidia.com/cudnn
tar -xzvf <the archive you downloaded>
- Copy the headers and libraries (replace cuda-10.1 below with your CUDA directory, e.g. cuda-11.8)
sudo cp cuda/include/cudnn.h /usr/local/cuda-10.1/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.1/lib64
sudo chmod a+r /usr/local/cuda-10.1/include/cudnn.h
sudo chmod a+r /usr/local/cuda-10.1/lib64/libcudnn*
- Verify the installation
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
If version defines are printed, the installation succeeded.
In this example, the version shown is cuDNN 7.5.0.
PS: newer cuDNN releases moved the version macros into cudnn_version.h, so the command above prints nothing; use the following instead:
sudo cp cuda/include/cudnn* /usr/local/cuda/include
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
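The check above just greps `#define` lines out of the header. If you prefer to do it from Python, the same parsing can be sketched as follows (the `sample` string is illustrative; in practice read `/usr/local/cuda/include/cudnn_version.h`):

```python
import re

def cudnn_version(header_text):
    """Extract CUDNN_MAJOR/MINOR/PATCHLEVEL from the text of cudnn_version.h."""
    vals = dict(re.findall(r"#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", header_text))
    return "{}.{}.{}".format(vals["MAJOR"], vals["MINOR"], vals["PATCHLEVEL"])

# Illustrative header snippet; replace with
# open('/usr/local/cuda/include/cudnn_version.h').read() on a real install
sample = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 7
"""
print(cudnn_version(sample))  # 8.9.7
```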
Installing TensorRT
- Look up a compatible version in the official documentation
https://docs.nvidia.com/deeplearning/tensorrt/archives/index.html#overview
Check the support matrix there against your CUDA and cuDNN versions.
- Download TensorRT (the tar-package installation is described here)
https://developer.nvidia.com/tensorrt/download
- Extract the downloaded TensorRT archive
tar -xzvf xxxxxxxx.tar.gz
- Grant permissions on the TensorRT directory
sudo chmod -R 777 /your/path/to/TensorRT
- Add the environment variables
vim ~/.bashrc
Append the following lines:
export LD_LIBRARY_PATH=/your_path/TensorRT-8.6.1.6/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=/your_path/TensorRT-8.6.1.6/lib:$LIBRARY_PATH
Note: replace these paths with your own TensorRT location.
- Reload the environment
source ~/.bashrc
- Install the Python package
Go to the TensorRT-8.6.1.6/python directory and install the wheel that matches your Python version.
cd /Path/to/TensorRT-8.6.1.6/python
python3 -m pip install tensorrt-*-cp3x-none-linux_x86_64.whl
Here * is the TensorRT version number, and cp3x must match your Python version (e.g. cp38 for Python 3.8).
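If you are unsure which cp3x tag applies, the wheel tag simply mirrors the interpreter's major/minor version; a quick check:

```python
import sys

# The cpXY wheel tag corresponds to the running interpreter's version,
# e.g. Python 3.8 -> cp38, Python 3.10 -> cp310
tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
print(tag)
```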
- Install UFF (for converting TensorFlow models)
cd /Path/to/TensorRT-8.6.1.6/uff
python3 -m pip install uff-0.6.9-py2.py3-none-any.whl
- Install graphsurgeon (for custom network structures)
cd /Path/to/TensorRT-8.6.1.6/graphsurgeon
pip install graphsurgeon-0.4.6-py2.py3-none-any.whl
- To keep other software from failing to find the TensorRT libraries, it is recommended to copy the libraries and headers into the system paths
sudo cp -r ./lib/* /usr/lib
sudo cp -r ./include/* /usr/include
- Test it from Python
import tensorrt as trt
print(trt.__version__)
- Run a sample program
cd /Path/to/TensorRT-8.6.1.6/samples/sampleOnnxMNIST/
make -j8
cd ../../bin
./sample_onnx_mnist
If the sample runs and reports success, the installation is complete.
PyTorch model → ONNX → engine (.engine or .trt)
The following converts a PyTorch model to ONNX and then to a TensorRT engine file; TensorRT needs the serialized engine to run inference.
!!!Note!!!: the PyTorch-to-ONNX conversion can be done on any machine, but the ONNX-to-engine conversion must be done on the machine that will run inference, otherwise the engine is incompatible.
Converting the PyTorch model to ONNX
# Imports
import torch
import argparse
from torch.autograd import Variable
from torch.utils.data import DataLoader
from tqdm import tqdm
import threading
from dataset import *
import time
from collections import OrderedDict
from model.SCTransNet import SCTransNet as SCTransNet
# from loss import *
import model.Config as config
import numpy as np
from skimage import measure
import torchvision
from PIL import ImageOps, Image

# Use the GPU if one is available, otherwise the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('device', device)

# Load the model
config_vit = config.get_SCTrans_config()
net = SCTransNet(config_vit, mode='test', deepsuper=True).cuda()
state_dict = torch.load("SCTransNet_NUAA_NUDT_IRSTD1K.pth.tar")
# state_dict = torch.load(opt.pth_dir, map_location='cpu')
new_state_dict = OrderedDict()
for k, v in state_dict['state_dict'].items():
    name = k[6:]  # strip the checkpoint's key prefix (6 characters here; adjust if your keys carry a `module.` prefix from DataParallel)
    new_state_dict[name] = v
net.load_state_dict(new_state_dict)
net.eval()

# Construct a dummy input
x = torch.randn(3, 1, 256, 256).to(device)
# Optionally run a forward pass as a sanity check
# output = net(x)
# print(output.shape)

# Export the PyTorch model to ONNX
with torch.no_grad():
    torch.onnx.export(
        net,                                          # the model to convert
        x,                                            # an arbitrary example input
        'SCTransNet_NUAA_NUDT_IRSTD1K_batch_3.onnx',  # output ONNX file name
        opset_version=11,                             # ONNX opset version
        input_names=['input'],                        # name of the input tensor (your choice)
        output_names=['output']                       # name of the output tensor (your choice)
    )

# Verify that the ONNX model was exported successfully
import onnx
# Load the ONNX model
onnx_model = onnx.load('SCTransNet_NUAA_NUDT_IRSTD1K_batch_3.onnx')
# Check that the model is well-formed
onnx.checker.check_model(onnx_model)
print('No errors; the ONNX model loaded successfully')
# # Print the computation graph
# print(onnx.helper.printable_graph(onnx_model.graph))
!!!Note!!!: a dummy input must be constructed for the ONNX export to run.
Possible errors
- RuntimeError: Exporting the operator var to ONNX opset version 11 is not supported. Please open a bug to request ONNX export support for the missing operator.
Fix:
Add the following code to /anaconda3/envs/torch1.7/lib/python3.7/site-packages/torch/onnx/symbolic_opset11.py (the path depends on your environment):
# This file exports ONNX ops for opset 11
import functools
import math
import sys
import warnings
from typing import List, Optional, Tuple, Union

import torch
import torch._C._onnx as _C_onnx
import torch.nn.modules.utils
import torch.onnx
from torch import _C

# Monkey-patch graph manipulation methods on Graph, used for the ONNX symbolics
from torch.onnx import symbolic_helper


@symbolic_helper.parse_args("v", "is", "i", "i")
def _var_mean(g, input, dim, correction, keepdim):
    if dim is None:
        mean = g.op("ReduceMean", input, keepdims_i=0)
        t_mean = mean
        num_elements = numel(g, input)
    else:
        mean = g.op("ReduceMean", input, axes_i=dim, keepdims_i=keepdim)
        t_mean = g.op("ReduceMean", input, axes_i=dim, keepdims_i=1)
        reduced_dims = g.op("Shape", input)
        # dim could contain one or multiple dimensions
        reduced_dims = g.op(
            "Gather",
            reduced_dims,
            g.op("Constant", value_t=torch.tensor(dim)),
            axis_i=0,
        )
        num_elements = g.op("ReduceProd", reduced_dims, keepdims_i=0)
    sub_v = g.op("Sub", input, t_mean)
    sqr_sub = g.op("Mul", sub_v, sub_v)
    keepdim_mean = 0 if dim is None else keepdim
    var = g.op("ReduceMean", sqr_sub, axes_i=dim, keepdims_i=keepdim_mean)
    # Correct bias in calculating variance, by dividing it over (N - correction) instead of N
    if correction is None:
        correction = 1
    if correction != 0:
        num_elements = g.op(
            "Cast", num_elements, to_i=symbolic_helper.cast_pytorch_to_onnx["Float"]
        )
        one = g.op("Constant", value_t=torch.tensor(correction, dtype=torch.float))
        mul = g.op("Mul", var, num_elements)
        var = g.op("Div", mul, g.op("Sub", num_elements, one))
    return var, mean


def std(g, input, *args):
    var, _ = var_mean(g, input, *args)
    return g.op("Sqrt", var)


def var(g, input, *args):
    var, _ = var_mean(g, input, *args)
    return var


# var_mean (and all variance-related functions) has multiple signatures, so need to manually figure
# out the correct arguments:
# aten::var_mean(Tensor self, bool unbiased)
# aten::var_mean(Tensor self, int[1] dim, bool unbiased, bool keepdim=False)
# aten::var_mean(Tensor self, int[1]? dim=None, *, int? correction=None, bool keepdim=False)
def var_mean(g, input, *args):
    if len(args) == 1:
        return _var_mean(g, input, None, args[0], None)
    else:
        return _var_mean(g, input, *args)


def std_mean(g, input, *args):
    var, mean = var_mean(g, input, *args)
    return g.op("Sqrt", var), mean
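The tail of `_var_mean` above implements Bessel's correction on the graph: the biased variance (mean of squared deviations, i.e. dividing by N) is rescaled by N / (N - correction). A plain-Python sketch of that arithmetic, for intuition only:

```python
def corrected_var(xs, correction=1):
    n = len(xs)
    mean = sum(xs) / n
    biased = sum((x - mean) ** 2 for x in xs) / n  # ReduceMean of squared deviations
    # Mul by N, then Div by (N - correction), as in the ONNX graph above
    return biased * n / (n - correction)

# With correction=1 this matches the unbiased sample variance
print(corrected_var([1.0, 2.0, 3.0, 4.0]))
```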
Converting ONNX to a TensorRT engine file
Go to the TensorRT-8.6.1.6/bin directory and run:
./trtexec --onnx=path/model.onnx --saveEngine=path/resnet_engine_intro.trt --explicitBatch
Append --fp16 if you also want a half-precision engine, like the *_fp16.trt file used in the final example below.
Inference program (Python)
The program below is a general-purpose version I put together: if it runs, the inference itself works. Pre- and post-processing still have to be added on top; a worked example follows later.
import torch
import tensorrt as trt
from collections import OrderedDict, namedtuple
import numpy as np

trt.init_libnvinfer_plugins(None, "")


def infer(img_data, engine_path):
    # 1. Logger
    logger = trt.Logger(trt.Logger.INFO)
    # 2. Deserialize the TensorRT engine with a runtime
    runtime = trt.Runtime(logger)
    trt.init_libnvinfer_plugins(logger, '')  # initialize TensorRT plugins
    with open(engine_path, "rb") as f:
        serialized_engine = f.read()
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    # 3. Bind inputs and outputs
    bindings = OrderedDict()
    Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))
    fp16 = False
    for index in range(engine.num_bindings):
        name = engine.get_binding_name(index)
        dtype = trt.nptype(engine.get_binding_dtype(index))
        shape = tuple(engine.get_binding_shape(index))
        data = torch.from_numpy(np.empty(shape, dtype=np.dtype(dtype))).to('cuda')
        # Tensor.data_ptr() is the address of the tensor's first element, as an int
        bindings[name] = Binding(name, dtype, shape, data, int(data.data_ptr()))
        if engine.binding_is_input(index) and dtype == np.float16:
            fp16 = True
    # Record the pointer address of each binding
    binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())
    # 4. Bind the input data and run inference
    context = engine.create_execution_context()
    binding_addrs['images'] = int(img_data.data_ptr())
    context.execute_v2(list(binding_addrs.values()))
    # 5. Fetch the result (keyed by the names set when exporting the ONNX model)
    output_data = bindings['output'].data
    return output_data


if __name__ == '__main__':
    img_data = torch.rand(20, 1, 256, 256).to('cuda')
    engine_path = "/home/shineber/zhoushenbo/competition/onnx/SCTransNet_NUAA_NUDT_IRSTD1K_batch.trt"
    res = infer(img_data, engine_path)
    # print(res)
    print(res.shape)
    print("Inference Success")
Note: the binding names used here ('images' and 'output') must be changed to the names defined in your exported ONNX model. If you don't know what keys the engine contains, inspect them with the following program.
A program to inspect the engine's binding names
import tensorrt as trt

trt.init_libnvinfer_plugins(None, "")


def load_engine_and_print_bindings(engine_path):
    # Create a TensorRT logger
    logger = trt.Logger(trt.Logger.INFO)
    # Create a TensorRT runtime
    runtime = trt.Runtime(logger)
    # Load the TensorRT engine from file
    with open(engine_path, "rb") as f:
        serialized_engine = f.read()
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    # Print the name, data type, and shape of every binding
    for index in range(engine.num_bindings):
        binding_name = engine.get_binding_name(index)
        binding_dtype = engine.get_binding_dtype(index)
        binding_shape = engine.get_binding_shape(index)
        print(f"Binding index: {index}")
        print(f"  Name: {binding_name}")
        print(f"  DataType: {trt.nptype(binding_dtype)}")
        print(f"  Shape: {binding_shape}")


if __name__ == "__main__":
    engine_path = "/home/shineber/zhoushenbo/competition/onnx/SCTransNet_NUAA_NUDT_IRSTD1K.trt"  # replace with your engine file path
    load_engine_and_print_bindings(engine_path)
An example with pre- and post-processing added (the general-purpose inference above does not include them)
import time
import torch
import tensorrt as trt
from collections import OrderedDict, namedtuple
import numpy as np
from tqdm import tqdm
import os
import cv2
import torchvision.transforms.functional as F
import threading
import queue
import csv

trt.init_libnvinfer_plugins(None, "")


def open_csv(q, q2):
    while True:
        item = q.get()  # take a file path from the queue
        if item is None:
            break  # a None sentinel stops the thread
        f = open(item, 'w', newline='', encoding='utf-8')
        q2.put(f)  # hand the opened file object back through the second queue


def create_bbox(image, threshold):
    # Threshold the prediction map, then keep the bounding box whose region has the largest summed response
    _, binary_image = cv2.threshold(image[0], threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    score = 0
    x1, y1, w1, h1 = 0, 0, 0, 0
    if contours:
        for i, contour in enumerate(contours):
            x, y, w, h = cv2.boundingRect(contour)
            score1 = image[0, y:y + h, x:x + w].sum()
            if score1 > score:
                score = score1
                x1, y1, w1, h1 = x, y, w, h
    return x1, y1, w1, h1


def create_bbox_invert(image, threshold):
    # Zero out any thresholded region larger than 400 px in the inverted prediction
    _, binary_image = cv2.threshold(image[0], threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        for i, contour in enumerate(contours):
            x, y, w, h = cv2.boundingRect(contour)
            if w * h > 400:
                image[0, y:y + h, x:x + w] = 0
    return image


def Normalized(img, img_norm_cfg):
    return (img - img_norm_cfg['mean']) / img_norm_cfg['std']


config_img = {'mean': 49.8, 'std': 21.2748}

if __name__ == '__main__':
    # Optional: queue and thread for asynchronous CSV opening (disabled here)
    # q = queue.Queue()
    # q2 = queue.Queue()
    # thread_open = threading.Thread(target=open_csv, args=(q, q2,))
    # thread_open.start()

    ### Load the model
    engine_path = "/home/shineber/zhoushenbo/competition_final/onnx/SCTransNet_NUAA_NUDT_IRSTD1K_batch_3_fp16.trt"
    # 1. Logger
    logger = trt.Logger(trt.Logger.INFO)
    # 2. Deserialize the TensorRT engine with a runtime
    runtime = trt.Runtime(logger)
    trt.init_libnvinfer_plugins(logger, '')  # initialize TensorRT plugins
    with open(engine_path, "rb") as f:
        serialized_engine = f.read()
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    # 3. Bind inputs and outputs
    bindings = OrderedDict()
    Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))
    fp16 = False
    for index in range(engine.num_bindings):
        name = engine.get_binding_name(index)
        dtype = trt.nptype(engine.get_binding_dtype(index))
        shape = tuple(engine.get_binding_shape(index))
        data = torch.from_numpy(np.empty(shape, dtype=np.dtype(dtype))).to('cuda')
        # Tensor.data_ptr() is the address of the tensor's first element, as an int
        bindings[name] = Binding(name, dtype, shape, data, int(data.data_ptr()))
        if engine.binding_is_input(index) and dtype == np.float16:
            fp16 = True
    context = engine.create_execution_context()
    ### Model loaded

    ### Watch the input directory for new images
    data_dir = '/home/shineber/zhoushenbo/competition/dataB'
    save_dir = '/home/shineber/zhoushenbo/competition/dataC'
    filename000 = []
    while True:
        file_list = os.listdir(data_dir)
        if len(file_list) > 0:
            # Skip the file handled in the previous iteration
            file_list = [file for file in file_list if file != filename000]
            if len(file_list) != 0:
                if len(file_list[0].split('.')) > 1:
                    base_results_path = os.path.join(save_dir, '{}'.format(file_list[0].split('.')[0]))
                    bbox_file = '{}.csv'.format(base_results_path)
                    # q.put(bbox_file)
                    # thread_open.join()
                    time1 = time.time()
                    img_path = os.path.join(data_dir, file_list[0])
                    ### Preprocessing
                    img000 = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
                    if img000 is not None:
                        size = img000.shape
                        img = cv2.resize(img000, (384, 384))
                        # Take the 256x256 center crop
                        x, y, w, h = 64, 64, 256, 256
                        img = img[y:y + h, x:x + w]
                        img_raw = img / img.max() * 255
                        img_invert = img_raw.max() - img_raw
                        img_gauss = cv2.GaussianBlur(img_raw, (5, 5), 0)
                        # Stack the raw, blurred, and inverted images into a batch of 3
                        cat_img = np.concatenate([img_raw[np.newaxis, np.newaxis, ...],
                                                  img_gauss[np.newaxis, np.newaxis, ...],
                                                  img_invert[np.newaxis, np.newaxis, ...]], axis=0)
                        cat_img = Normalized(cat_img, config_img)
                        img_gauss = torch.Tensor(cat_img)
                        ### Preprocessing finished

                        ### Inference
                        binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())
                        # 4. Bind the input data and run inference
                        binding_addrs['input'] = int(img_gauss.cuda().data_ptr())
                        context.execute_v2(list(binding_addrs.values()))
                        pred = bindings['output'].data.cpu()
                        f = open(bbox_file, 'w', newline='', encoding='utf-8')
                        pred = np.array(pred * 255, dtype=np.uint8)

                        ### Post-processing
                        pred[2] = create_bbox_invert(pred[2], 5)
                        pred_ = np.array((pred[0] + pred[1]) // 2, dtype=np.uint8)
                        x, y, w, h = create_bbox(pred_, 5)
                        pred[2, y - 5:y + h + 5, x - 5:x + w + 5] = 0
                        pred[2, y:y + h, x:x + w] = 255
                        pred_ = np.array(pred.mean(axis=0), dtype=np.uint8)
                        x, y, w, h = create_bbox(pred_, 5)
                        # Map the box from the 256x256 crop back to the original resolution
                        x = int(np.round((x + 64) / 3 * 5))
                        y = int(np.round((y + 64) / 3 * 4))
                        w = int(np.round(w / 3 * 5))
                        h = int(np.round(h / 3 * 4))
                        # Use the box center as the detection point
                        x = x + w // 2
                        y = y + h // 2
                        output = ['X', 'X', 'X', x, y]
                        writer = csv.writer(f)
                        writer.writerow(output)
                        f.close()
                        filename000 = file_list[0]
                        print(time.time() - time1)
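The coordinate mapping at the end of the post-processing can be isolated for clarity. The box found in the 256×256 center crop (offset 64 inside the 384×384 resize) is shifted back, then scaled by 5/3 horizontally and 4/3 vertically; those factors assume source frames of 640×512 (384 × 5/3 = 640, 384 × 4/3 = 512), so adjust them for other resolutions. A minimal sketch:

```python
def crop_box_to_original_center(x, y, w, h):
    """Map a bbox from the 256x256 center crop back to the original
    frame and return its center point. Assumes 640x512 source frames
    resized to 384x384 (hence the 5/3 and 4/3 factors)."""
    x = round((x + 64) / 3 * 5)  # undo crop offset, rescale width
    y = round((y + 64) / 3 * 4)  # undo crop offset, rescale height
    w = round(w / 3 * 5)
    h = round(h / 3 * 4)
    return x + w // 2, y + h // 2  # center point written to the CSV

print(crop_box_to_original_center(100, 100, 30, 30))  # (298, 239)
```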