Deploying Any Deep Learning Model with TensorRT (From Installation to the Inference Program, in One Article)

A roadmap for deploying deep networks with TensorRT

This article covers accelerating inference for arbitrary deep learning models with TensorRT, walking through the full pipeline from installing the environment dependencies to writing the inference program. The relevant code is given below and is also open-sourced on GitHub (link: ). The goal is to help readers who need a quick, practical introduction to deployment get up to speed. Feedback and corrections are welcome.

Configuring the environment dependencies

Installing CUDA, cuDNN, and TensorRT

Install the NVIDIA driver
  • Add the graphics-drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
  • List the drivers available for installation
ubuntu-drivers devices
  • Automatic installation (recommended)
sudo ubuntu-drivers autoinstall
  • Install a specific version
sudo apt install nvidia-driver-XXX
Install the CUDA Toolkit
  • Downgrade gcc and g++ (the defaults on Ubuntu 20.04 are too new for the CUDA installer)
sudo apt-get install gcc-7 g++-7
 
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 1
 
sudo update-alternatives --display gcc
 
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 9
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 1
 
sudo update-alternatives --display g++

Then download and run the CUDA 11.8 runfile installer:

wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run

The installer first shows the license agreement; type accept.

Next, on the component selection screen, deselect Driver (it is already installed) and press Enter on Install.

  • Add environment variables

Run:

vim ~/.bashrc

and append the following lines:

export CUDA_HOME=/usr/local/cuda-11.8
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH

Reload the environment:

source ~/.bashrc
  • Verify that CUDA is installed correctly:

Run:

nvcc -V

If the nvcc version information is printed, the installation is correct.

Install cuDNN

cuDNN is simpler to install than CUDA: download the tarball matching your CUDA version, extract it, copy the files into the CUDA directory, and grant read permissions.

tar -xzvf <the-archive-you-downloaded>
  • Copy the headers and libraries
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
  • Check the installation
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

If the CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL macros are printed, the installation succeeded; for example, values of 7 / 5 / 0 correspond to cuDNN 7.5.0.

Note: for newer cuDNN releases the command above may print nothing, because the version macros moved to cudnn_version.h; in that case run:

sudo cp cuda/include/cudnn* /usr/local/cuda/include
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
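The grep above just reads the version macros out of the header; the same check can be scripted. A minimal sketch that parses the macro block, using an inline sample header in place of the real cudnn_version.h (the version numbers here are only illustrative):

```python
import re

# sample of the macro block found in cudnn_version.h (illustrative values)
header = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 7
"""

# collect the three version macros, then join them as major.minor.patch
macros = dict(re.findall(r"#define CUDNN_(\w+) (\d+)", header))
version = ".".join(macros[k] for k in ("MAJOR", "MINOR", "PATCHLEVEL"))
print(version)  # 8.9.7
```

To check a real installation, replace the sample string with the contents of /usr/local/cuda/include/cudnn_version.h.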
Install TensorRT
tar -xzvf xxxxxxxx.tar.gz
  • Grant permissions on the TensorRT directory
sudo chmod -R 777 /your/path/to/TensorRT
  • Add environment variables
vim ~/.bashrc

and add the following lines:

export LD_LIBRARY_PATH=/your_path/TensorRT-8.6.1.6/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=/your_path/TensorRT-8.6.1.6/lib:$LIBRARY_PATH

Note: replace these with your actual TensorRT path.

  • Reload the environment
source ~/.bashrc
  • Install the Python bindings

Go to the TensorRT-8.6.1.6/python directory and install the wheel that matches your Python version.

cd /Path/to/TensorRT-8.6.1.6/python
python3 -m pip install tensorrt-*-cp3x-none-linux_x86_64.whl

Here * is the TensorRT version number, and cp3x must match your Python version (e.g. cp310 for Python 3.10).
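The cp3x tag simply encodes the CPython version. As an illustration (the helper below is hypothetical, not part of TensorRT), the tag for an interpreter can be derived like this:

```python
import sys

def wheel_tag(version=None):
    """Return the CPython wheel tag for a (major, minor) pair, e.g. (3, 10) -> 'cp310'."""
    major, minor = sys.version_info[:2] if version is None else version
    return f"cp{major}{minor}"

# pick the wheel whose cp3x part matches the running interpreter
print(wheel_tag((3, 10)))  # cp310 -> tensorrt-*-cp310-none-linux_x86_64.whl
```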

  • Install UFF (for converting TensorFlow models)
cd /Path/to/TensorRT-8.6.1.6/uff
python3 -m pip install uff-0.6.9-py2.py3-none-any.whl
  • Install graphsurgeon (for customizing network structures)
cd /Path/to/TensorRT-8.6.1.6/graphsurgeon
pip install graphsurgeon-0.4.6-py2.py3-none-any.whl
  • To keep other software from failing to find the TensorRT libraries, it is recommended to copy TensorRT's libraries and headers into the system paths
sudo cp -r ./lib/* /usr/lib
sudo cp -r ./include/* /usr/include
  • Test it from Python
import tensorrt as trt
print(trt.__version__)
  • Run a sample program
cd /Path/to/TensorRT-8.6.1.6/samples/sampleOnnxMNIST/
make -j8
cd ../../bin
./sample_onnx_mnist

If the sample runs successfully (the TensorRT samples print a PASSED line at the end), the installation works.


PyTorch model → ONNX → engine (.trt)

Below, a PyTorch model is converted to ONNX and then to a TensorRT engine file; TensorRT needs the engine file to run inference.
Important: the PyTorch-to-ONNX conversion can be done on any machine, but the ONNX-to-engine conversion must be done on the machine that will run inference; engines are tied to the specific GPU and TensorRT version, so an engine built elsewhere may be incompatible.

Converting the PyTorch model to ONNX

# Imports
import torch
from collections import OrderedDict
from model.SCTransNet import SCTransNet as SCTransNet
import model.Config as config

# Use the GPU if one is available, otherwise the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('device', device)

# Load the model
config_vit = config.get_SCTrans_config()
net = SCTransNet(config_vit, mode='test', deepsuper=True).to(device)
state_dict = torch.load("SCTransNet_NUAA_NUDT_IRSTD1K.pth.tar")
new_state_dict = OrderedDict()
for k, v in state_dict['state_dict'].items():
    name = k[6:]  # strip the 6-character wrapper prefix from each checkpoint key
    new_state_dict[name] = v
net.load_state_dict(new_state_dict)
net.eval()

# Construct a dummy input (a batch of 3 single-channel 256x256 images)
x = torch.randn(3, 1, 256, 256).to(device)

# Optional: run a forward pass to check the model
# output = net(x)
# print(output.shape)

# Export the PyTorch model to ONNX
with torch.no_grad():
    torch.onnx.export(
        net,                         # model to convert
        x,                           # an arbitrary example input
        'SCTransNet_NUAA_NUDT_IRSTD1K_batch_3.onnx',    # output ONNX filename
        opset_version=11,            # ONNX opset version
        input_names=['input'],       # name of the input tensor (your choice)
        output_names=['output']      # name of the output tensor (your choice)
    )

# Verify that the ONNX model was exported successfully
import onnx

onnx_model = onnx.load('SCTransNet_NUAA_NUDT_IRSTD1K_batch_3.onnx')
onnx.checker.check_model(onnx_model)  # raises if the model is malformed
print('No errors: the ONNX model loaded successfully')

# Optionally, print the computation graph:
# print(onnx.helper.printable_graph(onnx_model.graph))

Important: you must construct a dummy input; torch.onnx.export traces the model with it to record the graph.
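As an aside, the prefix-stripping loop in the export script (name = k[6:]) assumes every checkpoint key carries a fixed 6-character wrapper prefix; a DataParallel checkpoint's 'module.' prefix would need k[7:] instead. A self-contained toy version of the remapping:

```python
from collections import OrderedDict

# toy checkpoint whose keys carry a 6-character 'model.' prefix -- an assumed
# prefix chosen to match k[6:] above; 'module.' (7 chars) would need k[7:]
state_dict = OrderedDict([('model.conv.weight', 1), ('model.conv.bias', 2)])

new_state_dict = OrderedDict((k[6:], v) for k, v in state_dict.items())
print(list(new_state_dict))  # ['conv.weight', 'conv.bias']
```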

Possible errors
  • RuntimeError: Exporting the operator var to ONNX opset version 11 is not supported. Please open a bug to request ONNX export support for the missing operator.

Workaround:
Add the following snippet to /anaconda3/envs/torch1.7/lib/python3.7/site-packages/torch/onnx/symbolic_opset11.py (adjust the path to your environment):


# This file exports ONNX ops for opset 11
import functools
import math
import sys
import warnings
from typing import List, Optional, Tuple, Union

import torch
import torch._C._onnx as _C_onnx
import torch.nn.modules.utils
import torch.onnx
from torch import _C

# Monkey-patch graph manipulation methods on Graph, used for the ONNX symbolics
from torch.onnx import symbolic_helper


@symbolic_helper.parse_args("v", "is", "i", "i")
def _var_mean(g, input, dim, correction, keepdim):
    if dim is None:
        mean = g.op("ReduceMean", input, keepdims_i=0)
        t_mean = mean
        num_elements = numel(g, input)
    else:
        mean = g.op("ReduceMean", input, axes_i=dim, keepdims_i=keepdim)
        t_mean = g.op("ReduceMean", input, axes_i=dim, keepdims_i=1)
        reduced_dims = g.op("Shape", input)
        # dim could contain one or multiple dimensions
        reduced_dims = g.op(
            "Gather",
            reduced_dims,
            g.op("Constant", value_t=torch.tensor(dim)),
            axis_i=0,
        )
        num_elements = g.op("ReduceProd", reduced_dims, keepdims_i=0)
    sub_v = g.op("Sub", input, t_mean)
    sqr_sub = g.op("Mul", sub_v, sub_v)
    keepdim_mean = 0 if dim is None else keepdim
    var = g.op("ReduceMean", sqr_sub, axes_i=dim, keepdims_i=keepdim_mean)
    # Correct bias in calculating variance, by dividing it over (N - correction) instead on N
    if correction is None:
        correction = 1
    if correction != 0:
        num_elements = g.op(
            "Cast", num_elements, to_i=symbolic_helper.cast_pytorch_to_onnx["Float"]
        )
        one = g.op("Constant", value_t=torch.tensor(correction, dtype=torch.float))
        mul = g.op("Mul", var, num_elements)
        var = g.op("Div", mul, g.op("Sub", num_elements, one))
    return var, mean


def std(g, input, *args):
    var, _ = var_mean(g, input, *args)
    return g.op("Sqrt", var)


def var(g, input, *args):
    var, _ = var_mean(g, input, *args)
    return var

# var_mean (and all variance-related functions) has multiple signatures, so need to manually figure
# out the correct arguments:
# aten::var_mean(Tensor self, bool unbiased)
# aten::var_mean(Tensor self, int[1] dim, bool unbiased, bool keepdim=False)
# aten::var_mean(Tensor self, int[1]? dim=None, *, int? correction=None, bool keepdim=False)
def var_mean(g, input, *args):
    if len(args) == 1:
        return _var_mean(g, input, None, args[0], None)
    else:
        return _var_mean(g, input, *args)


def std_mean(g, input, *args):
    var, mean = var_mean(g, input, *args)
    return g.op("Sqrt", var), mean
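The correction branch at the end of `_var_mean` (Cast, Mul by N, then Div by N − correction) is Bessel's correction: it rescales the biased variance computed by the ReduceMean ops. A quick NumPy check of the arithmetic:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = x.size

biased = ((x - x.mean()) ** 2).mean()  # what the ReduceMean ops compute
unbiased = biased * n / (n - 1)        # Mul by N, then Div by (N - correction), correction=1

print(biased, unbiased)                # 1.25 and 5/3, matching np.var(x, ddof=1)
```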

Converting ONNX to a TensorRT engine file

Go to the TensorRT-8.6.1.6/bin directory and run:

./trtexec --onnx=path/model.onnx --saveEngine=path/resnet_engine_intro.trt  --explicitBatch

(Adding --fp16 builds a half-precision engine, such as the *_fp16.trt file used in the final example.)

Inference program (Python)

The program below is a general-purpose version I put together. If it runs, inference works; pre- and post-processing still need to be added on top, which a later example demonstrates.

import torch
import tensorrt as trt
from collections import OrderedDict, namedtuple
import numpy as np

trt.init_libnvinfer_plugins(None, "")
def infer(img_data, engine_path):
    # 1. Logger
    logger = trt.Logger(trt.Logger.INFO)
    # 2. Runtime deserializes the TRT engine
    runtime = trt.Runtime(logger)
    trt.init_libnvinfer_plugins(logger, '')  # initialize TensorRT plugins
    with open(engine_path, "rb") as f:
        serialized_engine = f.read()
    engine = runtime.deserialize_cuda_engine(serialized_engine)

    # 3. Bind inputs and outputs
    bindings = OrderedDict()
    Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))
    fp16 = False
    for index in range(engine.num_bindings):
        name = engine.get_binding_name(index)
        dtype = trt.nptype(engine.get_binding_dtype(index))
        shape = tuple(engine.get_binding_shape(index))
        data = torch.from_numpy(np.empty(shape, dtype=np.dtype(dtype))).to('cuda')
        # Tensor.data_ptr() returns the address of the tensor's first element as an int
        bindings[name] = Binding(name, dtype, shape, data, int(data.data_ptr()))
        if engine.binding_is_input(index) and dtype == np.float16:
            fp16 = True
    # Record the device pointer of each binding
    binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())

    # 4. Point the input binding at the data and run inference;
    #    the results are written into the output binding buffers
    context = engine.create_execution_context()
    binding_addrs['images'] = int(img_data.data_ptr())  # 'images' must match your ONNX input name
    context.execute_v2(list(binding_addrs.values()))

    # 5. Fetch the results (using the names set when exporting the ONNX model)
    output_data = bindings['output'].data
    return output_data


if __name__ == '__main__':
    img_data = torch.rand(20, 1, 256, 256).to('cuda')
    engine_path = "/home/shineber/zhoushenbo/competition/onnx/SCTransNet_NUAA_NUDT_IRSTD1K_batch.trt"
    res = infer(img_data, engine_path)
    # print(res)
    print(res.shape)
    print("Inference Success")

Note: the 'images'/'input' and 'output' names here must match the names defined when you exported the ONNX model. If you don't know what the binding names are, inspect them with the program below.

Program to inspect the engine's binding names

import tensorrt as trt
trt.init_libnvinfer_plugins(None, "")
def load_engine_and_print_bindings(engine_path):
    # Create a TensorRT logger
    logger = trt.Logger(trt.Logger.INFO)

    # Create a TensorRT runtime
    runtime = trt.Runtime(logger)

    # Load the serialized engine from disk
    with open(engine_path, "rb") as f:
        serialized_engine = f.read()
    engine = runtime.deserialize_cuda_engine(serialized_engine)

    # Print the name, data type, and shape of every binding
    for index in range(engine.num_bindings):
        binding_name = engine.get_binding_name(index)
        binding_dtype = engine.get_binding_dtype(index)
        binding_shape = engine.get_binding_shape(index)

        print(f"Binding index: {index}")
        print(f"  Name: {binding_name}")
        print(f"  DataType: {trt.nptype(binding_dtype)}")
        print(f"  Shape: {binding_shape}")

if __name__ == "__main__":
    engine_path = "/home/shineber/zhoushenbo/competition/onnx/SCTransNet_NUAA_NUDT_IRSTD1K.trt"  # replace with the path to your TensorRT engine file
    load_engine_and_print_bindings(engine_path)

An example with pre- and post-processing added (the generic inference program above includes neither)

import time

import torch
import tensorrt as trt
from collections import OrderedDict, namedtuple
import numpy as np

# from torch.utils.data.dataset import Dataset
# from dataset import TestSetLoader
# from torch.utils.data import DataLoader
# import torchvision
# from PIL import ImageOps, Image

from tqdm import tqdm
import os
import cv2
import torchvision.transforms.functional as F

import threading
import queue
import csv


trt.init_libnvinfer_plugins(None, "")

def open_csv(q, q2):
    while True:
        item = q.get()  # take a file path from the queue
        if item is None:
            break  # a None item is the signal to exit the thread
        f = open(item, 'w', newline='', encoding='utf-8')
        q2.put(f)  # hand the open file object back through the second queue
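open_csv follows the standard sentinel-terminated worker pattern: the thread blocks on q.get() and exits when it receives None. A self-contained sketch of the same pattern, with a dummy string payload standing in for the file handles:

```python
import queue
import threading

q, q2 = queue.Queue(), queue.Queue()

def worker():
    while True:
        item = q.get()
        if item is None:      # sentinel: shut the thread down cleanly
            break
        q2.put(item.upper())  # stand-in for "open the file, hand it back"

t = threading.Thread(target=worker)
t.start()
q.put('result.csv')
q.put(None)                   # send the sentinel after the real work items
t.join()
out = q2.get()
print(out)  # RESULT.CSV
```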

def create_bbox(image, threshold):
    # threshold the first channel and find the external contours
    _, binary_image = cv2.threshold(image[0], threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    score = 0
    x1, y1, w1, h1 = 0, 0, 0, 0
    if contours:
        for i, contour in enumerate(contours):
            # bounding box of this contour
            x, y, w, h = cv2.boundingRect(contour)
            score1 = image[0, y:y + h, x:x + w].sum()
            # keep the box whose region has the largest intensity sum
            if score1 > score:
                score = score1
                x1, y1, w1, h1 = x, y, w, h
    return x1, y1, w1, h1
def create_bbox_invert(image, threshold):
    # zero out any large (> 400 px) bright regions in the first channel
    _, binary_image = cv2.threshold(image[0], threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        for i, contour in enumerate(contours):
            x, y, w, h = cv2.boundingRect(contour)
            if w * h > 400:
                image[0, y:y + h, x:x + w] = 0
    return image
def Normalized(img, img_norm_cfg):
    return (img - img_norm_cfg['mean']) / img_norm_cfg['std']

config_img = {'mean': 49.8, 'std': 21.2748}
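As a quick sanity check of Normalized with the values in config_img: a pixel at the dataset mean maps to 0, and one exactly one standard deviation above it maps to 1.

```python
import numpy as np

config_img = {'mean': 49.8, 'std': 21.2748}

def Normalized(img, img_norm_cfg):  # restated here so the snippet is self-contained
    return (img - img_norm_cfg['mean']) / img_norm_cfg['std']

img = np.array([49.8, 49.8 + 21.2748])  # the mean, and one std above the mean
result = Normalized(img, config_img)
print(result)  # close to [0., 1.]
```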

if __name__ == '__main__':
    # Create the queues and the worker thread (disabled here)
    # q = queue.Queue()
    # q2 = queue.Queue()
    #
    # thread_open = threading.Thread(target=open_csv, args=(q, q2,))
    #
    # thread_open.start()

    ### Load the model
    engine_path = "/home/shineber/zhoushenbo/competition_final/onnx/SCTransNet_NUAA_NUDT_IRSTD1K_batch_3_fp16.trt"
    # 1. Logger
    logger = trt.Logger(trt.Logger.INFO)
    # 2. Runtime deserializes the TRT engine
    runtime = trt.Runtime(logger)
    trt.init_libnvinfer_plugins(logger, '')  # initialize TensorRT plugins
    with open(engine_path, "rb") as f:
        serialized_engine = f.read()
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    # 3. Bind inputs and outputs
    bindings = OrderedDict()
    Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))
    fp16 = False
    for index in range(engine.num_bindings):
        name = engine.get_binding_name(index)
        dtype = trt.nptype(engine.get_binding_dtype(index))
        shape = tuple(engine.get_binding_shape(index))
        data = torch.from_numpy(np.empty(shape, dtype=np.dtype(dtype))).to('cuda')
        # Tensor.data_ptr() returns the address of the tensor's first element as an int
        bindings[name] = Binding(name, dtype, shape, data, int(data.data_ptr()))
        if engine.binding_is_input(index) and dtype == np.float16:
            fp16 = True
    context = engine.create_execution_context()
    ### Model loading done

    ### Read images
    data_dir = '/home/shineber/zhoushenbo/competition/dataB'
    save_dir = '/home/shineber/zhoushenbo/competition/dataC'
    filename000 = []
    while True:
        file_list = os.listdir(data_dir)
        if len(file_list) > 0:
            file_list = [file for file in file_list if file != filename000]
        if len(file_list) != 0:
            if len(file_list[0].split('.')) > 1:
                base_results_path = os.path.join(save_dir,
                                                 '{}'.format(file_list[0].split('.')[0]))
                bbox_file = '{}.csv'.format(base_results_path)
                # q.put(bbox_file)
                # thread_open.join()

                time1 = time.time()
                img_path = os.path.join(data_dir,file_list[0])
                ### Got the image

                ### Preprocessing
                img000 = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
                if img000 is not None:

                    size = img000.shape
                    img = cv2.resize(img000, (384, 384))  ##fast
                    x, y, w, h = 64, 64, 256, 256
                    img = img[y:y + h, x:x + w]
                    img_raw = img / img.max() * 255
                    img_invert = img_raw.max() - img_raw
                    img_gauss = cv2.GaussianBlur(img_raw, (5, 5), 0)  ##fast
                    cat_img = np.concatenate([img_raw[np.newaxis, np.newaxis, ...], img_gauss[np.newaxis, np.newaxis, ...],
                                              img_invert[np.newaxis, np.newaxis, ...]], axis=0)
                    cat_img = Normalized(cat_img, config_img)  ##fast
                    img_gauss = torch.Tensor(cat_img)  ##fast
                    ### Preprocessing done

                    ### Run inference
                    binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())

                    # 4. Point the input binding at the image data and run inference;
                    #    the results land in the output binding buffers
                    binding_addrs['input'] = int(img_gauss.cuda().data_ptr())
                    context.execute_v2(list(binding_addrs.values()))
                    pred = bindings['output'].data.cpu()
                    f = open(bbox_file, 'w', newline='', encoding='utf-8')
                    pred = np.array(pred*255,dtype=np.uint8)
                    ###### Post-processing
                    pred[2] = create_bbox_invert(pred[2], 5)

                    pred_ = np.array((pred[0] + pred[1]) // 2, dtype=np.uint8)
                    x, y, w, h = create_bbox(pred_, 5)

                    pred[2, y - 5:y + h + 5, x - 5:x + w + 5] = 0
                    pred[2, y:y + h, x:x + w] = 255

                    pred_ = np.array(pred.mean(axis=0), dtype=np.uint8)
                    x, y, w, h = create_bbox(pred_, 5)
                    x = int(np.round((x+64)/3*5))
                    y = int(np.round((y+64)/3*4))
                    w = int(np.round(w/3*5))
                    h = int(np.round(h/3*4))
                    x = x+w//2
                    y = y+h//2
                    output = ['X', 'X', 'X', x, y]
                    
                    writer = csv.writer(f)
                    writer.writerow(output)
                    f.close()
                    
                    filename000 = file_list[0]
                    print(time.time()-time1)
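The remapping arithmetic near the end ((x+64)/3*5 and (y+64)/3*4) undoes the preprocessing: the frame was resized to 384×384 and center-cropped to 256×256 at offset (64, 64), and the factors 5/3 = 640/384 and 4/3 = 512/384 suggest the original frames are 640×512 (an inference from the code, not stated in the original). A worked example:

```python
# a detection at (x, y) = (96, 96) in the 256x256 center crop
x_crop, y_crop = 96, 96

# step 1: shift back into the 384x384 resized frame (the crop offset was 64)
x_resized, y_resized = x_crop + 64, y_crop + 64

# step 2: scale back to the presumed 640x512 original frame
x_orig = int(round(x_resized * 640 / 384))  # same as round((x + 64) / 3 * 5)
y_orig = int(round(y_resized * 512 / 384))  # same as round((y + 64) / 3 * 4)
print(x_orig, y_orig)  # 267 213
```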