Converting PyTorch straight to a TensorRT engine, running it, and quantizing to INT8

In this project I use TensorRT and the torch2trt package to accelerate inference for a trained YOLOv5s model, quantizing it to INT8. Note, however, that TensorRT (through torch2trt) does not support some of the operations used in forward, such as tensor slicing; when you hit one of these, consider converting only part of the model or rewriting the forward method.
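For example, YOLOv5's Detect head builds grids and slices tensors inside forward, which torch2trt cannot convert. Below is a hedged sketch of the "rewrite forward" option (attribute names follow the Detect module in the YOLOv5 repo; the exact patch used for this project may differ): make Detect return only the raw, reshaped head outputs and do the grid/anchor decoding in plain PyTorch afterwards, which is what the detect() function further down implements.

import types
import torch

def detect_forward_raw(self, x):
    # Run only the per-level 1x1 output convs and reshape; skip the grid/anchor
    # decode that torch2trt cannot convert.
    out = []
    for i in range(self.nl):                  # self.nl: number of detection levels
        y = self.m[i](x[i])                   # (bs, na*no, ny, nx)
        bs, _, ny, nx = y.shape
        out.append(y.view(bs, self.na, self.no, ny, nx)
                    .permute(0, 1, 3, 4, 2).contiguous())  # (bs, na, ny, nx, no)
    return out

# Patch it onto the loaded model before calling torch2trt:
# model.model[-1].forward = types.MethodType(detect_forward_raw, model.model[-1])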

1. How TensorRT acceleration works: it fuses conv + BN + ReLU (or conv + ReLU) sequences into a single layer, cutting the number of layers and kernel launches at inference time (a hand-written sketch of the conv + BN folding appears after this list).

2. TensorRT is especially effective on branched architectures such as Inception-style networks. If branch 1, branch 2, and branch 3 each contain a 1×1 convolution at the same stage, TensorRT merges those three 1×1 convolutions into a single one (horizontal fusion), again reducing the number of convolution layers.

3. TensorRT can also quantize the model, for example to INT8 or FP16.
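A minimal, hand-written sketch of the conv + BN folding behind point 1 (TensorRT does this internally while building the engine; nothing in this project does it manually): the BatchNorm scale and shift are absorbed into the convolution's weights and bias, so a single fused conv replaces two layers at inference time.

import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Fold BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta into the conv weights/bias
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    fused.bias.data = (conv.bias.data - bn.running_mean) * scale + bn.bias.data
    return fused

conv, bn = nn.Conv2d(3, 16, 3, padding=1).eval(), nn.BatchNorm2d(16).eval()
x = torch.rand(1, 3, 32, 32)
with torch.no_grad():
    print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))  # True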


YOLOv5 pt2trt code example:

import torch
from torch2trt import torch2trt
from torch2trt import TRTModule
import time


model = torch.load('/home/Oyj/yolov5/yolov5s_trainModel.pt').cuda().eval()  # eval() so BN/dropout run in inference mode before conversion


x = torch.rand((1, 3, 608, 608)).cuda()  # placeholder input: one 3-channel 608x608 image; real images are preferable, especially for INT8 calibration

# convert to TensorRT feeding sample data as input

model_trt = torch2trt(model, [x], int8_mode=True)

# First load the PyTorch model onto CUDA, then define the sample input x (it mainly fixes the input shape; ones or zeros work just as well). model_trt is the converted TensorRT model; if the line above runs without errors, the conversion to TensorRT succeeded.

torch.save(model_trt.state_dict(), 'yolov5s_trt.pth')

model_trt = TRTModule()

model_trt.load_state_dict(torch.load('yolov5s_trt.pth'))
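One caveat on INT8: with only the random placeholder above, torch2trt calibrates the INT8 activation ranges from that single random tensor, which usually costs accuracy. The torch2trt README documents an int8_calib_dataset argument for calibrating on real images; the sketch below assumes that argument and a local image folder, so check both against the torch2trt version you have installed.

import glob
import cv2
import numpy as np
import torch

class CalibImages:
    # Indexable dataset of preprocessed inputs; each item is a list of tensors
    # shaped like the inputs passed to torch2trt (check whether your version
    # expects a batch dimension).
    def __init__(self, image_dir, size=608):
        self.files = glob.glob(image_dir + '/*.jpg')  # hypothetical calibration folder
        self.size = size

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        img = cv2.imread(self.files[idx])
        img = cv2.resize(img, (self.size, self.size))[:, :, ::-1]  # BGR -> RGB
        img = img.transpose(2, 0, 1).astype(np.float32) / 255.0    # HWC -> CHW, 0-1
        return [torch.from_numpy(np.ascontiguousarray(img)).cuda()]

# calib = CalibImages('/path/to/calib/images')  # hypothetical path
# model_trt = torch2trt(model, [x], int8_mode=True, int8_calib_dataset=calib)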

def _make_grid(nx=20, ny=20):
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()

def detect(pre):
    # Decode the raw head outputs (one tensor per detection level) into boxes in
    # input-image pixels; this is the grid/anchor math kept out of the TensorRT graph.
    z = []
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    stride = torch.Tensor([8, 16, 32]).cuda()  # downsampling factor of each detection level
    anchors = torch.Tensor([[[1.25000, 1.87500, 3.62500],
                             [2.00000, 3.87500, 4.87500],
                             [4.12500, 3.68750, 11.65625]],
                            [[1.62500, 3.81250, 2.81250],
                             [3.75000, 2.81250, 6.18750],
                             [2.87500, 7.43750, 10.18750]]]).cuda()  # anchors in grid units (unused below)
    anchor_grid = torch.Tensor([[[[[[ 10.,  13.]]],
                                  [[[ 16.,  30.]]],
                                  [[[ 33.,  23.]]]]],
                                [[[[[ 30.,  61.]]],
                                  [[[ 62.,  45.]]],
                                  [[[ 59., 119.]]]]],
                                [[[[[116.,  90.]]],
                                  [[[156., 198.]]],
                                  [[[373., 326.]]]]]]).cuda()  # anchor box sizes in pixels, per level
    no = 28  # outputs per anchor = num_classes + 5

    for i in range(len(pre)):
        # pre[i] has shape (bs, na, ny, nx, no); build the cell-offset grid for this level
        bs, _, ny, nx, _ = pre[i].shape
        grid = _make_grid(nx, ny).to(pre[i].device)

        y = pre[i].sigmoid()
        y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + grid) * stride[i]  # xy: cell offsets to pixels
        y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid[i]  # wh: scale the anchors
        z.append(y.view(bs, -1, no))

    return (torch.cat(z, 1), pre)


c1 = time.time()

for i in range(10000):  # simple benchmark: 10000 forward + decode passes
    y_trt = model_trt(x)
    # y_trt = model(x)  # swap in the original model to compare timings
    res = detect(y_trt)
    print(res[0].shape, len(res[1]))

print(time.time() - c1)
# 0.0004208087921142578
# print(y_trt.argmax(dim=1, keepdim=True))
# tensor([[534]], device='cuda:0')

# original PyTorch model: 57.03164482116699
# TensorRT model: 12.815325498580933

As an alternative to torch2trt, you can also convert a YOLOv5 PyTorch model to TensorRT by way of ONNX:

1. Install TensorRT and PyTorch.

2. Download and install yolov5.

3. Export the yolov5 model to ONNX format with PyTorch.

```
python models/export.py --weights yolov5s.pt --img 640 --batch 1 --include onnx  # yolov5s
```

4. Install ONNX-TensorRT.

```
git clone https://github.com/onnx/onnx-tensorrt.git
cd onnx-tensorrt
git submodule update --init --recursive
mkdir build && cd build
cmake .. -DTENSORRT_ROOT=/path/to/tensorrt -DCMAKE_CXX_COMPILER=g++-7
make -j
sudo make install
```

5. Convert the ONNX model to a TensorRT engine with ONNX-TensorRT.

```
import onnx
import onnx_tensorrt.backend as backend

model = onnx.load("yolov5s.onnx")                 # Load the ONNX model
engine = backend.prepare(model, device="CUDA:0")  # Build the TensorRT engine
with open("yolov5s.engine", "wb") as f:           # Serialize the TensorRT engine
    f.write(engine.serialize())
```

6. Test the performance and accuracy of the TensorRT engine.

```
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import time

# Load the serialized TensorRT engine
with open("yolov5s.engine", "rb") as f:
    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    engine = runtime.deserialize_cuda_engine(f.read())

# Create the inference execution context
context = engine.create_execution_context()

# Allocate the input and output buffers
input_shape = engine.get_binding_shape(0)
output_shape = engine.get_binding_shape(1)
input_buffer = cuda.mem_alloc(int(np.prod(input_shape)) * 4)
output_buffer = cuda.mem_alloc(int(np.prod(output_shape)) * 4)

# Prepare the input data
input_data = np.random.rand(*input_shape).astype(np.float32)

# Copy the input data to the device
cuda.memcpy_htod(input_buffer, input_data)

# Run inference
start_time = time.time()
context.execute_v2(bindings=[int(input_buffer), int(output_buffer)])
end_time = time.time()

# Copy the output data back to the host
output_data = np.empty(output_shape, dtype=np.float32)
cuda.memcpy_dtoh(output_data, output_buffer)

# Print the inference time and output
print("Inference time: {} ms".format((end_time - start_time) * 1000))
print("Output shape: {}".format(output_shape))
print("Output data: {}".format(output_data))
```