Converting PyTorch straight to a TensorRT engine, running it, and quantizing to INT8

In this project I use TensorRT and the torch2trt package to accelerate inference for a trained YOLOv5s model, quantizing it to INT8. Note, however, that TensorRT (through torch2trt) does not support some of the operations used in forward, such as tensor slicing; when you hit one of these, consider converting only part of the model or rewriting the forward method.
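For example, YOLOv5's Detect head builds grids and slices tensors inside forward, which torch2trt cannot convert. Below is a hedged sketch of the "rewrite forward" option (attribute names follow the Detect module in the YOLOv5 repo; the exact patch used for this project may differ): make Detect return only the raw, reshaped head outputs and do the grid/anchor decoding in plain PyTorch afterwards, which is what the detect() function further down implements.

import types
import torch

def detect_forward_raw(self, x):
    # Run only the per-level 1x1 output convs and reshape; skip the grid/anchor
    # decode that torch2trt cannot convert.
    out = []
    for i in range(self.nl):                  # self.nl: number of detection levels
        y = self.m[i](x[i])                   # (bs, na*no, ny, nx)
        bs, _, ny, nx = y.shape
        out.append(y.view(bs, self.na, self.no, ny, nx)
                    .permute(0, 1, 3, 4, 2).contiguous())  # (bs, na, ny, nx, no)
    return out

# Patch it onto the loaded model before calling torch2trt:
# model.model[-1].forward = types.MethodType(detect_forward_raw, model.model[-1])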

1. How TensorRT acceleration works: it fuses conv + BN + ReLU (or conv + ReLU) sequences into a single layer, cutting the number of layers and kernel launches at inference time (a hand-written sketch of the conv + BN folding appears after this list).

2. TensorRT is especially effective on branched architectures such as Inception-style networks. If branch 1, branch 2, and branch 3 each contain a 1×1 convolution at the same stage, TensorRT merges those three 1×1 convolutions into a single one (horizontal fusion), again reducing the number of convolution layers.

3. TensorRT can also quantize the model, for example to INT8 or FP16.
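A minimal, hand-written sketch of the conv + BN folding behind point 1 (TensorRT does this internally while building the engine; nothing in this project does it manually): the BatchNorm scale and shift are absorbed into the convolution's weights and bias, so a single fused conv replaces two layers at inference time.

import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Fold BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta into the conv weights/bias
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    fused.bias.data = (conv.bias.data - bn.running_mean) * scale + bn.bias.data
    return fused

conv, bn = nn.Conv2d(3, 16, 3, padding=1).eval(), nn.BatchNorm2d(16).eval()
x = torch.rand(1, 3, 32, 32)
with torch.no_grad():
    print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))  # True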


YOLOv5 pt2trt code example:

import torch
from torch2trt import torch2trt
from torch2trt import TRTModule
import time


model = torch.load('/home/Oyj/yolov5/yolov5s_trainModel.pt').cuda().eval()  # eval() so BN/dropout run in inference mode before conversion


x = torch.rand((1, 3, 608, 608)).cuda()  # placeholder input: one 3-channel 608x608 image; real images are preferable, especially for INT8 calibration

# convert to TensorRT feeding sample data as input

model_trt = torch2trt(model, [x], int8_mode=True)

# First load the PyTorch model onto CUDA, then define the sample input x (it mainly fixes the input shape; ones or zeros work just as well). model_trt is the converted TensorRT model; if the line above runs without errors, the conversion to TensorRT succeeded.

torch.save(model_trt.state_dict(), 'yolov5s_trt.pth')

model_trt = TRTModule()

model_trt.load_state_dict(torch.load('yolov5s_trt.pth'))
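One caveat on INT8: with only the random placeholder above, torch2trt calibrates the INT8 activation ranges from that single random tensor, which usually costs accuracy. The torch2trt README documents an int8_calib_dataset argument for calibrating on real images; the sketch below assumes that argument and a local image folder, so check both against the torch2trt version you have installed.

import glob
import cv2
import numpy as np
import torch

class CalibImages:
    # Indexable dataset of preprocessed inputs; each item is a list of tensors
    # shaped like the inputs passed to torch2trt (check whether your version
    # expects a batch dimension).
    def __init__(self, image_dir, size=608):
        self.files = glob.glob(image_dir + '/*.jpg')  # hypothetical calibration folder
        self.size = size

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        img = cv2.imread(self.files[idx])
        img = cv2.resize(img, (self.size, self.size))[:, :, ::-1]  # BGR -> RGB
        img = img.transpose(2, 0, 1).astype(np.float32) / 255.0    # HWC -> CHW, 0-1
        return [torch.from_numpy(np.ascontiguousarray(img)).cuda()]

# calib = CalibImages('/path/to/calib/images')  # hypothetical path
# model_trt = torch2trt(model, [x], int8_mode=True, int8_calib_dataset=calib)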

def _make_grid(nx=20, ny=20):
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()

def detect(pre):
    # Decode the raw head outputs (one tensor per detection level) into boxes in
    # input-image pixels; this is the grid/anchor math kept out of the TensorRT graph.
    z = []
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    stride = torch.Tensor([8, 16, 32]).cuda()  # downsampling factor of each detection level
    anchors = torch.Tensor([[[1.25000, 1.87500, 3.62500],
                             [2.00000, 3.87500, 4.87500],
                             [4.12500, 3.68750, 11.65625]],
                            [[1.62500, 3.81250, 2.81250],
                             [3.75000, 2.81250, 6.18750],
                             [2.87500, 7.43750, 10.18750]]]).cuda()  # anchors in grid units (unused below)
    anchor_grid = torch.Tensor([[[[[[ 10.,  13.]]],
                                  [[[ 16.,  30.]]],
                                  [[[ 33.,  23.]]]]],
                                [[[[[ 30.,  61.]]],
                                  [[[ 62.,  45.]]],
                                  [[[ 59., 119.]]]]],
                                [[[[[116.,  90.]]],
                                  [[[156., 198.]]],
                                  [[[373., 326.]]]]]]).cuda()  # anchor box sizes in pixels, per level
    no = 28  # outputs per anchor = num_classes + 5

    for i in range(len(pre)):
        # pre[i] has shape (bs, na, ny, nx, no); build the cell-offset grid for this level
        bs, _, ny, nx, _ = pre[i].shape
        grid = _make_grid(nx, ny).to(pre[i].device)

        y = pre[i].sigmoid()
        y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + grid) * stride[i]  # xy: cell offsets to pixels
        y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid[i]  # wh: scale the anchors
        z.append(y.view(bs, -1, no))

    return (torch.cat(z, 1), pre)


c1 = time.time()

for i in range(10000):  # simple benchmark: 10000 forward + decode passes
    y_trt = model_trt(x)
    # y_trt = model(x)  # swap in the original model to compare timings
    res = detect(y_trt)
    print(res[0].shape, len(res[1]))

print(time.time() - c1)
# 0.0004208087921142578
# print(y_trt.argmax(dim=1, keepdim=True))
# tensor([[534]], device='cuda:0')

# original PyTorch model: 57.03164482116699
# TensorRT model: 12.815325498580933

As an alternative to torch2trt, you can also convert a YOLOv5 PyTorch model to TensorRT by way of ONNX:

1. Install TensorRT and PyTorch.

2. Download and install yolov5.

3. Export the yolov5 model to ONNX format with PyTorch.

```
python models/export.py --weights yolov5s.pt --img 640 --batch 1 --include onnx  # yolov5s
```

4. Install ONNX-TensorRT.

```
git clone https://github.com/onnx/onnx-tensorrt.git
cd onnx-tensorrt
git submodule update --init --recursive
mkdir build && cd build
cmake .. -DTENSORRT_ROOT=/path/to/tensorrt -DCMAKE_CXX_COMPILER=g++-7
make -j
sudo make install
```

5. Convert the ONNX model to a TensorRT engine with ONNX-TensorRT.

```
import onnx
import onnx_tensorrt.backend as backend

model = onnx.load("yolov5s.onnx")                 # Load the ONNX model
engine = backend.prepare(model, device="CUDA:0")  # Build the TensorRT engine
with open("yolov5s.engine", "wb") as f:           # Serialize the TensorRT engine
    f.write(engine.serialize())
```

6. Test the performance and accuracy of the TensorRT engine.

```
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import time

# Load the serialized TensorRT engine
with open("yolov5s.engine", "rb") as f:
    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    engine = runtime.deserialize_cuda_engine(f.read())

# Create the inference execution context
context = engine.create_execution_context()

# Allocate the input and output buffers
input_shape = engine.get_binding_shape(0)
output_shape = engine.get_binding_shape(1)
input_buffer = cuda.mem_alloc(int(np.prod(input_shape)) * 4)
output_buffer = cuda.mem_alloc(int(np.prod(output_shape)) * 4)

# Prepare the input data
input_data = np.random.rand(*input_shape).astype(np.float32)

# Copy the input data to the device
cuda.memcpy_htod(input_buffer, input_data)

# Run inference
start_time = time.time()
context.execute_v2(bindings=[int(input_buffer), int(output_buffer)])
end_time = time.time()

# Copy the output data back to the host
output_data = np.empty(output_shape, dtype=np.float32)
cuda.memcpy_dtoh(output_data, output_buffer)

# Print the inference time and output
print("Inference time: {} ms".format((end_time - start_time) * 1000))
print("Output shape: {}".format(output_shape))
print("Output data: {}".format(output_data))
```