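In this project I use TensorRT with the torch2trt package to accelerate inference for a trained yolov5s model, quantized to INT8. Note that some operations used in forward, such as slicing, may not be supported during conversion; in that case, consider converting only part of the model or rewriting the forward method.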
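1. How TensorRT speeds up inference: it fuses conv + bn + relu (or conv + relu) into a single layer, which reduces the number of layers to execute and the memory traffic between them;
2. TensorRT is especially effective on branched structures such as Inception networks. If branch 1, branch 2, and branch 3 each apply a 1*1 convolution at the same step (to the same input), TensorRT can merge the three 1*1 convolutions into a single 1*1 convolution, reducing the number of convolution layers;
3. TensorRT can also run the model at reduced precision, for example INT8 (quantization) or FP16.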
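To make point 1 concrete, here is a minimal sketch of how a batch-norm layer can be folded into the preceding convolution. It is plain Python on a toy 1*1 convolution applied to a single pixel, not TensorRT's actual implementation; the helper names (`fold_bn`, `conv1x1`, `batchnorm`, `relu`) and all numeric values are made up for illustration.

```python
import math

def conv1x1(weight, bias, x):
    # A 1*1 convolution at one pixel is a matrix-vector product plus bias.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    # Inference-time batch norm using fixed running statistics.
    return [(yi - m) / math.sqrt(v + eps) * g + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def relu(y):
    return [max(0.0, yi) for yi in y]

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    # Fold the batch-norm scale and shift into the conv weights and bias.
    folded_w, folded_b = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((b - m) * scale + bt)
    return folded_w, folded_b

# Hypothetical toy parameters: 3 input channels, 2 output channels.
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, -0.5]

# Three separate layers versus one fused layer.
unfused = relu(batchnorm(conv1x1(weight, bias, x), gamma, beta, mean, var))
fw, fb = fold_bn(weight, bias, gamma, beta, mean, var)
fused = relu(conv1x1(fw, fb, x))

max_diff = max(abs(a - b) for a, b in zip(unfused, fused))
print(max_diff)
```

The fused layer gives the same result up to floating-point rounding, but needs one pass over the data instead of three, which is where the speedup comes from.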
yolov5 pt2trt code example:
import torch
from torch2trt import torch2trt
from torch2trt import TRTModule
import time
model = torch.load('/home/Oyj/yolov5/yolov5s_trainModel.pt').cuda()
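x = torch.rand((1, 3, 608, 608)).cuda()  # placeholder input: one 3-channel 608*608 image; a real image is preferable
# Convert to TensorRT, feeding sample data as input.
model_trt = torch2trt(model, [x], int8_mode=True)
# The PyTorch model is first loaded onto CUDA, then a sample input x is defined.
# x mainly fixes the input shape, so ones or zeros would also work for that purpose.
# With int8_mode=True, torch2trt also calibrates on this sample unless int8_calib_dataset is given,
# which is why a real image is the better choice here.
# model_trt is the converted TensorRT model; if the code above runs without errors, the conversion succeeded.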
torch.save(model_trt.state_dict(), 'yolov5s_trt.pth')
model_trt = TRTModule()
model_trt.load_state_dict(torch.load('yolov5s_trt.pth'))
def _make_grid(nx=20, ny=20):
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
    return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
def detect(pre):
    # Decode the raw outputs of the three detection scales into boxes in input-image pixels.
    z = []
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    stride = torch.Tensor([8, 16, 32]).to(device)
    # anchors is not used below; the decoding relies on anchor_grid.
    anchors = torch.Tensor([[[1.25000, 1.87500, 3.62500],
                             [2.00000, 3.87500, 4.87500],
                             [4.12500, 3.68750, 11.65625]],
                            [[1.62500, 3.81250, 2.81250],
                             [3.75000, 2.81250, 6.18750],
                             [2.87500, 7.43750, 10.18750]]]).to(device)
    anchor_grid = torch.Tensor([[[[[[ 10., 13.]]],
                                  [[[ 16., 30.]]],
                                  [[[ 33., 23.]]]]],
                                [[[[[ 30., 61.]]],
                                  [[[ 62., 45.]]],
                                  [[[ 59., 119.]]]]],
                                [[[[[116., 90.]]],
                                  [[[156., 198.]]],
                                  [[[373., 326.]]]]]]).to(device)
    no = 28  # number of outputs per anchor
    for i in range(len(pre)):
        # print(pre[0].shape)
        # bs, _, ny, nx = x.shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
        # x[i] = x.view(bs, 3, 28, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
        bs, _, ny, nx, _ = pre[i].shape
        grid = _make_grid(nx, ny).to(pre[i].device)
        y = pre[i].sigmoid()
        y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + grid) * stride[i]  # xy
        y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid[i]  # wh
        z.append(y.view(bs, -1, no))
    return (torch.cat(z, 1), pre)
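torch.cuda.synchronize()  # make sure earlier GPU work has finished before starting the clock
c1 = time.time()
for i in range(10000):
    y_trt = model_trt(x)
    # y_trt = model(x)
    res = detect(y_trt)
torch.cuda.synchronize()  # CUDA calls are asynchronous; wait for them before reading the clock
print(res[0].shape, len(res[1]))
print(time.time() - c1)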
# 0.0004208087921142578
# print(y_trt.argmax(dim=1, keepdim=True))
# tensor([[534]], device='cuda:0')
# original PyTorch model, elapsed seconds: 57.03164482116699
# TensorRT model, elapsed seconds: 12.815325498580933
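The decoding inside detect can be checked by hand on a single grid cell. Below is a small sketch in plain Python that applies the same two formulas (xy and wh) to one raw prediction. The helper name `decode_cell`, the chosen cell position, and the all-zero raw values are assumptions made for illustration; the stride 8 and the anchor (10, 13) are the first entries used in detect.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_cell(raw, gx, gy, stride, anchor_wh):
    # raw holds the unactivated (tx, ty, tw, th) values for one anchor in one cell.
    sx, sy, sw, sh = (sigmoid(t) for t in raw)
    # Same formula as the "xy" line in detect: offset within the cell, then scale by stride.
    cx = (sx * 2.0 - 0.5 + gx) * stride
    cy = (sy * 2.0 - 0.5 + gy) * stride
    # Same formula as the "wh" line in detect: scale the anchor size.
    w = (sw * 2.0) ** 2 * anchor_wh[0]
    h = (sh * 2.0) ** 2 * anchor_wh[1]
    return cx, cy, w, h

# Hypothetical cell (column 3, row 5) on the stride-8 scale with anchor (10, 13).
box = decode_cell((0.0, 0.0, 0.0, 0.0), gx=3, gy=5, stride=8, anchor_wh=(10.0, 13.0))
print(box)  # (28.0, 44.0, 10.0, 13.0)
```

With raw values of zero, every sigmoid is 0.5, so the box center lands in the middle of the cell and the box size equals the anchor size. This makes it a convenient sanity check when comparing the TensorRT output against the original model.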