Deploying yolov5s-face with dynamic-batch support using the TensorRT Python API

Last time I tried deploying yolov5s with dynamic batch support using the TensorRT Python API:

https://blog.csdn.net/u012160945/article/details/121555088

That post walked through the pytorch --> onnx --> tensorrt workflow with the Python API. Recently I found that the yolov5-face model also gives pretty amazing results, so this time I followed the same recipe and converted it into a TensorRT engine as well.

System environment:

Ubuntu 18.04

CUDA 11.3

TensorRT 8.2.0.6

GPU: RTX 2080

PyTorch 1.10.0

onnx 1.10.2

onnx-simplifier 0.3.6

Step 1: Export the ONNX model

1.1 Clone the project and grab the ONNX export script

git clone https://gitee.com/mumuU1156/yolov5-face.git
cd yolov5-face
cp models/export.py ./

1.2 Modify the forward method of the Detect class in models/yolo.py so that the model has a single output with sigmoid and stride scaling already applied; this way no extra post-processing is needed later.

# forward method of the Detect class in yolo.py
#        if self.export:
#            for i in range(self.nl):
#                x[i] = self.m[i](x[i])
#                bs, _, ny, nx = x[i].shape  # x(bs,48,20,20) to x(bs,3,20,20,16)
#                x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
#
#            return x
# change it to
        if self.export:
            for i in range(self.nl):
                x[i] = self.m[i](x[i])
                bs, _, ny, nx = x[i].shape  # x(bs,48,20,20) to x(bs,3,20,20,16)
                bs = -1  # keep the batch dimension dynamic
                ny = int(ny)
                nx = int(nx)
                x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
                if self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
                grid_ = self.grid[i]
                #anchor_grid_ = self.anchor_grid[i]
                # disconnect for pytorch trace
                anchor_grid_ = (self.anchor_grid[i].clone()).view(1, -1, 1, 1, 2)
                stride_ = self.stride[i]
                anchor_grid_cat = torch.cat([anchor_grid_] * 5, -1)
                grid_cat = torch.cat([grid_] * 5, -1)
                # slice the raw output into xy, wh, box conf, landmarks and face conf, and decode each part separately
                xy_slice = x[i][..., :2]
                wh_slice = x[i][..., 2:4]
                conf_slice_box = x[i][..., 4:5]  # must be '4:5', not '4'; otherwise an unsupported unsqueeze op is emitted
                landmark_slice = x[i][..., 5:15]
                conf_slice = x[i][..., 15:]  # must be '15:', not '15'; otherwise an unsupported unsqueeze op is emitted
                xy_slice = (xy_slice.sigmoid() * 2. - 0.5 + grid_) * stride_  # xy
                wh_slice = (wh_slice.sigmoid() * 2) ** 2 * anchor_grid_  # wh
                conf_slice_box = conf_slice_box.sigmoid()
                conf_slice = conf_slice.sigmoid()
                landmark_slice = landmark_slice * anchor_grid_cat + grid_cat * stride_
                y = torch.cat([xy_slice, wh_slice, conf_slice_box, landmark_slice, conf_slice], 4)  # recombine the parts
                z.append(y.view(bs, self.na * ny * nx, self.no))
            return torch.cat(z, 1)
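To see what this decode actually computes, here is a NumPy re-derivation of the same math on random data for the stride-32 level. The anchor values below are placeholders for illustration, not the real yolov5-face anchors:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# dummy raw head output for the stride-32 level: (bs, na, ny, nx, no)
bs, na, ny, nx, no = 1, 3, 20, 20, 16
raw = np.random.default_rng(0).standard_normal((bs, na, ny, nx, no)).astype(np.float32)
stride = 32.0
anchors = np.array([[140, 110], [190, 240], [460, 400]], np.float32)  # placeholder values

yv, xv = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
grid = np.stack([xv, yv], -1).astype(np.float32)  # (ny, nx, 2), like _make_grid
ag = anchors.reshape(1, na, 1, 1, 2)              # like anchor_grid_

xy = (sigmoid(raw[..., :2]) * 2.0 - 0.5 + grid) * stride  # cell offsets -> pixels
wh = (sigmoid(raw[..., 2:4]) * 2.0) ** 2 * ag             # anchor-relative sizes
conf_box = sigmoid(raw[..., 4:5])
# 5 landmark points x 2 coords; note: no sigmoid on landmarks
lmk = raw[..., 5:15] * np.tile(ag, (1, 1, 1, 1, 5)) + np.tile(grid, (1, 1, 5)) * stride
conf_face = sigmoid(raw[..., 15:])

out = np.concatenate([xy, wh, conf_box, lmk, conf_face], -1).reshape(bs, na * ny * nx, no)
print(out.shape)  # (1, 1200, 16)
```

The `np.tile(..., 5)` calls play the role of `anchor_grid_cat` and `grid_cat` in the Torch code: the same 2-element anchor/grid pair is repeated for each of the 5 landmark points.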

1.3 Modify export.py accordingly, since the model now has a single output; also declare the batch dimension as dynamic.

#    torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['data'],
#                      output_names=['stride_' + str(int(x)) for x in model.stride])
#
# change it to
    torch.onnx.export(model, img, f, verbose=False, opset_version=12, input_names=['data'],
                      output_names=['out'], dynamic_axes={'data': {0: 'batch'}, 'out': {0: 'batch'}})

1.4 Run export.py

python export.py --weights ./yolov5s-face.pt

This produces yolov5s-face.onnx. Opening it in Netron shows that the model now has exactly one output with shape [batch_size, 25200, 16].
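The 25200 in that shape is just the total anchor count for a 640x640 input: 3 anchors per cell over the three detection grids (strides 8, 16, 32):

```python
# 3 anchors per cell over the 80x80, 40x40 and 20x20 grids of a 640x640 input
n_anchors = sum(3 * (640 // s) ** 2 for s in (8, 16, 32))
print(n_anchors)  # 25200
```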

Step 2: Build the TensorRT inference engine

import tensorrt as trt

# create the logger, builder and network
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
# parse the onnx file and report any errors
success = parser.parse_from_file("yolov5s-face.onnx")
for idx in range(parser.num_errors):
    print('xxxx', parser.get_error(idx))
if success:
    print('Parsed the ONNX model successfully!')
profile = builder.create_optimization_profile()
# dynamic batch profile: min 1, optimal 8, max 16
profile.set_shape("data", (1, 3, 640, 640), (8, 3, 640, 640), (16, 3, 640, 640))
config = builder.create_builder_config()
config.add_optimization_profile(profile)
config.max_workspace_size = 1 << 30  # 1 GiB
serialized_engine = builder.build_serialized_network(network, config)
with open("yolov5s-face.engine", "wb") as f:
    print('Writing the engine file...')
    f.write(serialized_engine)
    print('Engine built successfully!')

Step 3: Deserialize the engine from the engine file

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2
import time
with open("yolov5s-face.engine", "rb") as f:
    serialized_engine = f.read()
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(serialized_engine)

Step 4: Inference

BATCH_SIZE = 8
context = engine.create_execution_context()
context.set_binding_shape(0, (BATCH_SIZE, 3, 640, 640))  # this line is essential for dynamic batch!
inputs, outputs, bindings, stream = allocate_buffers(engine, max_batch_size=BATCH_SIZE)  # set up input/output buffers and the stream

img = cv2.imread('./test.jpg')
batch_data = np.repeat(pre_process(img), BATCH_SIZE, 0)
np.copyto(inputs[0].host, batch_data.ravel())
result = do_inference_v2(context, bindings, inputs, outputs, stream)[0]
result = np.reshape(result, [BATCH_SIZE, -1, 16])
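pre_process here comes from the previous post. As a rough sketch of what it presumably does (assumptions: resize to 640x640, BGR to RGB, scale to [0, 1], HWC to NCHW with a batch dim) -- the naive nearest-neighbor resize below stands in for cv2.resize just to keep the sketch NumPy-only:

```python
import numpy as np

def pre_process(img, size=640):
    """Sketch of the assumed preprocessing: resize to size x size,
    BGR -> RGB, scale to [0, 1], HWC -> NCHW with a batch dimension."""
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size  # nearest-neighbor row indices
    xs = np.arange(size) * w // size  # nearest-neighbor column indices
    resized = img[ys][:, xs]
    x = resized[..., ::-1].astype(np.float32) / 255.0  # BGR -> RGB, normalize
    x = x.transpose(2, 0, 1)[None]                     # HWC -> 1x3xHxW
    return np.ascontiguousarray(x)

print(pre_process(np.zeros((480, 600, 3), np.uint8)).shape)  # (1, 3, 640, 640)
```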

* Drawing the results (skipping the landmarks...)

img = cv2.imread('./test.jpg')
img = cv2.resize(img,(640,640))

boxes, confs, classes = filter_boxes(result[7], 0.3)  # every batch item is the same image; pick index 7
boxes, confs, classes = non_max_suppression(boxes, confs, classes)
for box,conf,cls in zip(boxes,confs,classes):
    x1,y1,x2,y2 = np.int32(box)
    cv2.rectangle(img,(x1,y1),(x2,y2),(0,0,255),2)
cv2.imwrite('tmp.jpg',img)
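filter_boxes and non_max_suppression are also taken from the previous post. A rough NumPy sketch of the pair, under the assumption that filter_boxes thresholds on box confidence and converts center-size boxes to corner form, and that the NMS is the standard greedy IoU suppression:

```python
import numpy as np

def filter_boxes(pred, conf_thres=0.3):
    """pred: (N, 16) rows of [cx, cy, w, h, conf_box, 10 landmark coords, conf_face]."""
    p = pred[pred[:, 4] > conf_thres]
    boxes = np.stack([p[:, 0] - p[:, 2] / 2,   # x1
                      p[:, 1] - p[:, 3] / 2,   # y1
                      p[:, 0] + p[:, 2] / 2,   # x2
                      p[:, 1] + p[:, 3] / 2],  # y2
                     axis=1).astype(np.float32)
    classes = np.zeros(len(p), np.int32)  # single class: face
    return boxes, p[:, 4], classes

def non_max_suppression(boxes, confs, classes, iou_thres=0.45):
    order = confs.argsort()[::-1]  # highest confidence first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # IoU of the kept box against all remaining candidates
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[1:][iou < iou_thres]  # drop heavy overlaps
    keep = np.array(keep, np.int64)
    return boxes[keep], confs[keep], classes[keep]
```

This is only a reference implementation of the idea; the actual helpers in the previous post may differ in details such as per-class suppression or thresholds.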

----

* For the functions and classes not given in this post (allocate_buffers, do_inference_v2, pre_process, filter_boxes, non_max_suppression), see the previous article: https://blog.csdn.net/u012160945/article/details/121555088

----

TODO:

1. Speed up the NMS step

2. Starting from the exported fixed-width/height ONNX, rewrite it into an ONNX with arbitrary width and height and build an inference engine from that
