400 FPS! Deploying CenterFace Face and Landmark Detection with TensorRT


This article is reposted from Zhihu; it is a hands-on tutorial by the original author of CenterFace.

https://zhuanlan.zhihu.com/p/106774468


Recently, many people have run into various problems when deploying CenterFace with TensorRT. Here are answers to the most common ones:

1) Versions: TensorRT 6.0.1+, Python 3.7.

2) Converting the ONNX model to TensorRT format. Note that the centerface.onnx provided in the repo has an input shape of 10x1x32x32; you need to change it to the target resolution first, and then convert to TensorRT format. The relevant code is as follows:

import math
import onnx

# Target resolution (height, width)
input_size = (1080, 1920)

model = onnx.load_model("centerface.onnx")

# Scale the input dims from the default 32x32 up to the target resolution
d = model.graph.input[0].type.tensor_type.shape.dim
print(d)
rate = (int(math.ceil(input_size[0] / d[2].dim_value)),
        int(math.ceil(input_size[1] / d[3].dim_value)))
print("rate", rate)
d[0].dim_value = 1
d[2].dim_value *= rate[0]
d[3].dim_value *= rate[1]

# Scale every output's spatial dims by the same factor
for output in model.graph.output:
    d = output.type.tensor_type.shape.dim
    print(d)
    d[0].dim_value = 1
    d[2].dim_value *= rate[0]
    d[3].dim_value *= rate[1]

onnx.save_model(model, "centerface_1080_1920.onnx")
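
As a quick sanity check (a small sketch using onnx's built-in checker, not part of the original post), you can reload the saved model and confirm the new input shape. Note that math.ceil rounds the height up to the next multiple of 32, so 1080 becomes 1088 while 1920 stays 1920:

import onnx

m = onnx.load_model("centerface_1080_1920.onnx")
onnx.checker.check_model(m)  # raises if the modified graph is inconsistent
# Expect a 1 x C x 1088 x 1920 input (1080 rounded up to a multiple of 32)
print(m.graph.input[0].type.tensor_type.shape)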


3) Other problems are usually related to the TensorRT installation itself.

The rest of this post walks through installing TensorRT and deploying CenterFace in detail.

1. The CenterFace Model

CenterFace is compact yet accurate, and is currently among the fastest methods for face detection and landmark localization. The network takes an anchor-free approach and incorporates the structure and ideas of FPN, making the model more robust on small faces.

CenterFace repository:

https://github.com/Star-Clouds/CenterFace

2. Installing TensorRT

There are several ways to install TensorRT; the tar package is a simple and convenient option.

2.1 Downloading the Package

First use the following commands to check the machine's CUDA and cuDNN versions, then download the matching package:

cat /usr/local/cuda/version.txt
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

Read the version information from the output:

CUDA Version 10.1.243
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 3
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"

So the CUDA version is 10.1.243 and the cuDNN version is 7.6.3. If no package matches your versions, upgrade them first.

2.2 Installation Steps

  • Install pycuda

sudo pip install pycuda
  • Install TensorRT

# Unpack the archive
tar xzvf TensorRT-6.0.1.5.Ubuntu-18.04.x86_64-gnu.cuda-10.1.cudnn7.6.tar.gz
cd TensorRT-6.0.1.5
# Install the TensorRT Python wheel
cd python
sudo pip install tensorrt-6.0.1.5-py2.py3-none-any.whl
# Install UFF
cd ../uff
sudo pip install uff-6.0.1-py2.py3-none-any.whl
# Install graphsurgeon
cd ../graphsurgeon
sudo pip install graphsurgeon-0.3.2-py2.py3-none-any.whl


  • Environment configuration

sudo vim ~/.bashrc
# Add the following three lines
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ubuntu/TensorRT-6.0.1.5/lib
export CUDA_INSTALL_DIR=/usr/local/cuda-10.1
export CUDNN_INSTALL_DIR=/usr/local/cuda-10.1
source ~/.bashrc

2.3 Testing

  • Python environment

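The original post shows a screenshot here; a minimal sketch of the equivalent check is to import the packages in a Python shell and print the TensorRT version:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on import

print(trt.__version__)  # expect 6.0.1.5 for this install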

  • C++ environment

cd samples
make

After compilation, the corresponding executables are generated in the bin folder under the TensorRT-6.0.1.5 directory.

Before running the MNIST sample, download the MNIST data into data/mnist and unpack it.


Then go to the bin folder and run sample_mnist to confirm that inference works end to end.

3. TensorRT Inference

There are many deep learning frameworks today, and running inference directly in a training framework rarely achieves real acceleration; moreover, models are hard to convert directly between training frameworks. ONNX introduces a unified model definition that enables conversion and inference across frameworks, which is exactly what vendors need. You can use Netron to graphically display the network topology of an ONNX model.
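
For example (a small sketch, assuming the netron package has been installed with pip install netron), the converted model can be inspected in a browser:

import netron

# Serves an interactive rendering of the ONNX graph at a local URL
netron.start("centerface_1080_1920.onnx")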

  • First convert the CenterFace ONNX model into a TensorRT .trt file, then load the .trt file to build the engine.

import os
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="", fp16_mode=False, int8_mode=False, save_engine=False):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine(max_batch_size, save_engine):
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        with trt.Builder(TRT_LOGGER) as builder, \
                builder.create_network() as network, \
                trt.OnnxParser(network, TRT_LOGGER) as parser:
            builder.max_workspace_size = 1 << 30  # Your workspace size
            builder.max_batch_size = max_batch_size
            builder.fp16_mode = fp16_mode  # Default: False
            builder.int8_mode = int8_mode  # Default: False
            if int8_mode:
                # To be updated
                raise NotImplementedError
            # Parse the model file
            if not os.path.exists(onnx_file_path):
                quit('ONNX file {} not found'.format(onnx_file_path))
            print('Loading ONNX file from path {}...'.format(onnx_file_path))
            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                parser.parse(model.read())
            print('Completed parsing of ONNX file')
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
            engine = builder.build_cuda_engine(network)
            print("Completed creating Engine")
            if save_engine:
                with open(engine_file_path, "wb") as f:
                    f.write(engine.serialize())
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, load it instead of building a new one.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine(max_batch_size, save_engine)


  • Allocate the input and output buffers

import pycuda.driver as cuda
import pycuda.autoinit

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        """Within this context, host_mem means the CPU memory and device_mem the GPU memory."""
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate page-locked host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer address to the bindings list.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream


  • Run the forward pass and fetch the results for post-processing

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data from the CPU to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference asynchronously.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream.
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]
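Putting the pieces together, here is a minimal end-to-end sketch (the engine file name and the zero-filled dummy input are assumptions for illustration; real use feeds a preprocessed video frame):

import numpy as np

# Build the engine on first run and cache it, or load the cached .trt file
engine = get_engine(max_batch_size=1,
                    onnx_file_path="centerface_1080_1920.onnx",
                    engine_file_path="centerface_1080_1920.trt",
                    fp16_mode=True,
                    save_engine=True)
inputs, outputs, bindings, stream = allocate_buffers(engine)

with engine.create_execution_context() as context:
    # Dummy input standing in for a preprocessed frame; its size must
    # match the engine's input binding (e.g. 1 x C x 1088 x 1920).
    dummy = np.zeros(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
    np.copyto(inputs[0].host, dummy)
    trt_outputs = do_inference(context, bindings, inputs, outputs, stream)
    # trt_outputs now holds the raw network outputs (for CenterFace: the
    # heatmap, scale, offset and landmark maps), ready for decoding and NMS.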

4. Results


Full code: https://github.com/Star-Clouds/CenterFace

Finally, regarding the effect of TensorRT acceleration: as the title suggests, the accelerated CenterFace reaches roughly 400 FPS.
