Project page: NVIDIA TensorRT
Preface
TensorRT (formerly GIE, the GPU Inference Engine) is a C++ library that runs on the Jetson TX1 and on NVIDIA GPUs such as the Tesla P100, K80, M4 and Titan X, and it supports FP16, i.e. half-precision arithmetic, on hardware with native FP16 support (such as the TX1 and the Pascal-based P100). By trading a little precision for speed, it accelerates inference noticeably with no meaningful drop in accuracy, often delivering around a 2x speedup, and it can consume Caffe models directly. There is still very little material about TensorRT online, so I am attempting to write some notes here and will keep adding to them as time allows.
A Brief Introduction to TensorRT
TensorRT is currently built against gcc 4.8 and is independent of any deep learning framework. For Caffe, TensorRT converts the Caffe network into its own representation and runs it standalone. The tool that parses Caffe models is called NvCaffeParser: given a prototxt file and the caffemodel weights, it converts them into a new model representation that supports half precision.
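To make this concrete, here is a minimal sketch of the NvCaffeParser step using the GIE/TensorRT 1.x-era C++ API. The file names deploy.prototxt and net.caffemodel and the output blob name "prob" are placeholder assumptions; passing DataType::kHALF asks the parser to store the weights in half precision:

```cpp
#include "NvInfer.h"
#include "NvCaffeParser.h"

using namespace nvinfer1;
using namespace nvcaffeparser1;

// Populate an already-created network definition from a Caffe model.
// Passing DataType::kHALF makes NvCaffeParser convert the weights to FP16.
void parseCaffeModel(IBuilder& builder, INetworkDefinition& network)
{
    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobNameToTensor = parser->parse(
        "deploy.prototxt",   // placeholder: network definition
        "net.caffemodel",    // placeholder: trained weights
        network,
        DataType::kHALF);

    // A prototxt has no explicit notion of outputs, so mark them by blob name;
    // "prob" is the usual output blob of a classification network.
    network.markOutput(*blobNameToTensor->find("prob"));

    // FP16 kernels only pay off on hardware with native FP16 support
    // (e.g. TX1, P100); half2 mode processes FP16 values in pairs.
    if (builder.platformHasFastFp16())
        builder.setHalf2Mode(true);

    parser->destroy();
}
```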
TensorRT currently supports most of the commonly used Caffe layers, including:
- Convolution, with or without bias. Currently only 2D convolutions (i.e. 4D input and output tensors) are supported. Note: the operation this layer performs is actually a correlation, which matters if you format weights to import via GIE's API rather than via the caffe parser library.
- Activation: ReLU, tanh and sigmoid.
- Pooling: max and average.
- Scale: per-tensor, per-channel or per-weight affine transformation and exponentiation by constant values. Batch Normalization can be implemented using the Scale layer.
- ElementWise: sum, product or max of two tensors.
- LRN: cross-channel only.
- Fully-connected, with or without bias.
- SoftMax: cross-channel only.
- Deconvolution, with or without bias.
Layers that are not supported include:
- Deconvolution groups
- PReLU
- Scale, other than per-channel scaling
- EltWise with more than two inputs
Using TensorRT involves two main phases (C++ code; a condensed sketch follows this list):
- In the build phase, the toolkit takes a network definition, performs optimizations, and generates the inference engine.
- In the execution phase, the engine runs inference tasks using input and output buffers on the GPU.
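To make the two phases concrete, below is a condensed sketch in the style of the official sampleMNIST example, using the TensorRT 1.x-era API. The blob names "data" and "prob", the file paths, and the buffer sizes are assumptions for an MNIST-like classification network:

```cpp
#include <cuda_runtime_api.h>
#include <iostream>
#include "NvInfer.h"
#include "NvCaffeParser.h"

using namespace nvinfer1;
using namespace nvcaffeparser1;

// TensorRT reports warnings and errors through a user-supplied logger.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // ---------- Build phase: network definition -> optimized engine ----------
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobNameToTensor = parser->parse(
        "deploy.prototxt", "net.caffemodel", *network, DataType::kFLOAT);
    network->markOutput(*blobNameToTensor->find("prob"));

    builder->setMaxBatchSize(1);            // largest batch the engine must handle
    builder->setMaxWorkspaceSize(16 << 20); // scratch memory for layer algorithms
    ICudaEngine* engine = builder->buildCudaEngine(*network);
    network->destroy();
    parser->destroy();
    builder->destroy();

    // ---------- Execution phase: run inference with GPU buffers ----------
    IExecutionContext* context = engine->createExecutionContext();

    // Binding indices tell us which slot of `buffers` each blob occupies.
    int inputIndex  = engine->getBindingIndex("data");
    int outputIndex = engine->getBindingIndex("prob");

    const int inputSize  = 1 * 28 * 28;  // one 28x28 grayscale image
    const int outputSize = 10;           // ten digit classes

    void* buffers[2];
    cudaMalloc(&buffers[inputIndex],  inputSize  * sizeof(float));
    cudaMalloc(&buffers[outputIndex], outputSize * sizeof(float));

    float input[inputSize] = {0};        // fill with a real, preprocessed image
    float output[outputSize];
    cudaMemcpy(buffers[inputIndex], input, inputSize * sizeof(float),
               cudaMemcpyHostToDevice);

    context->execute(1 /*batchSize*/, buffers);  // synchronous inference

    cudaMemcpy(output, buffers[outputIndex], outputSize * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(buffers[inputIndex]);
    cudaFree(buffers[outputIndex]);
    context->destroy();
    engine->destroy();
    return 0;
}
```

In a real application you would also serialize the built engine to a stream and reload it at deployment time, so the relatively expensive build/optimization step only has to run once.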
For a closer look at how TensorRT works under the hood, see this official blog post:
Production Deep Learning with NVIDIA GPU Inference Engine
I won't go into the theory much here. Below, taking MNIST handwritten-digit recognition as an example and following the official sample code, I'll walk through the steps for using TensorRT.
Running a Caffe Model with TensorRT
Getting TensorRT
First, on the Jetson TX1, TensorRT can be obtained through JetPack