Jetson TX1 Development Tutorial (4): A First Look at Accelerating Caffe with TensorRT

Latest recommended article published 2024-10-11 10:58:10

Jesse_Mx

Category: Jetson TX1. Tags: TensorRT, caffe, Jetson-TX1

Article link: https://blog.csdn.net/Jesse_Mx/article/details/56022967

Project page: NVIDIA TensorRT

Preface

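TensorRT (formerly GIE, the GPU Inference Engine) is a C++ library that runs on the Jetson TX1 and on discrete NVIDIA GPUs (Tesla P100, K80, M4, Titan X and others; note that not all of these are Pascal parts: the K80 is Kepler and the M4 is Maxwell). It supports fp16, that is, half-precision arithmetic. Because it trades precision for speed, it speeds up inference noticeably without an obvious drop in accuracy, often roughly doubling performance, and it can also consume Caffe models. There is still little material about TensorRT online, so I am writing up what I can here and will add more when time permits.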
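To give a feel for what "trading precision for speed" costs numerically, here is a minimal sketch that rounds a float to the 11 significant bits an IEEE 754 half can hold. It is not how TensorRT converts values; the function names are my own, and the sketch assumes the input lies in the normal fp16 range (overflow and subnormals are not modeled).

```cpp
#include <cassert>
#include <cmath>

// Round a float to the nearest value with 11 significant bits, the precision
// of an IEEE 754 half (1 implicit + 10 stored mantissa bits).
// Assumption: |x| is in the normal fp16 range (roughly 6.1e-5 to 65504);
// overflow and subnormal handling are deliberately left out.
float round_to_half_precision(float x) {
    if (x == 0.0f || !std::isfinite(x)) return x;
    int exponent = 0;
    float mantissa = std::frexp(x, &exponent);  // x = mantissa * 2^exponent, 0.5 <= |mantissa| < 1
    float scaled = std::ldexp(mantissa, 11);    // keep 11 significant bits as an integer part
    float rounded = std::nearbyint(scaled);     // round to nearest, ties to even (default mode)
    return std::ldexp(rounded, exponent - 11);
}

// Relative error introduced by the rounding above.
float half_relative_error(float x) {
    assert(x != 0.0f);
    return std::fabs(round_to_half_precision(x) - x) / std::fabs(x);
}
```

For example, 0.1 becomes 0.0999755859375 and 2049 becomes 2048. The relative error stays below 2^-11 (about 0.05%), which is why classification accuracy usually changes very little.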

Overview of TensorRT

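TensorRT is currently built against gcc 4.8 and is independent of any deep learning framework. For Caffe, TensorRT converts the Caffe model and then runs it on its own. The tool that parses Caffe models is called NvCaffeParser: it takes the prototxt file and the caffemodel weights and converts them into a new model that supports half precision.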

TensorRT currently supports most of the commonly used Caffe layers, including:

Convolution, with or without bias. Currently only 2D convolutions (i.e. 4D input and output tensors) are supported. Note: The operation this layer performs is actually a correlation, which matters if you are formatting weights to import via GIE's API rather than through the Caffe parser library.

Activation: ReLU, tanh and sigmoid.

Pooling: max and average.

Scale: per-tensor, per-channel or per-weight affine transformation and exponentiation by constant values. Batch Normalization can be implemented using the Scale layer.

ElementWise: sum, product or max of two tensors.

LRN (local response normalization): cross-channel only.

Fully-connected, with or without bias.

SoftMax: cross-channel only.

Deconvolution, with and without bias.
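The Scale item above says Batch Normalization can be implemented with a Scale layer. At inference time the statistics are fixed, so batch norm collapses into one per-channel affine transform. The following is a minimal sketch of that folding in plain C++; it does not use the TensorRT API, and the names (`ChannelScale`, `fold_batch_norm`, `gamma`, `beta`) are my own, following the usual batch-norm notation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-channel affine transform: y = scale[c] * x + shift[c].
struct ChannelScale {
    std::vector<float> scale;
    std::vector<float> shift;
};

// Fold inference-time batch norm,
//   y = gamma * (x - mean) / sqrt(var + eps) + beta,
// into an equivalent per-channel scale and shift.
ChannelScale fold_batch_norm(const std::vector<float>& mean,
                             const std::vector<float>& var,
                             const std::vector<float>& gamma,
                             const std::vector<float>& beta,
                             float eps) {
    const std::size_t channels = mean.size();
    assert(var.size() == channels && gamma.size() == channels && beta.size() == channels);
    ChannelScale folded{std::vector<float>(channels), std::vector<float>(channels)};
    for (std::size_t c = 0; c < channels; ++c) {
        const float s = gamma[c] / std::sqrt(var[c] + eps);
        folded.scale[c] = s;
        folded.shift[c] = beta[c] - mean[c] * s;
    }
    return folded;
}

// Apply the folded transform to one value of the given channel.
float apply_scale(const ChannelScale& p, std::size_t channel, float x) {
    return p.scale[channel] * x + p.shift[channel];
}
```

With mean 1, variance 4, gamma 2, beta 0.5 and eps 0, the folded parameters are scale 1 and shift -0.5, so an input of 3 maps to 2.5, the same value the batch-norm formula gives directly.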

Unsupported layers and options (when importing a Caffe model through NvCaffeParser) include:

Deconvolution groups

PReLU

Scale, other than per-channel scaling

EltWise with more than two inputs
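The last restriction is easy to work around in principle, because sum, product and max are associative: an operation over N tensors can be rewritten as a chain of two-input operations. Here is a small sketch of that idea for the sum case. It is plain C++ rather than a network definition, and `Tensor`, `eltwise_sum` and `eltwise_sum_chain` are hypothetical names of mine. Whether restructuring the prototxt this way suits a particular model is something I have not verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Tensor = std::vector<float>;  // flattened tensor, for illustration only

// Two-input element-wise sum, the only arity assumed to be available.
Tensor eltwise_sum(const Tensor& a, const Tensor& b) {
    assert(a.size() == b.size());
    Tensor out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// Sum any number of tensors by chaining two-input sums: ((t0 + t1) + t2) + ...
Tensor eltwise_sum_chain(const std::vector<Tensor>& inputs) {
    assert(!inputs.empty());
    Tensor acc = inputs[0];
    for (std::size_t i = 1; i < inputs.size(); ++i) {
        acc = eltwise_sum(acc, inputs[i]);
    }
    return acc;
}
```

In a prototxt this would correspond to replacing one three-input Eltwise layer with two two-input Eltwise layers in sequence.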

Using TensorRT involves two main phases (in C++ code):

In the build phase, the toolkit takes a network definition, performs optimizations, and generates the inference engine.

In the execution phase, the engine runs inference tasks using input and output buffers on the GPU.
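The split between the two phases can be illustrated with a toy stand-in. This is not the TensorRT API: every type and function below (`ScaleLayer`, `NetworkDefinition`, `Engine`, `build_engine`) is hypothetical, and the "optimization" is just fusing a chain of affine layers into one. The point is only the shape of the workflow: pay the optimization cost once at build time, then reuse the resulting engine for every inference call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in, not the TensorRT API: all names here are hypothetical.
struct ScaleLayer {
    float scale;
    float shift;  // layer computes y = scale * x + shift
};

using NetworkDefinition = std::vector<ScaleLayer>;

// Product of the build phase: the whole chain fused into one affine op.
class Engine {
public:
    Engine(float scale, float shift) : scale_(scale), shift_(shift) {}

    // Execution phase: read from an input buffer, write to an output buffer.
    void execute(const std::vector<float>& input, std::vector<float>& output) const {
        output.resize(input.size());
        for (std::size_t i = 0; i < input.size(); ++i) {
            output[i] = scale_ * input[i] + shift_;
        }
    }

private:
    float scale_;
    float shift_;
};

// Build phase: take the network definition, optimize it, return an engine.
Engine build_engine(const NetworkDefinition& network) {
    float scale = 1.0f;
    float shift = 0.0f;
    for (const ScaleLayer& layer : network) {
        // layer(scale * x + shift) = (layer.scale * scale) * x + (layer.scale * shift + layer.shift)
        shift = layer.scale * shift + layer.shift;
        scale = layer.scale * scale;
    }
    return Engine(scale, shift);
}
```

For a two-layer network y = 2x + 1 followed by z = 3y - 1, `build_engine` produces a single op z = 6x + 2, and `execute` on the inputs 1 and 2 writes 8 and 14. In the real library the buffers live on the GPU and the optimizations are far richer, but the build-once, run-many structure is the same.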

For a closer look at how TensorRT works, see this official blog post:
Production Deep Learning with NVIDIA GPU Inference Engine

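I will not go deep into the underlying principles for now. Below, using MNIST handwritten digit recognition as the example and following the official sample, I walk through the steps for using TensorRT.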