TensorRT 5 Developer Guide, Chinese Edition: Working With Deep Learning Frameworks (3-6)

Article link: https://blog.csdn.net/hw5226349/article/details/83899150


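Original work. When reposting, please credit the original source with a hyperlink: http://www.dapalm.com/?p=206. Author: 大数据，怕了么?
This manual is a translation of the English TensorRT 4.0.1.6 GA guide. I keep it mainly as a personal reference and am sharing it so that more developers can use it. The TensorRT Developer Guide has four chapters. The core material is in Chapters 2 and 3; once you understand those two, you know enough to write code. Chapter 1 is an overview of TensorRT, which mostly amounts to bragging about how great it is. Chapter 4 covers the samples and describes the code structure and functionality of each demo. The series opens with a table of contents, the first three chapters take two to three posts each, and the final chapter on samples is split up so that several key samples can be explained in detail.
Note: in October 2018 NVIDIA released TensorRT 5.0.2.6 for Linux and TensorRT 5.0.1.3 for Windows, with official support for Windows 10. In my testing it is also backward compatible and works on Windows 7.
This chapter was Chapter 3 in the TensorRT 4 guide, but it is Chapter 8 in the TensorRT 5 guide, so for now I am continuing the translation from Chapter 8. When I have time I will go back and cover what the first seven chapters add relative to TensorRT 4.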

Chapter 8 Working With Deep Learning Frameworks

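With the Python API, a model built in a compatible framework such as TensorFlow, Caffe, or ONNX can be turned into an accelerated engine using the parsers that TensorRT provides. The Python API also supports frameworks that store weights in a NumPy-compatible format, such as PyTorch.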

8.1 Supported Operations by Framework

Caffe
Operations supported for the Caffe framework:

Convolution: 3D, with or without bias
Pooling: Max, Average, Max_Average_blend
InnerProduct
Softmax
Activation: ReLU, Sigmoid, Tanh
LRN
Power
ElementWise: sum, product, maximum, subtraction, division, power
Concatenation: across channel
Deconvolution
BatchNormalization
Scale
Crop
Reduction: sum, prod, max, min, avg
Reshape
Permute
Dropout
Concat
ElementWise
RNN: Input, Output, Forget, Update, Reset, Cell, Hidden
Unary: exp, log, sqrt, recip, abs, neg
Padding
Shuffle
TopK: max, min
Gather
Matrix_Multiply
Ragged_Softmax
Constant: the weights of a UFF model are stored as this constant type
RNN_v2
Plugin: FasterRCNN fused plugin (RPN + ROI pooling), Normalize plugin, Permute plugin, PriorBox plugin, SSD DetectionOutput plugin, Concat plugin, YOLO PReLU plugin, YOLO Reorg plugin, YOLO Region plugin

TensorFlow
Operations supported for the TensorFlow framework:

Placeholder
Const
Add, Sub, Mul, Div, Minimum and Maximum
BiasAdd
Negative, Abs, Sqrt, Rsqrt, Pow, Exp and Log
Note: NvUffParser supports Neg, Abs, Sqrt, Rsqrt, Exp, and Log only for constant nodes.
FusedBatchNorm
ReLU, TanH, and Sigmoid
SoftMax
Mean
ConcatV2
Reshape
Transpose
Conv2D
DepthwiseConv2dNative
ConvTranspose2D
MaxPool
AvgPool
Pad
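
Before attempting a conversion, it can help to check a graph's operator types against the list above. The following is a minimal sketch under stated assumptions: the function name `find_unsupported_ops` and the sample op types are hypothetical, the set is simply copied from this list (plus `Neg` from the note), and names are compared case-insensitively because the list's spelling (for example `ReLU`) may not match the spelling TensorFlow uses in a graph.

```python
# Operator names copied from the TensorFlow list above. They are lowercased so
# that the comparison is case-insensitive (an assumption made for convenience).
SUPPORTED_TF_OPS = {
    name.lower() for name in (
        "Placeholder", "Const",
        "Add", "Sub", "Mul", "Div", "Minimum", "Maximum",
        "BiasAdd",
        "Negative", "Neg", "Abs", "Sqrt", "Rsqrt", "Pow", "Exp", "Log",
        "FusedBatchNorm",
        "ReLU", "TanH", "Sigmoid",
        "SoftMax", "Mean", "ConcatV2", "Reshape", "Transpose",
        "Conv2D", "DepthwiseConv2dNative", "ConvTranspose2D",
        "MaxPool", "AvgPool", "Pad",
    )
}


def find_unsupported_ops(op_types):
    """Return the sorted, de-duplicated op types that are not in the list above."""
    return sorted({op for op in op_types if op.lower() not in SUPPORTED_TF_OPS})


if __name__ == "__main__":
    # Hypothetical op types, e.g. collected with [n.op for n in graph_def.node].
    ops = ["Placeholder", "Conv2D", "Relu", "Softmax", "ArgMax", "Conv2D", "ArgMax"]
    print(find_unsupported_ops(ops))
```

An empty result only means every op type appears in this list; it does not guarantee that a particular configuration of an op converts successfully.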

ONNX
The ONNX parser is an open-source project. The most up-to-date information on supported operators can be found in the ONNX TensorRT project on GitHub.

8.2 Working With TensorFlow

For information on using TensorRT directly with TensorFlow models, see:

Python sample: [9.2.2 end_to_end_tensorflow_mnist]
Creating a TensorRT engine directly inside the TensorFlow framework

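8.2.1 Freezing a TensorFlow Graph

To use the UFF command-line tool, the TensorFlow graph must be saved as a frozen graph in a .pb file. See:

How to generate a frozen graph from a TensorFlow model
How to export a TensorFlow model
Note: the graph is usually frozen directly as part of exporting the model.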

8.2.2 Freezing a Keras Model

Use the following code to generate a frozen graph from a Keras model:

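import os

from keras.models import load_model
import keras.backend as K
from tensorflow.python.framework import graph_io
from tensorflow.python.tools import freeze_graph
from tensorflow.core.protobuf import saver_pb2
from tensorflow.python.training import saver as saver_lib

def convert_keras_to_pb(keras_model, out_names, models_dir, model_filename):
    # Switch Keras to inference mode before loading, so that layers such as
    # Dropout and BatchNormalization are built with their inference behavior.
    K.set_learning_phase(0)
    model = load_model(keras_model)
    sess = K.get_session()
    # Save the session variables to a V2 checkpoint.
    saver = saver_lib.Saver(write_version=saver_pb2.SaverDef.V2)
    checkpoint_path = saver.save(sess, 'saved_ckpt', global_step=0,
                                 latest_filename='checkpoint_state')
    # Write the graph definition in text format (hence input_binary=False below).
    graph_io.write_graph(sess.graph, '.', 'tmp.pb')
    # Merge the graph and the checkpoint into one frozen graph.
    # out_names is a comma-separated string of output node names.
    freeze_graph.freeze_graph('./tmp.pb', '',
                              False, checkpoint_path, out_names,
                              "save/restore_all", "save/Const:0",
                              os.path.join(models_dir, model_filename), False, "")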

8.2.3 Converting a Frozen Graph to UFF

Use the following commands to convert a .pb frozen graph to a .uff file:

convert-to-uff input_file [-o output_file] [-O output_node]

convert-to-uff input_file -l  # list the TensorFlow layers
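
When the conversion is driven from a script, the two invocations above can be assembled as argument lists. The sketch below is illustrative only: the helper name `build_convert_to_uff_cmd`, the file names, and the node name are hypothetical, and it uses only the `-o`, `-O`, and `-l` flags shown above.

```python
import shlex


def build_convert_to_uff_cmd(input_file, output_file=None, output_node=None,
                             list_layers=False):
    """Build the argument list for one of the two invocations shown above."""
    cmd = ["convert-to-uff", input_file]
    if list_layers:
        # Listing mode: only print the TensorFlow layers.
        cmd.append("-l")
        return cmd
    if output_file is not None:
        cmd += ["-o", output_file]
    if output_node is not None:
        cmd += ["-O", output_node]
    return cmd


if __name__ == "__main__":
    # Hypothetical file and node names, for illustration only.
    cmd = build_convert_to_uff_cmd("model.pb", "model.uff", "dense/Softmax")
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually run it (requires the UFF tools to be installed):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems when paths contain spaces.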

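8.2.4 Working With TensorFlow RNN Weights

This section describes TensorFlow weights and their storage formats. The subsections that follow walk through how to process and decipher RNN weights from TensorFlow.
In short, this section is about converting the model weights exported by various training frameworks into the TensorRT format (the planner format).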

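8.2.4.1 TensorFlow RNN Cells Supported In TensorRT

A recurrent neural network layer in TensorRT corresponds to the MultiRNNCell operator in TensorFlow. Each layer consists of multiple sublayers that share the same configuration, that is, the same hidden and embedding size. This encapsulation abstracts the internal connections between the sublayers away from the user (much like a DenseBlock or ResBlock, which also contains several layers internally), and it makes for simpler code when dealing with deeper networks.

TensorRT supports four RNN layer types: RNN relu, RNN tanh, LSTM, and GRU. The TensorFlow cells that match these types are: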

TensorRT RNN Relu/Tanh Layer
BasicRNNCell

Permitted activation functions: tf.tanh and tf.nn.relu
This is a platform-independent cell

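TensorRT LSTM Layer

BasicLSTMCell

When creating an instance of this cell in TensorFlow, forget_bias must be set to 0. To support a non-zero forget bias, preprocess the bias by adding the parameterized forget bias to the forget biases dumped from TensorFlow.
This is a platform-independent cell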
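
The forget-bias preprocessing described above can be sketched as follows. This is a minimal illustration under explicit assumptions, not the guide's own code: the function name `fold_forget_bias` is hypothetical, and the bias is assumed to be a flat vector of length 4 * num_units laid out in the gate order i, j (cell candidate), f, o, which is my understanding of how TensorFlow 1.x BasicLSTMCell stores it. Verify the layout against your TensorFlow version before relying on it.

```python
def fold_forget_bias(bias, num_units, forget_bias=1.0):
    """Return a copy of `bias` with `forget_bias` added to the forget-gate slice.

    ASSUMPTION: `bias` has length 4 * num_units and uses the gate order
    i, j, f, o, so the forget gate occupies [2 * num_units, 3 * num_units).
    """
    if len(bias) != 4 * num_units:
        raise ValueError("expected a bias of length 4 * num_units")
    folded = list(bias)  # copy, so the dumped weights stay untouched
    start = 2 * num_units
    for k in range(start, start + num_units):
        folded[k] += forget_bias
    return folded


if __name__ == "__main__":
    # Toy bias for a cell with 2 hidden units, as dumped from a checkpoint.
    dumped = [0.0] * 8
    print(fold_forget_bias(dumped, num_units=2, forget_bias=1.0))
```

Folding the constant into the stored bias leaves the gate's output unchanged, since the sigmoid sees the same sum either way.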

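CudnnCompatibleLSTMCell

The forget bias requirement is the same as for BasicLSTMCell
Peepholes are not currently supported (with peepholes, the gates depend on the current input x_t, the previous hidden state h_{t-1}, and the previous cell state c_{t-1}); use_peepholes must be set to False
cuDNN compatible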
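
To make the peephole restriction concrete, here is the forget gate in a common textbook formulation, with and without the peephole term. The symbols W_f, R_f, p_f, and b_f are notation chosen for this illustration and are not taken from the TensorRT or TensorFlow documentation.

```latex
% Without peepholes (the supported case): the gate sees x_t and h_{t-1} only.
f_t = \sigma\left(W_f x_t + R_f h_{t-1} + b_f\right)

% With peepholes (not supported): the gate also sees the previous cell state.
f_t = \sigma\left(W_f x_t + R_f h_{t-1} + p_f \odot c_{t-1} + b_f\right)
```

The extra term p_f ⊙ c_{t-1} brings additional weights that the supported cells do not have, which is why use_peepholes must stay False.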

TensorRT GRU Layer
CudnnCompatibleGRUCell

cuDNN compatible
It differs from the standard, platform-independent GRUCell, so CudnnCompatibleGRUCell is the cell to use with TensorRT
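
The constraints listed in this section can be collected into a small pre-flight check. The sketch below is only an illustration of the rules stated above: the function name `check_cell` and its parameters are hypothetical, and activations are passed as the plain strings "tanh" and "relu" rather than as TensorFlow functions so that the example runs without TensorFlow.

```python
# TensorFlow cell name -> matching TensorRT layer type, as listed above.
SUPPORTED_CELLS = {
    "BasicRNNCell": "RNN relu/tanh",
    "BasicLSTMCell": "LSTM",
    "CudnnCompatibleLSTMCell": "LSTM",
    "CudnnCompatibleGRUCell": "GRU",
}


def check_cell(cell_type, activation=None, forget_bias=0.0, use_peepholes=False):
    """Return a list of problems; an empty list means the listed rules are met."""
    if cell_type not in SUPPORTED_CELLS:
        return ["%s is not in the supported list" % cell_type]
    problems = []
    if cell_type == "BasicRNNCell" and activation not in ("tanh", "relu"):
        problems.append("activation must be tf.tanh or tf.nn.relu")
    if cell_type in ("BasicLSTMCell", "CudnnCompatibleLSTMCell") and forget_bias != 0:
        problems.append("forget_bias must be 0; fold it into the dumped bias instead")
    if cell_type == "CudnnCompatibleLSTMCell" and use_peepholes:
        problems.append("use_peepholes must be False")
    return problems


if __name__ == "__main__":
    print(check_cell("BasicLSTMCell", forget_bias=1.0))
    print(check_cell("GRUCell"))
```

A cell that fails the first check is not necessarily unusable, but it would need the equivalence checks described in the next section.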

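8.2.4.2 Maintaining Model Consistency Between TensorFlow and TensorRT

For any TensorFlow cell not listed in TensorFlow RNN Cells Supported In TensorRT, consult the TensorRT API and the TensorFlow API to make sure that the cell is mathematically equivalent to a cell TensorRT supports and that the storage format matches what you expect. A good way to do this is to design unit tests that use TensorFlow as the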