TensorRT installation and inference on a TensorFlow model, with a Python 3.6 workaround
Environment
- ubuntu 18.04
- tensorflow-gpu 1.12
- cuda 9.0
- cudnn 7.1
- python 3.6
- a pile of other Python packages: pip install whatever turns out to be missing
Install TensorRT 4.0
Download TensorRT 4.0 from the NVIDIA website (the CUDA 9.0 / Ubuntu 16.04 tar package).
Extract it under /usr/local and add the extracted library directory to the LD_LIBRARY_PATH environment variable (e.g. `export LD_LIBRARY_PATH=/usr/local/TensorRT-4.0.1.6/lib:$LD_LIBRARY_PATH` in your ~/.bashrc).
Enter the extracted TensorRT directory and `cd python/`.
Install the Python API there, sudo pip install-ing the packages it needs.
Install the Python APIs of the other tools from their respective directories in the same way. One catch: when installing the Python APIs, the officially supported Python 3 version is 3.5. If, like me, you are on 3.6, a direct sudo pip install fails, and I couldn't find a solution in any blog post at home or abroad; renaming the wheel only treats the symptom, since the package installs but then can't be used. Here is what worked for me:
- Step 1: change the 3.5 in the wheel's filename to 3.6 (keep a backup) and install it.
- Step 2: install Sublime Text.
- Go into the directory the package was installed to, find the .so file, right-click it and note the exact file size from Properties, so you can check it is unchanged after patching.
- Open the .so with sudo vim, search with /python3.5 and replace each hit with python3.6 (n and N jump to the next and previous match).
- Once everything is replaced, open the file with the subl command, delete the last two characters at the end of the file ("a0" or "0a", I forget which; the point is to undo the trailing bytes vim appends on save, so the file keeps its original size), and save. A scripted version of this edit is sketched right after this list.
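If you'd rather script that edit, a few lines of Python do the same byte-for-byte replacement without the trailing-newline problem. This is a sketch: the dist-packages path is where the package landed on my machine (see the examples path further down), so adjust it to yours, and run it with sudo since the files live under /usr/local.

```python
#!/usr/bin/env python3
# Patch the compiled TensorRT bindings in place: swap every embedded
# "python3.5" string for "python3.6". The two strings have the same
# length, so the binary layout and file size stay untouched, and
# write_bytes() never appends a trailing newline the way vim does.
import pathlib

PKG_DIR = pathlib.Path("/usr/local/lib/python3.6/dist-packages/tensorrt")

for so_path in PKG_DIR.rglob("*.so"):
    data = so_path.read_bytes()
    if b"python3.5" not in data:
        continue
    patched = data.replace(b"python3.5", b"python3.6")
    assert len(patched) == len(data), "file size must not change"
    so_path.write_bytes(patched)
    print("patched", so_path)
```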
If you don't want to patch it yourself, you can try the file I already patched (download link) and simply swap it in; my TensorRT version is 4.0.1.6.
Go to the examples/tf_to_trt directory under the install location (mine is /usr/local/lib/python3.6/dist-packages/tensorrt/examples/tf_to_trt).
Run tf_to_trt.py; if it finishes without errors, the installation is good.
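If you want a faster check than the full sample while iterating on the .so patch, the imports alone are telling; they are exactly what the samples rely on:

```python
# Each of these raises an ImportError if the patched .so is still broken.
import tensorrt as trt
from tensorrt.parsers import uffparser
import uff

print("TensorRT Python bindings loaded OK")
```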
tf_to_trt
In NVIDIA's example, the lenet5 module has already completed the whole training process, yet the tf_to_trt module trains the network all over again before running inference, which is of course not how we want to use it.
So the lenet5 module now handles training the network and saving it in UFF format, and the tf_to_trt module only has to load the saved model and run inference.
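The export side of that change can be very small. Below is a sketch rather than the sample's exact code: it assumes lenet5.learn() returns the frozen graph just like in NVIDIA's example, and it writes the serialized UFF buffer under the file name that tf_to_trt.py reads (export_uff.py is a name I made up for a one-off script).

```python
# export_uff.py (hypothetical one-off script): train LeNet-5 once and
# save the result as UFF so inference never has to retrain.
import uff
import lenet5

tf_model = lenet5.learn()                                # train, get the frozen graph
uff_model = uff.from_tensorflow(tf_model, ["fc2/Relu"])  # serialize to a UFF buffer
with open("trained_lenet5.uff", "wb") as f:
    f.write(uff_model)                                   # the file readUffToEngine() loads
```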
```python
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # importing this initializes CUDA
import uff
import tensorrt as trt
from tensorrt.parsers import uffparser
import lenet5

# Console logger, so TensorRT's build and runtime messages get printed.
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)

# Hyperparameters: device memory set aside for TensorRT, the input shape,
# and how many test cases to print.
MAX_WORKSPACE = 1 << 20
INPUT_W = 28
INPUT_H = 28
OUTPUT_SIZE = 10
MAX_BATCHSIZE = 1
ITERATIONS = 10


def infer(context, input_img, batch_size):
    """Run one batch through the engine.

    :param context: execution context
    :param input_img: input image(s)
    :param batch_size: batch size of the input
    :return: model output
    """
    engine = context.get_engine()  # engine behind this execution context
    dims = engine.get_binding_dimensions(1).to_DimsCHW()  # output binding (index 1) as CHW
    elt_count = dims.C() * dims.H() * dims.W() * batch_size  # number of output elements

    input_img = input_img.astype(np.float32)  # the engine expects float32 input
    output = cuda.pagelocked_empty(elt_count, dtype=np.float32)  # page-locked host buffer

    # Allocate device memory for input and output, and bind the pointers.
    d_input = cuda.mem_alloc(batch_size * input_img.size * input_img.dtype.itemsize)
    d_output = cuda.mem_alloc(batch_size * output.size * output.dtype.itemsize)
    bindings = [int(d_input), int(d_output)]

    stream = cuda.Stream()  # CUDA stream that queues the async operations
    cuda.memcpy_htod_async(d_input, input_img, stream)  # copy host -> device
    context.enqueue(batch_size, bindings, stream.handle, None)  # run inference
    cuda.memcpy_dtoh_async(output, d_output, stream)  # copy device -> host
    stream.synchronize()  # wait for the async work before reading the result
    return output


def readUffToEngine():
    parser = uffparser.create_uff_parser()  # model parser
    parser.register_input("Placeholder", (1, 28, 28), 0)
    parser.register_output("fc2/Relu")  # declare the input and output nodes
    # Parse the UFF model from file and let TensorRT optimize it; the
    # logger shows the optimization steps.
    engine = trt.utils.uff_file_to_trt_engine(
        G_LOGGER, 'trained_lenet5.uff', parser, MAX_BATCHSIZE, MAX_WORKSPACE
    )
    parser.destroy()  # the parser is no longer needed
    return engine


def trainToEngine():
    tf_model = lenet5.learn()  # train the model and take the result
    uff_model = uff.from_tensorflow(tf_model, ["fc2/Relu"])
    parser = uffparser.create_uff_parser()  # same setup as above
    parser.register_input("Placeholder", (1, 28, 28), 0)
    parser.register_output("fc2/Relu")
    engine = trt.utils.uff_to_trt_engine(
        G_LOGGER, uff_model, parser, MAX_BATCHSIZE, MAX_WORKSPACE
    )
    parser.destroy()
    return engine


def main():
    engine = readUffToEngine()  # build the engine from the saved file
    # engine = trainToEngine()  # or: train from scratch and build the engine
    context = engine.create_execution_context()  # execution context for the engine

    print("\n| TEST CASE | PREDICTION |")
    for i in range(ITERATIONS):
        img, label = lenet5.get_testcase()
        img = img[0]
        label = label[0]
        out = infer(context, img, 1)
        print("|-----------|------------|")
        print("| " + str(label) + " | " + str(np.argmax(out)) + " |")


if __name__ == "__main__":
    main()
```
Output:
| TEST CASE | PREDICTION |
|-----------|------------|
| 8 | 8 |
| 7 | 7 |
| 0 | 0 |
| 3 | 3 |
| 2 | 2 |
| 6 | 6 |
| 2 | 2 |
| 6 | 6 |
| 5 | 5 |
| 8 | 8 |
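One more step in the same spirit: the optimized engine itself can be cached, so later runs skip both the UFF parsing and TensorRT's fairly slow build phase. If I remember right, the TensorRT 4 Python API exposes trt.utils.write_engine_to_file and trt.utils.load_engine for this; treat the snippet as a sketch and check the names against your installed version.

```python
# Serialize the built engine once...
trt.utils.write_engine_to_file("lenet5.engine", engine.serialize())

# ...and on later runs deserialize it instead of rebuilding:
engine = trt.utils.load_engine(G_LOGGER, "lenet5.engine")
context = engine.create_execution_context()
```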