TensorRT Installation and Testing

零点七零七

Tags: linux, python, server

Article link: https://blog.csdn.net/BCblack/article/details/129985142

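☘️ Preface

Before installing, make sure the NVIDIA CUDA™ Toolkit is already installed. If it is not, see the NVIDIA CUDA Installation Guide.
For TensorRT, cuDNN is currently optional; it is only used to accelerate a small number of layers.

TensorRT can be installed in one of three modes:

Full installation: includes the TensorRT plan file builder. This mode is the same as the runtime provided before TensorRT 8.6.0.
Lean runtime installation: significantly smaller than the full installation. It lets you load and run engines that were built with the version-compatible builder flag, but it cannot build TensorRT plan files.
Dispatch runtime installation: allows deployment with minimal memory consumption. It lets you load and run engines built with the version-compatible builder flag and includes the lean runtime, but it cannot build TensorRT plan files.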

Check the CUDA version

nvcc -V

The output includes the CUDA release number.
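If the release number is needed in a script (for example, to pick a matching TensorRT package), it can be pulled out of the nvcc output. The following is only a sketch: `cuda_release` is a hypothetical helper name, and it assumes the output contains the text `release X.Y`, as current nvcc versions print.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "X.Y" from the "release X.Y" text in nvcc -V output.
cuda_release() {
    # Reads nvcc -V output on stdin and prints the first release number found.
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Only query nvcc when it is actually on PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA release: $(nvcc -V | cuda_release)"
fi
```

If nvcc is not on PATH the script prints nothing, which usually means the CUDA Toolkit's bin directory still has to be added to PATH.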

Check the cuDNN version

For cuDNN versions before 8.0, the version macros are in cudnn.h:

grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h

Output:

#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 10
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

For cuDNN 8.0 and later, the macros moved to cudnn_version.h:

grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn_version.h

Output:

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 4
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#endif /* CUDNN_VERSION_H */
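The two cases above can be folded into one check that looks for whichever header exists. This is a minimal sketch, not an official tool: `cudnn_version` is a hypothetical function name, and the default include directory is assumed to be /usr/local/cuda/include as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cuDNN version as MAJOR.MINOR.PATCHLEVEL.
cudnn_version() {
    # $1: CUDA include directory (assumed default: /usr/local/cuda/include)
    local inc="${1:-/usr/local/cuda/include}" header
    # cuDNN 8+ keeps the macros in cudnn_version.h; older releases use cudnn.h.
    if [ -f "$inc/cudnn_version.h" ]; then
        header="$inc/cudnn_version.h"
    elif [ -f "$inc/cudnn.h" ]; then
        header="$inc/cudnn.h"
    else
        echo "no cuDNN header found in $inc" >&2
        return 1
    fi
    # Match only the three plain #define lines, not the CUDNN_VERSION formula.
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}
```

Calling `cudnn_version` with no argument inspects the default CUDA include directory; pass another directory if the headers live elsewhere on your system.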

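🛴 Installation

1. Installing from DEB packages

First install CUDA; see https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Then install cuDNN; see https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
Download the TensorRT local repo file that matches your Ubuntu version and CPU architecture.
Finally, install TensorRT from the Debian local repo package. Replace the Ubuntu version, CUDA version, and CPU architecture below to match the file you downloaded.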

os="ubuntuxx04"
tag="8.x.x-cuda-x.x"
sudo dpkg -i nv-tensorrt-local-repo-${os}-${tag}_1.0-1_amd64.deb
sudo cp /var/nv-tensorrt-local-repo-${os}-${tag}/*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
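The `os` placeholder above can also be derived from the running system instead of typed by hand. The sketch below is an illustration under assumptions: `tensorrt_repo_deb` is a hypothetical helper, it relies on `ID` and `VERSION_ID` being set in /etc/os-release (as on Ubuntu), and it keeps the `amd64` suffix from the command above, so it does not cover other CPU architectures.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the local repo .deb file name used in the steps above.
tensorrt_repo_deb() {
    # $1: tag such as "8.x.x-cuda-x.x"; $2: os-release file (default /etc/os-release)
    local tag="$1" release_file="${2:-/etc/os-release}"
    (
        # Source in a subshell so ID and VERSION_ID do not leak into the caller.
        . "$release_file"
        os="${ID}$(printf '%s' "$VERSION_ID" | tr -d '.')"
        echo "nv-tensorrt-local-repo-${os}-${tag}_1.0-1_amd64.deb"
    )
}
```

The printed name can then be compared with the file you actually downloaded before running `dpkg -i`.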

For full runtime

sudo apt-get install tensorrt

For the lean runtime only, instead of tensorrt

sudo apt-get install libnvinfer-lean8
sudo apt-get install libnvinfer-vc-plugin8

For lean runtime Python package

sudo apt-get install python3-libnvinfer-lean

For dispatch runtime Python package

sudo apt-get install python3-libnvinfer-dispatch

For all TensorRT Python packages

python3 -m pip install numpy 
sudo apt-get install python3-libnvinfer-dev

The following additional packages will be installed

python3-libnvinfer
python3-libnvinfer-lean
python3-libnvinfer-dispatch

If you want to install Python packages for the lean or dispatch runtime only, specify these individually rather than installing the dev package.
If you want to use TensorRT with the UFF converter to convert models from TensorFlow

python3 -m pip install protobuf 
sudo apt-get install uff-converter-tf

The graphsurgeon-tf package will also be installed with this command.
If you want to run samples that require onnx-graphsurgeon or use the Python module for your own project

python3 -m pip install numpy onnx 
sudo apt-get install onnx-graphsurgeon

🍁 Verifying the installation

dpkg-query -W tensorrt

This should print the installed tensorrt package name together with its version.

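2. Installing from the TAR package

2.1 Download the TAR package that matches your environment from the official site

The server was upgraded for this part. Its environment is:

Python: 3.10
CUDA: 12.0
GPU: 2 × 3090
cuDNN: 8.8.0

EA stands for Early Access (a build released before the official release).
GA stands for General Availability: the stable release, which has been fully tested.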

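Extract the archive

tar -xzvf TensorRT-8.6.0.12.Linux.x86_64-gnu.cuda-12.0.tar.gz

Then enter the extracted directory.
Add the TensorRT library directory to LD_LIBRARY_PATH by opening the shell startup file, adding the export line, and reloading it:

vim ~/.bashrc
export LD_LIBRARY_PATH=<path-to-extracted-TensorRT>/TensorRT-8.6.0.12/lib:$LD_LIBRARY_PATH
source ~/.bashrc

The author uses zsh, so the commands above become:

vim ~/.zshrc
export LD_LIBRARY_PATH=<path-to-extracted-TensorRT>/TensorRT-8.6.0.12/lib:$LD_LIBRARY_PATH
source ~/.zshrc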
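Editing the startup file by hand works, but repeating the step can leave duplicate export lines. A small sketch of an idempotent alternative follows; `add_trt_lib_path` is a hypothetical helper name, and the TensorRT location is whatever directory you extracted the tarball into.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the LD_LIBRARY_PATH export to a shell rc file once.
add_trt_lib_path() {
    # $1: extracted TensorRT directory; $2: shell rc file (default ~/.bashrc)
    local trt_dir="$1" rc_file="${2:-$HOME/.bashrc}"
    # \$ keeps $LD_LIBRARY_PATH literal so it expands when the rc file is sourced.
    local line="export LD_LIBRARY_PATH=${trt_dir}/lib:\$LD_LIBRARY_PATH"
    # Append only if this exact line is not already present.
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        echo "$line" >> "$rc_file"
    fi
}
```

For zsh, pass `~/.zshrc` as the second argument, then run `source` on that file as shown above.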

Next, install the Python wheels. Run these commands from inside the extracted TensorRT directory:

python3 -m pip install uff/uff-0.6.9-py2.py3-none-any.whl

# The following are optional; install them as needed

python3 -m pip install graphsurgeon/graphsurgeon-0.4.6-py2.py3-none-any.whl

python3 -m pip install onnx_graphsurgeon/onnx_graphsurgeon-0.3.12-py2.py3-none-any.whl
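The wheel steps above do not show the TensorRT Python bindings themselves. Assuming the tarball also ships a `python/` directory whose wheels are named like `tensorrt-<version>-cp310-none-linux_x86_64.whl` (an assumption about the package layout, not something shown in this article), the sketch below picks the wheel that matches the local interpreter. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CPython tag of python3, e.g. "cp310".
python_cp_tag() {
    python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

# Hypothetical helper: find the TensorRT wheel for a given CPython tag.
find_trt_wheel() {
    # $1: directory holding the wheels; $2: CPython tag such as "cp310"
    local dir="$1" tag="$2" whl
    for whl in "$dir"/tensorrt-*-"$tag"-none-linux_x86_64.whl; do
        # An unmatched glob stays literal, so confirm the file exists.
        if [ -f "$whl" ]; then
            echo "$whl"
            return 0
        fi
    done
    return 1
}
```

From inside the extracted directory, something like `python3 -m pip install "$(find_trt_wheel python "$(python_cp_tag)")"` would then install the matching wheel, if one is present.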

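🍁 Verifying the installation

Build and run a sample, starting from the extracted TensorRT directory:

cd samples/sampleOnnxMNIST/
make -j8
cd ../../bin
./sample_onnx_mnist

The output is as follows: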

&&&& RUNNING TensorRT.sample_onnx_mnist [TensorRT v8600] # ./sample_onnx_mnist
[04/06/2023-11:00:53] [I] Building and running a GPU inference engine for Onnx MNIST
[04/06/2023-11:00:53] [I] [TRT] [MemUsageChange] Init CUDA: CPU +362, GPU +0, now: CPU 367, GPU 375 (MiB)
[04/06/2023-11:00:56] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1209, GPU +264, now: CPU 1652, GPU 639 (MiB)
[04/06/2023-11:00:56] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[04/06/2023-11:00:56] [I] [TRT] ----------------------------------------------------------------
[04/06/2023-11:00:56] [I] [TRT] Input filename:   ../../../data/mnist/mnist.onnx
[04/06/2023-11:00:56] [I] [TRT] ONNX IR version:  0.0.3
[04/06/2023-11:00:56] [I] [TRT] Opset version:    8
[04/06/2023-11:00:56] [I] [TRT] Producer name:    CNTK
[04/06/2023-11:00:56] [I] [TRT] Producer version: 2.5.1
[04/06/2023-11:00:56] [I] [TRT] Domain:           ai.cntk
[04/06/2023-11:00:56] [I] [TRT] Model version:    1
[04/06/2023-11:00:56] [I] [TRT] Doc string:       
[04/06/2023-11:00:56] [I] [TRT] ----------------------------------------------------------------
[04/06/2023-11:00:56] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/06/2023-11:00:56] [I] [TRT] Graph optimization time: 0.000296735 seconds.
[04/06/2023-11:00:56] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/06/2023-11:00:57] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/06/2023-11:00:57] [I] [TRT] Total Host Persistent Memory: 24224
[04/06/2023-11:00:57] [I] [TRT] Total Device Persistent Memory: 0
[04/06/2023-11:00:57] [I] [TRT] Total Scratch Memory: 0
[04/06/2023-11:00:57] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
[04/06/2023-11:00:57] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 6 steps to complete.
[04/06/2023-11:00:57] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.008767ms to assign 3 blocks to 6 nodes requiring 32256 bytes.
[04/06/2023-11:00:57] [I] [TRT] Total Activation Memory: 31744
[04/06/2023-11:00:57] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[04/06/2023-11:00:57] [I] [TRT] Loaded engine size: 0 MiB
[04/06/2023-11:00:57] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[04/06/2023-11:00:57] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[04/06/2023-11:00:57] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[04/06/2023-11:00:57] [I] Input:
[04/06/2023-11:00:57] [I] @@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@+ @@@@@@@@@@@@@@
@@@@@@@@@@@@. @@@@@@@@@@@@@@
@@@@@@@@@@@@- @@@@@@@@@@@@@@
@@@@@@@@@@@#  @@@@@@@@@@@@@@
@@@@@@@@@@@#  *@@@@@@@@@@@@@
@@@@@@@@@@@@  :@@@@@@@@@@@@@
@@@@@@@@@@@@= .@@@@@@@@@@@@@
@@@@@@@@@@@@#  %@@@@@@@@@@@@
@@@@@@@@@@@@% .@@@@@@@@@@@@@
@@@@@@@@@@@@%  %@@@@@@@@@@@@
@@@@@@@@@@@@%  %@@@@@@@@@@@@
@@@@@@@@@@@@@= +@@@@@@@@@@@@
@@@@@@@@@@@@@* -@@@@@@@@@@@@
@@@@@@@@@@@@@*  @@@@@@@@@@@@
@@@@@@@@@@@@@@  @@@@@@@@@@@@
@@@@@@@@@@@@@@  *@@@@@@@@@@@
@@@@@@@@@@@@@@  *@@@@@@@@@@@
@@@@@@@@@@@@@@  *@@@@@@@@@@@
@@@@@@@@@@@@@@  *@@@@@@@@@@@
@@@@@@@@@@@@@@* @@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

[04/06/2023-11:00:57] [I] Output:
[04/06/2023-11:00:57] [I]  Prob 0  0.0000 Class 0: 
[04/06/2023-11:00:57] [I]  Prob 1  1.0000 Class 1: **********
[04/06/2023-11:00:57] [I]  Prob 2  0.0000 Class 2: 
[04/06/2023-11:00:57] [I]  Prob 3  0.0000 Class 3: 
[04/06/2023-11:00:57] [I]  Prob 4  0.0000 Class 4: 
[04/06/2023-11:00:57] [I]  Prob 5  0.0000 Class 5: 
[04/06/2023-11:00:57] [I]  Prob 6  0.0000 Class 6: 
[04/06/2023-11:00:57] [I]  Prob 7  0.0000 Class 7: 
[04/06/2023-11:00:57] [I]  Prob 8  0.0000 Class 8: 
[04/06/2023-11:00:57] [I]  Prob 9  0.0000 Class 9: 
[04/06/2023-11:00:57] [I] 
&&&& PASSED TensorRT.sample_onnx_mnist [TensorRT v8600] # ./sample_onnx_mnist
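The last line of the log is the one that matters: the sample reports `&&&& PASSED` on success. When the sample runs in a script, that marker can be checked automatically. This is a sketch only; `sample_passed` is a hypothetical helper, and it assumes the marker format shown in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the log on stdin contains the PASSED marker.
sample_passed() {
    # $1: sample name, e.g. "sample_onnx_mnist"
    grep -q "^&&&& PASSED TensorRT\.$1 "
}
```

For example, save the output with `./sample_onnx_mnist 2>&1 | tee mnist.log`, then run `sample_passed sample_onnx_mnist < mnist.log && echo "TensorRT sample OK"`.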