Python微信订餐小程序课程视频
https://edu.csdn.net/course/detail/36074
Python实战量化交易理财系统
https://edu.csdn.net/course/detail/35475
TensorRT 是 NVIDIA 自家的高性能推理库,其 Getting Started 列出了各资料入口,如下:
本文基于当前的 TensorRT 8.2 版本,将一步步介绍从安装,直到加速推理自己的 ONNX 模型。
安装
进 TensorRT 下载页 选择版本下载,需注册登录。
本文选择了 TensorRT-8.2.2.1.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
,可以注意到与 CUDA cuDNN 要匹配好版本。也可以准备 NVIDIA Docker 拉取对应版本的 nvidia/cuda 镜像,再 ADD
TensorRT
即可。
# 解压进 $HOME (以免 sudo 编译样例,为当前用户)
tar -xzvf TensorRT-*.tar.gz -C $HOME/
# 软链到 /usr/local/TensorRT (以固定一个路径)
sudo ln -s $HOME/TensorRT-8.2.2.1 /usr/local/TensorRT
之后,编译运行样例,保证 TensorRT 安装正确。
编译样例
样例在 TensorRT/samples
,说明见 Sample Support Guide 或各样例目录里的 README.md
。
cd /usr/local/TensorRT/samples/
# 设定环境变量,可见 Makefile.config
export CUDA_INSTALL_DIR=/usr/local/cuda
export CUDNN_INSTALL_DIR=/usr/local/cuda
export ENABLE_DLA=
export TRT_LIB_DIR=../lib
export PROTOBUF_INSTALL_DIR=
# 编译
make -j`nproc`
# 运行
export LD_LIBRARY_PATH=/usr/local/TensorRT/lib:$LD\_LIBRARY\_PATH
cd /usr/local/TensorRT/
./bin/trtexec -h
./bin/sample_mnist -d data/mnist/ --fp16
运行结果参考:
$ ./bin/sample_mnist -d data/mnist/ --fp16
&&&& RUNNING TensorRT.sample_mnist [TensorRT v8202] # ./bin/sample\_mnist -d data/mnist/ --fp16
[12/23/2021-20:20:16] [I] Building and running a GPU inference engine for MNIST
[12/23/2021-20:20:16] [I] [TRT] [MemUsageChange] Init CUDA: CPU +322, GPU +0, now: CPU 333, GPU 600 (MiB)
[12/23/2021-20:20:16] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 333 MiB, GPU 600 MiB
[12/23/2021-20:20:16] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 468 MiB, GPU 634 MiB
[12/23/2021-20:20:17] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +518, GPU +224, now: CPU 988, GPU 858 (MiB)
[12/23/2021-20:20:17] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +114, GPU +52, now: CPU 1102, GPU 910 (MiB)
[12/23/2021-20:20:17] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[12/23/2021-20:20:33] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[12/23/2021-20:20:34] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/23/2021-20:20:34] [I] [TRT] Total Host Persistent Memory: 8448
[12/23/2021-20:20:34] [I] [TRT] Total Device Persistent Memory: 1626624
[12/23/2021-20:20:34] [I] [TRT] Total Scratch Memory: 0
[12/23/2021-20:20:34] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2 MiB, GPU 13 MiB
[12/23/2021-20:20:34] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.01595ms to assign 3 blocks to 8 nodes requiring 57857 bytes.
[12/23/2021-20:20:34] [I] [TRT] Total Activation Memory: 57857
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1621, GPU 1116 (MiB)
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1621, GPU 1124 (MiB)
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1622, GPU 1086 (MiB)
[12/23/2021-20:20:34] [I] [TRT] Loaded engine size: 1 MiB
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1622, GPU 1096 (MiB)
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 1623, GPU 1104 (MiB)
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1, now: CPU 0, GPU 1 (MiB)
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1485, GPU 1080 (MiB)
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1485, GPU 1088 (MiB)
[12/23/2021-20:20:34] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2, now: CPU 0, GPU 3 (MiB)
[12/23/2021-20:20:34] [I] Input:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@