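This note records how to set up an NVIDIA TensorRT development environment on Ubuntu 20.04 with an ASUS ATS-RTX4060-O8G-V2 graphics card.
1. Installation steps
0) Preparation
Create the working directory with the following command (-p keeps the command from failing if the directory already exists):
mkdir -p ~/nvidia
Then change into that directory (unless stated otherwise, all following steps are run from there):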
cd ~/nvidia
1) Install CUDA
Download and install the NVIDIA CUDA Toolkit, followed by the matching 555-series driver:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.5.0/local_installers/cuda-repo-ubuntu2004-12-5-local_12.5.0-555.42.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-5-local_12.5.0-555.42.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-5
sudo apt-get install -y nvidia-driver-555-open
sudo apt-get install -y cuda-drivers-555
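After the packages are installed, it can help to confirm that the toolkit and driver tools are reachable from the shell. The following is a minimal sketch, not part of the original procedure. It assumes the toolkit was placed under /usr/local/cuda-12.5 (the usual location for the deb packages, but an assumption here), and prepend_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumption: the toolkit lives in /usr/local/cuda-12.5; override CUDA_HOME if not.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.5}"

# Prepend a directory to PATH only if it is not already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

prepend_path "$CUDA_HOME/bin"
export PATH

# Report each tool without aborting when one is missing.
for tool in nvcc nvidia-smi; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found (a reboot or a new shell may be needed)"
    fi
done
```

If both tools are reported as found, nvcc --version and nvidia-smi can then be run to see the toolkit and driver versions.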
2) Install cuDNN
Download and install cuDNN:
wget https://developer.download.nvidia.com/compute/cudnn/9.2.0/local_installers/cudnn-local-repo-ubuntu2004-9.2.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2004-9.2.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2004-9.2.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn9
sudo apt-get -y install libcudnn9-cuda-12
3) Install TensorRT
Download and install TensorRT:
wget -c https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.0.1/local_repo/nv-tensorrt-local-repo-ubuntu2004-10.0.1-cuda-12.4_1.0-1_amd64.deb
sudo dpkg -i nv-tensorrt-local-repo-ubuntu2004-10.0.1-cuda-12.4_1.0-1_amd64.deb
sudo cp /var/nv-tensorrt-local-repo-ubuntu2004-10.0.1-cuda-12.4/nv-tensorrt-local-4BE0C9B6-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install tensorrt
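Before moving on, one way to check that the three installs above actually registered with the package manager is to filter the dpkg -l listing. This is only a sketch: filter_installed is a hypothetical helper, and the keywords passed to it are guesses based on the package names used in the commands above.

```shell
#!/usr/bin/env bash
# Read a `dpkg -l` style listing on stdin and print "name version" for
# fully installed packages (status "ii") whose name contains a keyword.
filter_installed() {
    awk -v keys="$*" '
        BEGIN { n = split(keys, k, " ") }
        $1 == "ii" {
            for (i = 1; i <= n; i++)
                if (index($2, k[i])) { print $2, $3; break }
        }'
}

# Run against the real package database when dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | filter_installed cuda-toolkit cudnn tensorrt
fi
```

An empty result for one of the keywords suggests that the corresponding apt-get install step did not complete.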
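2. Running the samples
Once the environment is installed, the C++ sample sources are in /usr/src/tensorrt/samples/ and the Python sample sources are in /usr/src/tensorrt/samples/python/. Run the commands below to copy the sample code and data into the working directory (all later verification is done on these copies):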
cd ~/nvidia
cp -arf /usr/src/tensorrt/samples/ tensorrt_samples
cp -arf /usr/src/tensorrt/data/ ~/nvidia/
cd tensorrt_samples
1) C++ sample
Pick one sample from the directory above, then build and run it (the other samples are built and run the same way):
cd sampleOnnxMNIST
make
cd ../../bin
./sample_onnx_mnist
Running it prints the following output:
stxinu@tsi:~/nvidia/bin$ ./sample_onnx_mnist
&&&& RUNNING TensorRT.sample_onnx_mnist [TensorRT v100001] # ./sample_onnx_mnist
[06/15/2024-17:18:08] [I] Building and running a GPU inference engine for Onnx MNIST
[06/15/2024-17:18:08] [I] [TRT] [MemUsageChange] Init CUDA: CPU +17, GPU +0, now: CPU 19, GPU 101 (MiB)
[06/15/2024-17:18:09] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1772, GPU +314, now: CPU 1927, GPU 415 (MiB)
[06/15/2024-17:18:09] [I] [TRT] ----------------------------------------------------------------
[06/15/2024-17:18:09] [I] [TRT] Input filename: ../data/mnist/mnist.onnx
[06/15/2024-17:18:09] [I] [TRT] ONNX IR version: 0.0.3
[06/15/2024-17:18:09] [I] [TRT] Opset version: 8
[06/15/2024-17:18:09] [I] [TRT] Producer name: CNTK
[06/15/2024-17:18:09] [I] [TRT] Producer version: 2.5.1
[06/15/2024-17:18:09] [I] [TRT] Domain: ai.cntk
[06/15/2024-17:18:09] [I] [TRT] Model version: 1
[06/15/2024-17:18:09] [I] [TRT] Doc string:
[06/15/2024-17:18:09] [I] [TRT] ----------------------------------------------------------------
[06/15/2024-17:18:09] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/15/2024-17:18:10] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[06/15/2024-17:18:10] [I] [TRT] Total Host Persistent Memory: 26272
[06/15/2024-17:18:10] [I] [TRT] Total Device Persistent Memory: 0
[06/15/2024-17:18:10] [I] [TRT] Total Scratch Memory: 0
[06/15/2024-17:18:10] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 6 steps to complete.
[06/15/2024-17:18:10] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.011426ms to assign 3 blocks to 6 nodes requiring 32256 bytes.
[06/15/2024-17:18:10] [I] [TRT] Total Activation Memory: 31744
[06/15/2024-17:18:10] [I] [TRT] Total Weights Memory: 26152
[06/15/2024-17:18:10] [I] [TRT] Engine generation completed in 1.3149 seconds.
[06/15/2024-17:18:10] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 5 MiB
[06/15/2024-17:18:10] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3046 MiB
[06/15/2024-17:18:10] [I] [TRT] Loaded engine size: 0 MiB
[06/15/2024-17:18:11] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[06/15/2024-17:18:11] [I] Input:
[06/15/2024-17:18:11] [I] @@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@#-:.-=@@@@@@@@@@@@@@
@@@@@%= . *@@@@@@@@@@@@@
@@@@% .:+%%% *@@@@@@@@@@@@@
@@@@+=#@@@@@# @@@@@@@@@@@@@@
@@@@@@@@@@@% @@@@@@@@@@@@@@
@@@@@@@@@@@: *@@@@@@@@@@@@@@
@@@@@@@@@@- .@@@@@@@@@@@@@@@
@@@@@@@@@: #@@@@@@@@@@@@@@@
@@@@@@@@: +*%#@@@@@@@@@@@@
@@@@@@@% :+*@@@@@@@@
@@@@@@@@#*+--.:: +@@@@@@
@@@@@@@@@@@@@@@@#=:. +@@@@@
@@@@@@@@@@@@@@@@@@@@ .@@@@@
@@@@@@@@@@@@@@@@@@@@#. #@@@@
@@@@@@@@@@@@@@@@@@@@# @@@@@
@@@@@@@@@%@@@@@@@@@@- +@@@@@
@@@@@@@@#-@@@@@@@@*. =@@@@@@
@@@@@@@@ .+%%%%+=. =@@@@@@@
@@@@@@@@ =@@@@@@@@
@@@@@@@@*=: :--*@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[06/15/2024-17:18:11] [I] Output:
[06/15/2024-17:18:11] [I] Prob 0 0.0000 Class 0:
[06/15/2024-17:18:11] [I] Prob 1 0.0000 Class 1:
[06/15/2024-17:18:11] [I] Prob 2 0.0000 Class 2:
[06/15/2024-17:18:11] [I] Prob 3 1.0000 Class 3: **********
[06/15/2024-17:18:11] [I] Prob 4 0.0000 Class 4:
[06/15/2024-17:18:11] [I] Prob 5 0.0000 Class 5:
[06/15/2024-17:18:11] [I] Prob 6 0.0000 Class 6:
[06/15/2024-17:18:11] [I] Prob 7 0.0000 Class 7:
[06/15/2024-17:18:11] [I] Prob 8 0.0000 Class 8:
[06/15/2024-17:18:11] [I] Prob 9 0.0000 Class 9:
[06/15/2024-17:18:11] [I]
&&&& PASSED TensorRT.sample_onnx_mnist [TensorRT v100001] # ./sample_onnx_mnist
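The log above opens with a "&&&& RUNNING" line and closes with a "&&&& PASSED" line, so the closing marker can be used to check a run without reading the whole log. The sketch below assumes other samples print the same marker; sample_passed and run_sample are hypothetical helper names, and the script is meant to be run from the bin directory.

```shell
#!/usr/bin/env bash
# Succeed if the log on stdin contains a line starting with "&&&& PASSED".
sample_passed() {
    grep -q '^&&&& PASSED'
}

# Run one sample binary and print a one-line summary.
run_sample() {
    local bin="$1"
    if [ ! -x "$bin" ]; then
        echo "SKIP $bin (not built)"
        return 0
    fi
    if "$bin" 2>&1 | sample_passed; then
        echo "PASS $bin"
    else
        echo "FAIL $bin"
        return 1
    fi
}

run_sample ./sample_onnx_mnist
```

Further binaries can be checked by adding more run_sample calls, one per built sample.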
2) Python sample
Likewise, pick one of the Python samples to test.
First, install the tools that all Python samples depend on:
sudo apt install python3-pip
cd ~/nvidia/tensorrt_samples/python
python3 -m pip install -r requirements.txt
Next, install this sample's own dependencies and run it:
cd ~/nvidia/tensorrt_samples/python/network_api_pytorch_mnist
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
python3 sample.py
Running it prints the following output:
stxinu@tsi:~/nvidia/tensorrt_samples/python/network_api_pytorch_mnist$ python3 sample.py
Train Epoch: 1 [0/60000 (0%)] Loss: 2.309925
Train Epoch: 1 [6400/60000 (11%)] Loss: 0.473879
Train Epoch: 1 [12800/60000 (21%)] Loss: 0.218094
Train Epoch: 1 [19200/60000 (32%)] Loss: 0.197866
Train Epoch: 1 [25600/60000 (43%)] Loss: 0.323297
Train Epoch: 1 [32000/60000 (53%)] Loss: 0.135932
Train Epoch: 1 [38400/60000 (64%)] Loss: 0.093019
Train Epoch: 1 [44800/60000 (75%)] Loss: 0.134863
Train Epoch: 1 [51200/60000 (85%)] Loss: 0.140987
Train Epoch: 1 [57600/60000 (96%)] Loss: 0.140408
Test set: Average loss: 0.0891, Accuracy: 9714/10000 (97%)
Train Epoch: 2 [0/60000 (0%)] Loss: 0.050104
Train Epoch: 2 [6400/60000 (11%)] Loss: 0.042686
Train Epoch: 2 [12800/60000 (21%)] Loss: 0.035165
Train Epoch: 2 [19200/60000 (32%)] Loss: 0.070323
Train Epoch: 2 [25600/60000 (43%)] Loss: 0.088045
Train Epoch: 2 [32000/60000 (53%)] Loss: 0.140878
Train Epoch: 2 [38400/60000 (64%)] Loss: 0.012205
Train Epoch: 2 [44800/60000 (75%)] Loss: 0.012712
Train Epoch: 2 [51200/60000 (85%)] Loss: 0.019727
Train Epoch: 2 [57600/60000 (96%)] Loss: 0.054260
Test set: Average loss: 0.0654, Accuracy: 9793/10000 (98%)
Test Case: 7
Prediction: 7
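If sample.py fails at import time instead of producing output like the above, a quick check of which Python modules are importable can narrow the problem down. This is a sketch under assumptions: module_version is a hypothetical helper, and the module list (tensorrt, torch, numpy) is a guess at what this sample needs, not taken from its requirements.txt.

```shell
#!/usr/bin/env bash
# Print "<module> <version>" for a Python module, or a note if the import fails.
module_version() {
    python3 -c "import importlib, sys
try:
    m = importlib.import_module(sys.argv[1])
    print(sys.argv[1], getattr(m, '__version__', 'unknown'))
except ImportError:
    print(sys.argv[1], 'not importable')" "$1"
}

# Only probe when python3 itself is present.
if command -v python3 >/dev/null 2>&1; then
    for mod in tensorrt torch numpy; do
        module_version "$mod"
    done
fi
```

A "not importable" line for tensorrt would suggest that the Python bindings are missing from the interpreter being used, even if the C++ sample runs.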