Setting Up an AI Environment on Ubuntu 20.04 with an RTX 4060

This article records how to set up an NVIDIA TensorRT development environment on Ubuntu 20.04 with an ASUS ATS-RTX4060-O8G-V2 graphics card.

1. Installation steps

0) Preparation

     Create our working directory with the following command:

mkdir ~/nvidia

     Then change into that directory (unless noted otherwise, all of the following steps are run from there):

cd ~/nvidia

1) Install CUDA

     Download and install the NVIDIA CUDA Toolkit, followed by the 555 driver packages:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.5.0/local_installers/cuda-repo-ubuntu2004-12-5-local_12.5.0-555.42.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-5-local_12.5.0-555.42.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-5
sudo apt-get install -y nvidia-driver-555-open
sudo apt-get install -y cuda-drivers-555
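
     Before moving on, it can help to confirm that the toolkit and driver are visible from the shell. The following is a minimal sketch, not part of the original steps: it assumes the toolkit landed in /usr/local/cuda-12.5 (the usual prefix for this package, adjust if yours differs), and prepend_path is a hypothetical helper name introduced here. A reboot is typically needed after the driver install before nvidia-smi can talk to the GPU.

```shell
# Assumed install prefix for cuda-toolkit-12-5; override CUDA_HOME if yours differs.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.5}"

# Hypothetical helper: prepend directory $1 to the colon-separated list $2
# unless it is already present, and print the result.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

PATH="$(prepend_path "$CUDA_HOME/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH

# Report what is visible; neither check aborts the script if a tool is missing.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin"
fi
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || echo "nvidia-smi failed (driver not loaded yet?)"
else
    echo "nvidia-smi not found"
fi
```

     To make the PATH change permanent, the two export values can be added to ~/.bashrc.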

2) Install cuDNN

     Download and install cuDNN:

wget https://developer.download.nvidia.com/compute/cudnn/9.2.0/local_installers/cudnn-local-repo-ubuntu2004-9.2.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2004-9.2.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2004-9.2.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn9
sudo apt-get -y install libcudnn9-cuda-12
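
     A quick way to confirm that the cuDNN packages really got installed is to ask dpkg. The sketch below is an illustration only: installed_only is a hypothetical helper name, and the glob patterns are assumptions derived from the package names used above.

```shell
# Hypothetical helper: read "status package version" lines on stdin and keep
# only fully installed packages (dpkg status abbreviation "ii").
installed_only() {
    awk '$1 == "ii" { print $2, $3 }'
}

if command -v dpkg-query >/dev/null 2>&1; then
    # dpkg-query exits non-zero when nothing matches, so tolerate that here.
    found="$( { dpkg-query -W -f='${db:Status-Abbrev} ${Package} ${Version}\n' \
                  'cudnn9*' 'libcudnn9*' 2>/dev/null || true; } | installed_only )"
    if [ -n "$found" ]; then
        echo "$found"
    else
        echo "no installed cuDNN packages found"
    fi
else
    echo "dpkg-query not available on this system"
fi
```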

3) Install TensorRT

     Download and install TensorRT:

wget -c https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.0.1/local_repo/nv-tensorrt-local-repo-ubuntu2004-10.0.1-cuda-12.4_1.0-1_amd64.deb
sudo dpkg -i nv-tensorrt-local-repo-ubuntu2004-10.0.1-cuda-12.4_1.0-1_amd64.deb
sudo cp /var/nv-tensorrt-local-repo-ubuntu2004-10.0.1-cuda-12.4/nv-tensorrt-local-4BE0C9B6-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install tensorrt
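
     As a sanity check, we can try importing the TensorRT Python module and printing its version. This is only a sketch: it assumes the tensorrt package pulled in the Python bindings for the system python3, which may not hold on every setup, and trt_python_version is a hypothetical helper name.

```shell
# Hypothetical helper: print the version reported by the TensorRT Python
# module, or a short note if python3 or the module is unavailable.
trt_python_version() {
    python3 -c 'import tensorrt; print(tensorrt.__version__)' 2>/dev/null \
        || echo "tensorrt Python module not importable"
}

trt_python_version
```

     If the module is not importable, the C++ libraries may still be installed correctly; the C++ sample in the next section does not depend on the Python bindings.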

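2. Running the examples

     Once the environment above is installed, the C++ sample sources are in /usr/src/tensorrt/samples/ and the Python sample sources are in /usr/src/tensorrt/samples/python/. Run the following commands to copy the sample code into our working directory (the verification steps below all work on that copy):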

cd ~/nvidia
cp -arf /usr/src/tensorrt/samples/ tensorrt_samples
cp -arf /usr/src/tensorrt/data/ ~/nvidia/
cd tensorrt_samples

1) C++ example

     Pick one sample from the directory above, then build and run it (the other samples are built and run the same way):

cd sampleOnnxMNIST
make
cd ../../bin
./sample_onnx_mnist

Running it prints the following:

stxinu@tsi:~/nvidia/bin$ ./sample_onnx_mnist
&&&& RUNNING TensorRT.sample_onnx_mnist [TensorRT v100001] # ./sample_onnx_mnist
[06/15/2024-17:18:08] [I] Building and running a GPU inference engine for Onnx MNIST
[06/15/2024-17:18:08] [I] [TRT] [MemUsageChange] Init CUDA: CPU +17, GPU +0, now: CPU 19, GPU 101 (MiB)
[06/15/2024-17:18:09] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1772, GPU +314, now: CPU 1927, GPU 415 (MiB)
[06/15/2024-17:18:09] [I] [TRT] ----------------------------------------------------------------
[06/15/2024-17:18:09] [I] [TRT] Input filename:   ../data/mnist/mnist.onnx
[06/15/2024-17:18:09] [I] [TRT] ONNX IR version:  0.0.3
[06/15/2024-17:18:09] [I] [TRT] Opset version:    8
[06/15/2024-17:18:09] [I] [TRT] Producer name:    CNTK
[06/15/2024-17:18:09] [I] [TRT] Producer version: 2.5.1
[06/15/2024-17:18:09] [I] [TRT] Domain:           ai.cntk
[06/15/2024-17:18:09] [I] [TRT] Model version:    1
[06/15/2024-17:18:09] [I] [TRT] Doc string:
[06/15/2024-17:18:09] [I] [TRT] ----------------------------------------------------------------
[06/15/2024-17:18:09] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/15/2024-17:18:10] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[06/15/2024-17:18:10] [I] [TRT] Total Host Persistent Memory: 26272
[06/15/2024-17:18:10] [I] [TRT] Total Device Persistent Memory: 0
[06/15/2024-17:18:10] [I] [TRT] Total Scratch Memory: 0
[06/15/2024-17:18:10] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 6 steps to complete.
[06/15/2024-17:18:10] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.011426ms to assign 3 blocks to 6 nodes requiring 32256 bytes.
[06/15/2024-17:18:10] [I] [TRT] Total Activation Memory: 31744
[06/15/2024-17:18:10] [I] [TRT] Total Weights Memory: 26152
[06/15/2024-17:18:10] [I] [TRT] Engine generation completed in 1.3149 seconds.
[06/15/2024-17:18:10] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 5 MiB
[06/15/2024-17:18:10] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3046 MiB
[06/15/2024-17:18:10] [I] [TRT] Loaded engine size: 0 MiB
[06/15/2024-17:18:11] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[06/15/2024-17:18:11] [I] Input:
[06/15/2024-17:18:11] [I] @@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@#-:.-=@@@@@@@@@@@@@@
@@@@@%=     . *@@@@@@@@@@@@@
@@@@%  .:+%%% *@@@@@@@@@@@@@
@@@@+=#@@@@@# @@@@@@@@@@@@@@
@@@@@@@@@@@%  @@@@@@@@@@@@@@
@@@@@@@@@@@: *@@@@@@@@@@@@@@
@@@@@@@@@@- .@@@@@@@@@@@@@@@
@@@@@@@@@:  #@@@@@@@@@@@@@@@
@@@@@@@@:   +*%#@@@@@@@@@@@@
@@@@@@@%         :+*@@@@@@@@
@@@@@@@@#*+--.::     +@@@@@@
@@@@@@@@@@@@@@@@#=:.  +@@@@@
@@@@@@@@@@@@@@@@@@@@  .@@@@@
@@@@@@@@@@@@@@@@@@@@#. #@@@@
@@@@@@@@@@@@@@@@@@@@#  @@@@@
@@@@@@@@@%@@@@@@@@@@- +@@@@@
@@@@@@@@#-@@@@@@@@*. =@@@@@@
@@@@@@@@ .+%%%%+=.  =@@@@@@@
@@@@@@@@           =@@@@@@@@
@@@@@@@@*=:   :--*@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 
[06/15/2024-17:18:11] [I] Output:
[06/15/2024-17:18:11] [I]  Prob 0  0.0000 Class 0:
[06/15/2024-17:18:11] [I]  Prob 1  0.0000 Class 1:
[06/15/2024-17:18:11] [I]  Prob 2  0.0000 Class 2:
[06/15/2024-17:18:11] [I]  Prob 3  1.0000 Class 3: **********
[06/15/2024-17:18:11] [I]  Prob 4  0.0000 Class 4:
[06/15/2024-17:18:11] [I]  Prob 5  0.0000 Class 5:
[06/15/2024-17:18:11] [I]  Prob 6  0.0000 Class 6:
[06/15/2024-17:18:11] [I]  Prob 7  0.0000 Class 7:
[06/15/2024-17:18:11] [I]  Prob 8  0.0000 Class 8:
[06/15/2024-17:18:11] [I]  Prob 9  0.0000 Class 9:
[06/15/2024-17:18:11] [I]
&&&& PASSED TensorRT.sample_onnx_mnist [TensorRT v100001] # ./sample_onnx_mnist
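
The last line of the log is the one to look for. When running several samples, the check can be scripted. The sketch below is an illustration only: sample_passed and decode_trt_version are hypothetical helper names, and the version decoding assumes the TensorRT 10 encoding major*10000 + minor*100 + patch, under which v100001 reads as 10.0.1.

```shell
# Hypothetical helper: succeed only if the log on stdin contains a line
# starting with the "&&&& PASSED" marker seen at the end of the output above.
sample_passed() {
    grep -q '^&&&& PASSED '
}

# Hypothetical helper: decode the integer after "TensorRT v", assuming the
# encoding major*10000 + minor*100 + patch.
decode_trt_version() {
    v="$1"
    echo "$((v / 10000)).$((v / 100 % 100)).$((v % 100))"
}

decode_trt_version 100001   # prints 10.0.1
```

For example, `./sample_onnx_mnist > run.log 2>&1; sample_passed < run.log && echo OK` saves the log first and then checks it.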

2) Python example

    Likewise, we pick one of the Python samples to test.

    First, install the tools that all of the Python samples depend on:

sudo apt install python3-pip
cd ~/nvidia/tensorrt_samples/python
python3 -m pip install -r requirements.txt

    Next, install the dependencies of the chosen sample and run it:

cd ~/nvidia/tensorrt_samples/python/network_api_pytorch_mnist
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 sample.py

     Running it prints the following:

stxinu@tsi:~/nvidia/tensorrt_samples/python/network_api_pytorch_mnist$ python3 sample.py
Train Epoch: 1 [0/60000 (0%)]   Loss: 2.309925
Train Epoch: 1 [6400/60000 (11%)]       Loss: 0.473879
Train Epoch: 1 [12800/60000 (21%)]      Loss: 0.218094
Train Epoch: 1 [19200/60000 (32%)]      Loss: 0.197866
Train Epoch: 1 [25600/60000 (43%)]      Loss: 0.323297
Train Epoch: 1 [32000/60000 (53%)]      Loss: 0.135932
Train Epoch: 1 [38400/60000 (64%)]      Loss: 0.093019
Train Epoch: 1 [44800/60000 (75%)]      Loss: 0.134863
Train Epoch: 1 [51200/60000 (85%)]      Loss: 0.140987
Train Epoch: 1 [57600/60000 (96%)]      Loss: 0.140408
 
Test set: Average loss: 0.0891, Accuracy: 9714/10000 (97%)
 
Train Epoch: 2 [0/60000 (0%)]   Loss: 0.050104
Train Epoch: 2 [6400/60000 (11%)]       Loss: 0.042686
Train Epoch: 2 [12800/60000 (21%)]      Loss: 0.035165
Train Epoch: 2 [19200/60000 (32%)]      Loss: 0.070323
Train Epoch: 2 [25600/60000 (43%)]      Loss: 0.088045
Train Epoch: 2 [32000/60000 (53%)]      Loss: 0.140878
Train Epoch: 2 [38400/60000 (64%)]      Loss: 0.012205
Train Epoch: 2 [44800/60000 (75%)]      Loss: 0.012712
Train Epoch: 2 [51200/60000 (85%)]      Loss: 0.019727
Train Epoch: 2 [57600/60000 (96%)]      Loss: 0.054260
 
Test set: Average loss: 0.0654, Accuracy: 9793/10000 (98%)
 
Test Case: 7
Prediction: 7
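
The run is successful when the last two lines agree, that is, the TensorRT prediction equals the test case. A minimal sketch for checking this automatically, assuming the output format shown above (prediction_matches is a hypothetical helper name):

```shell
# Hypothetical helper: read sample.py output on stdin and succeed only if the
# "Test Case:" and "Prediction:" lines are both present and carry the same value.
prediction_matches() {
    awk -F': ' '
        /^Test Case:/  { want = $2 }
        /^Prediction:/ { got = $2 }
        END { exit !(want != "" && want == got) }
    '
}

# Demo on the two lines from the log above.
printf 'Test Case: 7\nPrediction: 7\n' | prediction_matches && echo "prediction matches"
```

With a real run, `python3 sample.py | prediction_matches` returns a non-zero exit status if the two values differ or either line is missing.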
