I hit a lot of pitfalls during this installation, so I am writing them down.
Hardware:
Tesla T4
System:
Ubuntu 18.04 on a Huawei server
Goal:
Install CUDA 11.1 + cuDNN 8.0
1. Driver installation
Run ubuntu-drivers devices to list the suitable drivers, then install one:
sudo apt install nvidia-driver-450-server
Driver installed successfully!
2. Install CUDA
Download the CUDA 11.1.0 offline installer (the .run file).
References:
https://cyfeng.science/2020/05/02/ubuntu-install-nvidia-driver-cuda-cudnn-suits/
https://blog.csdn.net/sinat_36721621/article/details/115326307?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-0&spm=1001.2101.3001.4242
https://blog.csdn.net/qq_42167046/article/details/113246994
sudo sh cuda…run
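Choose "continue".
Uncheck the driver option, since the driver was already installed in step 1.
Then configure the environment variables: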
gedit ~/.bashrc
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source ~/.bashrc
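The ${PATH:+:${PATH}} part of the two export lines is easy to misread, so here is a minimal sketch of what that expansion does. The helper name prepend_dir is made up for this illustration and is not part of the CUDA setup; it only prints the resulting value and changes nothing in the environment.

```shell
# Sketch: ${var:+:${var}} expands to ":<value>" only when var is set and non-empty,
# so no stray leading or trailing colon ends up in the list.
prepend_dir() {
    # $1 = directory to put first, $2 = current list value (may be empty)
    current=$2
    printf '%s\n' "$1${current:+:${current}}"
}

# Empty list: the result is just the new directory, with no trailing colon.
prepend_dir /usr/local/cuda-11.1/bin ""

# Non-empty list: the old value is appended after a single colon.
prepend_dir /usr/local/cuda-11.1/bin /usr/bin:/bin
```

This matters mostly for LD_LIBRARY_PATH, which is often unset: an empty entry in that list is treated as the current directory.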
Verify that the installation succeeded:
nvcc -V
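3. Install cuDNN
(1) Copy the cuDNN header files from cuda/include into /usr/local/cuda/include, copy everything under cuda/lib64/ into /usr/local/cuda/lib64, and add read permission. cuDNN 8 splits its declarations across several headers (cudnn.h, cudnn_version.h, and others), so copy all of cudnn*.h:
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
Then change the permissions:
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*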
Next, install the three Deb packages: cuDNN Runtime Library for Ubuntu18.04 (Deb), cuDNN Developer Library for Ubuntu18.04 (Deb), and cuDNN Code Samples and User Guide for Ubuntu18.04 (Deb).
sudo dpkg -i libcudnn8_8.0.5.39-1+cuda11.0_amd64.deb
sudo dpkg -i libcudnn8-dev_8.0.5.39-1+cuda11.0_amd64.deb
sudo dpkg -i libcudnn8-samples_8.0.5.39-1+cuda11.0_amd64.deb
To test the installation, run the following four commands in order:
cp -r /usr/src/cudnn_samples_v8/ ~
cd ~/cudnn_samples_v8/mnistCUDNN/
make clean && make
./mnistCUDNN
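Besides running the sample, it can help to confirm which cuDNN version the headers actually declare. The sketch below assumes the cuDNN 8 layout, where the version macros live in cudnn_version.h; the function name cudnn_version_from_header is my own, and the header path depends on how cuDNN was installed, so it is passed in as an argument.

```shell
# Sketch: print MAJOR.MINOR.PATCHLEVEL as declared in a cuDNN 8 cudnn_version.h.
cudnn_version_from_header() {
    # $1 = path to cudnn_version.h (location is an assumption; it varies by install method)
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

# Possible usage, if the header was copied next to the CUDA headers:
# cudnn_version_from_header /usr/local/cuda/include/cudnn_version.h
```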
4. Install TensorRT
# Extract the archive
tar xzvf TensorRT-${version}.${os}.${arch}-gnu.${cuda}.${cudnn}.tar.gz
Set the environment variable:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<TensorRT-${version}/lib>
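The two lines above are templates, so here is a sketch of how the placeholders might be filled in. Every value below is an example I picked for illustration (the post does not record the exact TensorRT build); replace them with the ones from the file you actually downloaded. The script only prints the resulting file name and export line.

```shell
# Example values only (assumptions, not taken from the original setup).
version="7.2.2.3"
os="Ubuntu-18.04"
arch="x86_64"
cuda="cuda-11.1"
cudnn="cudnn8.0"

# File name following the template TensorRT-${version}.${os}.${arch}-gnu.${cuda}.${cudnn}.tar.gz
tarball="TensorRT-${version}.${os}.${arch}-gnu.${cuda}.${cudnn}.tar.gz"

# Assumes the archive is extracted in the current directory,
# producing a TensorRT-${version}/ directory with lib/ inside it.
trt_lib_dir="$PWD/TensorRT-${version}/lib"

echo "$tarball"
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:${trt_lib_dir}"
```

Using an absolute path for the lib directory keeps the export valid no matter which directory later commands run from.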
Install the Python packages from the extracted TensorRT directory:
cd TensorRT-${version}/python
sudo pip3 install tensorrt-*-cp3x-none-linux_x86_64.whl
cd ../uff
sudo pip3 install uff-0.6.9-py2.py3-none-any.whl
cd ../graphsurgeon
sudo pip3 install graphsurgeon-0.4.5-py2.py3-none-any.whl
cd ../onnx-graphsurgeon
sudo pip3 install onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
Step 8, installing onnx:
All kinds of errors!!! I nearly lost it!!!
The fix (WTF!!!):
sudo apt-get install libprotobuf-dev protobuf-compiler
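The error messages are not recorded here, so the following is only a guess at the mechanism: when onnx is built from source it needs the protobuf compiler, and the fix above is what provides it. A small sketch for checking whether protoc is on PATH before retrying the onnx install; have_cmd is a helper name I made up.

```shell
# Sketch: check for the protobuf compiler before building onnx.
have_cmd() {
    # Succeeds when $1 resolves to a command on PATH.
    command -v "$1" >/dev/null 2>&1
}

if have_cmd protoc; then
    # Print the installed compiler version.
    protoc --version
else
    echo "protoc not found; install protobuf-compiler first"
fi
```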
Problem:
After installation, verify TensorRT.
After building sample_mnist,
run it:
./sample_mnist
cudaErrorUnsupportedPtxVersion = 222
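This error means the supplied PTX was compiled with an unsupported toolchain. The most common cause is PTX generated by a compiler newer than the one supported by the CUDA driver and its PTX JIT compiler.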
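In other words, the driver is older than the toolkit expects. A rough way to check this up front is to compare the installed driver version against the minimum the toolkit lists. The sketch below assumes 455.23 as the minimum Linux driver for CUDA 11.1 (my reading of NVIDIA's release notes; please verify it there), and the function names are my own. It relies on sort -V for version ordering.

```shell
# Sketch: compare a driver version against an assumed minimum for CUDA 11.1.
version_ge() {
    # True when version $1 >= version $2: the smaller of the two must be $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

min_driver="455.23"   # assumption: minimum Linux driver for CUDA 11.1, check the release notes

check_driver() {
    # $1 = driver version string, e.g. 450.102.04
    if version_ge "$1" "$min_driver"; then
        echo "driver $1: new enough for CUDA 11.1"
    else
        echo "driver $1: too old for CUDA 11.1"
    fi
}

check_driver 450.102.04
check_driver 460.73.01

# On a machine with the driver loaded, the installed version can be read with:
# nvidia-smi --query-gpu=driver_version --format=csv,noheader
```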
Solution:
Upgrade the driver from the current 450.102.04 to 460.73.01.
Then run again:
./sample_mnist
Two days of fiddling, and it finally works!!!