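1. The K80 has no display output, so I put a second graphics card in the machine. First, check the GPU information.
Run lspci | grep -i nvidia
I got the following result: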
03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
05:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 950] (rev a1)
05:00.1 Audio device: NVIDIA Corporation Device 0fba (rev a1)
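To make the listing above easier to read on a machine with several cards, here is a minimal sketch that separates compute-only devices (reported as "3D controller", like the K80) from cards with a display output (reported as "VGA compatible controller"). The function name classify_nvidia_gpus is my own and purely illustrative; it only parses text in the lspci format shown above.

```shell
# Read `lspci` text on stdin and print one line per NVIDIA GPU:
# "compute <slot>" for 3D controllers, "display <slot>" for VGA controllers.
# Other NVIDIA functions (for example the HDMI audio device) are ignored.
classify_nvidia_gpus() {
    grep -i nvidia | awk '
        /3D controller/             { print "compute", $1 }
        /VGA compatible controller/ { print "display", $1 }
    '
}

# Usage on a real machine:
#   lspci | classify_nvidia_gpus
```

With the output listed above, this would report the two K80 dies as compute devices and the GTX 950 as the display card.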
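2. Download cuda_11.4.2_470.57.02_linux.run, that is, version 11.4.2.
3. Find the matching version on the official CUDA website. Its installation instructions list the required versions of the operating system and related tools; install or change these to match. cuDNN download.
4. Check which cuDNN versions correspond to your CUDA version and download the matching one.
5. Reference links
Reference link: cuda11+ubuntu18.04
6. Check whether the installation succeeded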
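One possible way to do this check, as a rough sketch: confirm that nvidia-smi (installed with the driver) and nvcc (installed with the toolkit) are on PATH, and print their version information if they are. The helper names check_tool and check_cuda_install are hypothetical; if nvcc is reported missing, the toolkit's bin directory may simply not be on PATH yet.

```shell
# Print "<name>: found" or "<name>: missing" depending on whether the
# command is on PATH; returns non-zero when it is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Check the driver utility and the CUDA compiler, and show their
# version output when they are available.
check_cuda_install() {
    check_tool nvidia-smi && nvidia-smi
    check_tool nvcc && nvcc --version
    return 0
}

# Usage:
#   check_cuda_install
```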
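7. Version correspondence between CUDA, Python, and TensorFlow
8. Installing CUDA on Windows
When several CUDA versions are installed, I suggest writing the environment variable entries as below, so that their order never needs adjusting:
%CUDA_PATH%\bin;%CUDA_PATH%\lib;%CUDA_PATH%\libnvvp;%CUDA_PATH%\lib\x64;%CUDA_PATH%\include;
To switch CUDA versions, you then only need to change CUDA_PATH.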
10. Official correspondence between TensorFlow, CUDA, and Python
Troubleshooting
1) LINK problem
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8 is not a symbolic link
Solution: recreate each .so.8 link so that it points to the .so.8.0.1 file of the same library.
sudo ln -sf /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.1 /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.1 /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
sudo ln -sf /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.1 /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.1 /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.1 /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.1 /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8
sudo ln -sf /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.1 /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
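Typing one ln command per library makes it easy to mix up names, so the same repair can be sketched as a loop. This is only an illustration under assumptions: the function name relink_cudnn and its two parameters are mine, and it creates relative links (file name only) rather than the absolute paths used above, which ldconfig accepts equally well.

```shell
# Recreate every libcudnn*.so.<major> link in a directory so that it points
# at the matching libcudnn*.so.<full version> file.
#   $1: library directory
#   $2: full cuDNN version suffix, e.g. 8.0.1
relink_cudnn() {
    lib_dir=$1
    full_ver=$2
    major=${full_ver%%.*}                    # 8.0.1 -> 8
    for real in "$lib_dir"/libcudnn*.so."$full_ver"; do
        [ -e "$real" ] || continue           # glob matched nothing
        base=${real%".$full_ver"}            # strip the ".8.0.1" suffix
        ln -sf "$(basename "$real")" "$base.$major"
    done
}

# Usage (needs write permission on the directory, e.g. from a root shell):
#   relink_cudnn /usr/local/cuda-11.0/targets/x86_64-linux/lib 8.0.1
#   ldconfig
```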
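12. Switching between multiple installed CUDA versions on Ubuntu
List the installed versions.
Check which version cuda points to: in /usr/local/, run stat cuda or simply ll.
Delete the link and point it at the new version.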
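The three steps above could look roughly like the sketch below. It assumes the usual layout in which each toolkit lives in /usr/local/cuda-<version> and /usr/local/cuda is a symbolic link to one of them; the function names list_cuda_versions and switch_cuda are hypothetical.

```shell
# Print the versions of all toolkits installed under a prefix
# (default /usr/local), one per line.
list_cuda_versions() {
    prefix=${1:-/usr/local}
    for d in "$prefix"/cuda-*; do
        [ -d "$d" ] && echo "${d##*/cuda-}"
    done
    return 0
}

# Point <prefix>/cuda at <prefix>/cuda-<version>.
#   $1: version, e.g. 11.2    $2: prefix (default /usr/local)
switch_cuda() {
    version=$1
    prefix=${2:-/usr/local}
    if [ ! -d "$prefix/cuda-$version" ]; then
        echo "cuda-$version is not installed under $prefix" >&2
        return 1
    fi
    # Refuse to touch a real directory; only replace an existing link.
    if [ -e "$prefix/cuda" ] && [ ! -L "$prefix/cuda" ]; then
        echo "$prefix/cuda is not a symbolic link" >&2
        return 1
    fi
    rm -f "$prefix/cuda"
    ln -s "$prefix/cuda-$version" "$prefix/cuda"
}

# Usage (from a root shell, since /usr/local is normally root-owned):
#   list_cuda_versions
#   switch_cuda 11.2
```

Environment variables that refer to /usr/local/cuda then follow the link automatically, so nothing else needs to change after a switch.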
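13. Install cuDNN after installing CUDA
Copy with cp -P so that symbolic links stay links; a plain cp turns them into regular files, which later makes ldconfig report "is not a symbolic link".
sudo cp -P cuda/lib64/* /usr/local/cuda-11.2/lib64/
sudo cp -P cuda/include/* /usr/local/cuda-11.2/include/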
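Appendix:
1. CUDA installation failure: in my case the installation succeeded once I deselected the Visual Studio option in the CUDA installer. I did have the matching Visual Studio version installed, but the installation kept failing until I deselected that option.
2. Installation error on Ubuntu 18.04 caused by the driver that ships with the system
3. Installing the driver separately on Ubuntu
4. Fixing the nvidia-smi error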
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
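I tried reinstalling. The most credible explanation is that a kernel update caused the error; reboot the computer and select the previous kernel.
For installing and switching kernels, see also the method of installing with dkms. Use the command below to check the driver version: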
ls -l /usr/src/
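Afterwards I installed a different version. Without a reboot the result was the same; after a reboot everything worked. I have not tested whether reinstalling the original version and rebooting would also work.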
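As a rough sketch of the dkms route mentioned above: driver sources normally sit in /usr/src/nvidia-<version>, so the version can be read from the directory name and passed to dkms. The function names below are my own, and the second one only prints the dkms command instead of running it, so you can review it first.

```shell
# Print the NVIDIA driver versions that have sources under a directory
# (default /usr/src), assuming the usual nvidia-<version> naming.
nvidia_src_versions() {
    src_dir=${1:-/usr/src}
    for d in "$src_dir"/nvidia-*; do
        [ -d "$d" ] && echo "${d##*/nvidia-}"
    done
    return 0
}

# Print (but do not run) the dkms command that would build and install
# each of those driver versions for the running kernel.
print_dkms_commands() {
    nvidia_src_versions "$1" | while read -r ver; do
        echo "sudo dkms install -m nvidia -v $ver"
    done
}

# Usage:
#   print_dkms_commands
```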
# install GCC 5 and G++ 5
sudo apt-get install -y gcc-5
sudo apt-get install -y g++-5
# make gcc and g++ in /usr/bin point to version 5
cd /usr/bin
sudo rm gcc
sudo ln -s gcc-5 gcc
sudo rm g++
sudo ln -s g++-5 g++
export CUDA_HOME=/usr/local/cuda-11.2
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=${CUDA_HOME}/bin:${PATH}
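If these exports go into ~/.bashrc, every new nested shell prepends the same directories again. A small helper can avoid the duplicates; this is just a sketch, and prepend_path is a name I made up for illustration.

```shell
# Prepend directory $2 to the colon-separated list $1 unless it is already
# in the list, and print the resulting list.
prepend_path() {
    case ":$1:" in
        *":$2:"*) echo "$1" ;;              # already present: unchanged
        *)        echo "$2${1:+:$1}" ;;     # add in front, no stray colon
    esac
}

# Usage in ~/.bashrc:
#   export CUDA_HOME=/usr/local/cuda-11.2
#   export PATH=$(prepend_path "$PATH" "$CUDA_HOME/bin")
#   export LD_LIBRARY_PATH=$(prepend_path "$LD_LIBRARY_PATH" "$CUDA_HOME/lib64")
```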
The following approach is a useful reference (it installs CUDA 8 on Ubuntu 16.04 as an example; other versions are similar):
#!/bin/bash
# install CUDA Toolkit v8.0
# instructions from https://developer.nvidia.com/cuda-downloads (linux -> x86_64 -> Ubuntu -> 16.04 -> deb (network))
CUDA_REPO_PKG="cuda-repo-ubuntu1604_8.0.61-1_amd64.deb"
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/${CUDA_REPO_PKG}
sudo dpkg -i ${CUDA_REPO_PKG}
sudo apt-get update
sudo apt-get -y install cuda
# install cuDNN v6.0
CUDNN_TAR_FILE="cudnn-8.0-linux-x64-v6.0.tgz"
wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/${CUDNN_TAR_FILE}
tar -xzvf ${CUDNN_TAR_FILE}
sudo cp -P cuda/include/cudnn.h /usr/local/cuda-8.0/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/
sudo chmod a+r /usr/local/cuda-8.0/lib64/libcudnn*
# set environment variables
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}