Installing CUDA on a Linux Server
1. Driver Installation
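Preface:
I tried letting the CUDA installer install the driver itself, but it always failed with "Install of driver component failed".
Disabling nouveau, as suggested online, did not help either, so in the end I installed the driver separately.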
Driver download page:
https://www.nvidia.com/Download/index.aspx
Installation:
sudo sh <driver .run file>
Problem: an X server is still running:
ERROR: You appear to be running an X server; please exit X before
installing. For further details, please see the section INSTALLING
THE NVIDIA DRIVER in the README available on the Linux driver
download page at www.nvidia.com.
Fix:
Append --no-x-check to the install command above.
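The flag only skips the installer's check, so it is worth confirming first whether an X server is really running. Below is a minimal sketch. It assumes the X server process is named `Xorg` or `X`, which varies between distributions. `x_server_running` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (returns 0) if a process named Xorg or X
# is running. It reads /proc/<pid>/comm directly, so it needs no extra
# tools, but it only works on Linux.
x_server_running() {
    local comm
    for comm in /proc/[0-9]*/comm; do
        # A process may exit between the glob and the read, so errors are ignored.
        case "$(cat "$comm" 2>/dev/null)" in
            Xorg|X) return 0 ;;
        esac
    done
    return 1
}

if x_server_running; then
    echo "X server detected: stop it, or pass --no-x-check to the installer"
else
    echo "no X server process found"
fi
```

On systemd-based distributions, `sudo systemctl isolate multi-user.target` switches to text mode and stops the graphical session. That avoids the error without skipping the check.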
Still an error:
ERROR: Unable to find the development tool `cc` in your path; please make sure that you have the package 'gcc' installed. If gcc is installed on your system, then please check that `cc` is in your PATH.
gcc is not installed, so install it:
sudo yum install gcc
gcc --version   # check that gcc was installed successfully
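The error asks specifically for `cc`, not `gcc`. On Red Hat-style systems the gcc package also provides `cc`, but a quick check makes that explicit. Below is a minimal sketch. I assume the installer also needs `make` to build its kernel module, so it is checked too.

```shell
#!/usr/bin/env bash
# Report whether each build tool the driver installer relies on is on PATH.
# Assumption: besides cc, the installer also calls make to build the kernel module.
missing=0
for tool in cc gcc make; do
    if path=$(command -v "$tool"); then
        echo "found   $tool -> $path"
    else
        echo "missing $tool"
        missing=$((missing + 1))
    fi
done
echo "$missing tool(s) missing"
```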
Still an error:
ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.
Install the kernel development files (kernel-devel) that match the running kernel:
sudo yum install kernel-devel-$(uname -r)
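Before rerunning the installer, you can confirm that the headers match the running kernel. Below is a minimal sketch. It assumes the RHEL/CentOS layout, where kernel-devel installs into /usr/src/kernels/<version> and /lib/modules/<version>/build usually points there.

```shell
#!/usr/bin/env bash
# Check that a kernel source tree exists for the *running* kernel.
# Assumption: RHEL/CentOS layout (/usr/src/kernels/<version>), with
# /lib/modules/<version>/build as a fallback location.
kver=$(uname -r)
if [ -d "/usr/src/kernels/$kver" ] || [ -d "/lib/modules/$kver/build" ]; then
    echo "kernel source tree found for $kver"
else
    echo "no kernel source tree for $kver; install kernel-devel-$kver"
fi
```

If the headers live somewhere else, the error message above says the path can be passed with `--kernel-source-path`. If yum cannot find kernel-devel for the exact running version, the kernel was probably updated since the last reboot. Rebooting into the newest kernel usually resolves the mismatch.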
Run the install command again:
sudo sh NVIDIA-Linux-x86_64-535.129.03.run --no-x-check
# after a successful install, verify the driver with:
nvidia-smi
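Instead of reading the full nvidia-smi table, a compact check can print only the fields that matter here. Below is a sketch using nvidia-smi's `--query-gpu` and `--format` options. The `driver_ok` variable is only for illustration.

```shell
#!/usr/bin/env bash
# Compact post-install check: GPU name and driver version only.
driver_ok=no
if command -v nvidia-smi >/dev/null 2>&1 &&
   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader; then
    driver_ok=yes
else
    echo "nvidia-smi missing or failed: the driver is not installed or not loaded"
fi
# The loaded kernel module also reports its version here (the file exists only when it is loaded).
[ -r /proc/driver/nvidia/version ] && cat /proc/driver/nvidia/version
echo "driver_ok=$driver_ok"
```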
2. CUDA Installation
2.1 Installation
CUDA Toolkit archive:
https://developer.nvidia.com/cuda-toolkit-archive
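Choose the release and the configuration for your system. This guide uses CUDA 12.2.2 on Linux, x86_64, with the runfile (local) installer type.
Download and install commands: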
wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run
sudo sh cuda_12.2.2_535.104.05_linux.run
The installation failed:
Installation failed. See log at /var/log/cuda-installer.log for details.
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /bin/gcc
[INFO]: gcc version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
[INFO]: Initializing menu
[INFO]: nvidia-fs.setKOVersion(2.17.5)
[INFO]: Setup complete
[INFO]: Installing: Driver
[INFO]: Installing: 535.104.05
[INFO]: Executing NVIDIA-Linux-x86_64-535.104.05.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 535.104.05 failed, quitting
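The driver was already installed in step 1, and the CUDA installer tries to install its own bundled driver (535.104.05) on top of it. That is why the installation fails. In the installer's component menu, move to the Driver entry and press Enter to clear its [X] mark; otherwise the installation fails.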
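Instead of clearing Driver in the menu, you can also run the runfile non-interactively with only the toolkit selected. Below is a sketch, assuming the runfile's `--silent` and `--toolkit` options. `RUN` is a hypothetical switch that prints the command before executing it.

```shell
#!/usr/bin/env bash
# Toolkit-only, non-interactive install: --silent skips the menu and
# --toolkit selects only the CUDA Toolkit, so the existing driver is left alone.
installer=cuda_12.2.2_535.104.05_linux.run
args=(--silent --toolkit)
# Hypothetical switch: the command is only printed unless RUN=1 is set.
echo "sudo sh $installer ${args[*]}"
if [ "${RUN:-0}" = 1 ]; then
    sudo sh "$installer" "${args[@]}"
fi
```

Usage: `RUN=1 bash install_toolkit.sh`. The script name is only an example.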
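The installation has succeeded when the installer's final summary shows the Toolkit as installed, here in /usr/local/cuda-12.2/.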
2.2 Configure Environment Variables
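Commands:
vim ~/.bashrc
# append these lines to ~/.bashrc:
export CUDA_HOME=/usr/local/cuda-12.2  # install location reported in the installer summary
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}  # keep existing entries
export PATH=${CUDA_HOME}/bin:${PATH}
# save, then reload the file and confirm nvcc is found:
source ~/.bashrc
nvcc -V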
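Each `source ~/.bashrc` prepends the same directories again, so PATH and LD_LIBRARY_PATH keep growing. Below is a minimal sketch of a guard against that. It uses a hypothetical helper, `path_prepend`, that adds a directory only when it is missing, and these lines could replace the export lines in ~/.bashrc.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to a colon-separated variable
# (PATH, LD_LIBRARY_PATH, ...) only if it is not already in it.
path_prepend() {
    local var=$1 dir=$2
    local current=${!var}          # indirect expansion: value of the named variable
    case ":$current:" in
        *":$dir:"*) ;;             # already present: leave unchanged
        *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
    esac
    export "$var"
}

export CUDA_HOME=/usr/local/cuda-12.2
path_prepend PATH "$CUDA_HOME/bin"
path_prepend LD_LIBRARY_PATH "$CUDA_HOME/lib64"
```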
3. cuDNN Installation
Download page:
https://developer.nvidia.com/rdp/cudnn-archive
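Choose the build that matches your CUDA version (here, cuDNN 8.9.5.30 for CUDA 12.x). The download can be very slow and may need a proxy or another workaround.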
xz -d cudnn-linux-x86_64-8.9.5.30_cuda12-archive.tar.xz
tar xvf cudnn-linux-x86_64-8.9.5.30_cuda12-archive.tar
# move the headers and libraries into the CUDA directory
mv /root/cu122/cudnn-linux-x86_64-8.9.5.30_cuda12-archive/include/* /usr/local/cuda-12.2/include/
mv /root/cu122/cudnn-linux-x86_64-8.9.5.30_cuda12-archive/lib/* /usr/local/cuda-12.2/lib64/
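As an alternative to mv, the files can be copied so that the libcudnn symlinks stay symlinks and the files become readable by non-root users. Below is a sketch using `cp -P` and `chmod a+r`. `install_cudnn` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy cuDNN headers and libraries into a CUDA tree.
# cp -P keeps libcudnn*.so symlinks as symlinks; chmod a+r makes the
# copied files readable by every user. Needs root when the target is /usr/local.
install_cudnn() {
    local src=$1 cuda=$2
    cp "$src"/include/cudnn*.h "$cuda/include/" &&
    cp -P "$src"/lib/libcudnn* "$cuda/lib64/" &&
    chmod a+r "$cuda"/include/cudnn*.h "$cuda"/lib64/libcudnn*
}
```

With the paths used here, run as root: `install_cudnn /root/cu122/cudnn-linux-x86_64-8.9.5.30_cuda12-archive /usr/local/cuda-12.2`.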
# check the cuDNN version:
cat /usr/local/cuda/include/cudnn_version.h
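Printing the whole header works, but only three macros carry the version. Below is a minimal sketch that extracts them. It assumes the cuDNN 8.x macro names CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. `cudnn_version` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from cudnn_version.h.
# Exits non-zero if the header is missing or has no CUDNN_MAJOR line.
cudnn_version() {
    local header=${1:-/usr/local/cuda/include/cudnn_version.h}
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") printf "%s.%s.%s\n", major, minor, patch; else exit 1 }
    ' "$header"
}
```

Called without arguments, it reads /usr/local/cuda/include/cudnn_version.h. For the 8.9.5.30 archive it should print 8.9.5.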