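Environment: Rocky Linux 8.5 (check with cat /etc/redhat-release). Download the image from the official site, write it to a bootable drive, and install the system. This post continues the previous article on disk mounting and installing gcc 9.3.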
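I. Installing the NVIDIA 460.84 driver
1. Disable the nouveau driver
Run the following command to check whether nouveau is loaded. It normally prints some output; if it prints nothing, nouveau is not loaded and you can skip this step.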
lsmod | grep nouveau
Add nouveau to the blacklist in /etc/modprobe.d/blacklist.conf.
vim /etc/modprobe.d/blacklist.conf
Add the following lines:
blacklist nouveau
options nouveau modeset=0
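Save and exit.
dracut --force # rebuild the initramfs for the running kernel (this does not update the kernel itself)
Alternatively, back up the existing image and then rebuild it:
# Rebuild the initramfs image (the new image will not load the nouveau driver at boot)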
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
mv /boot/initramfs-$(uname -r).img.bak /home/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
# Switch to the command-line (multi-user) target
systemctl set-default multi-user.target
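# Switch back to the graphical target (systemctl usage: enable a service at boot with systemctl enable ***.service)
#systemctl set-default graphical.target
Reboot after making these changes, then confirm that nouveau is really gone (the command should print nothing): lsmod | grep nouveau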
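The blacklist edit above can also be scripted so that running it twice does not duplicate the entries. This is only a minimal sketch: the helper names ensure_line and blacklist_nouveau are made up for illustration, and the default path is the one used in this article.

```shell
# Append a line to a file only if that exact line is not already present.
# $1 = file, $2 = line
ensure_line() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Write both nouveau entries.
# $1 = config file (assumed default: the path used in this article)
blacklist_nouveau() {
    ensure_line "${1:-/etc/modprobe.d/blacklist.conf}" "blacklist nouveau"
    ensure_line "${1:-/etc/modprobe.d/blacklist.conf}" "options nouveau modeset=0"
}
```

Run blacklist_nouveau as root, then rebuild the initramfs with dracut --force and reboot as described above.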
2. Install the GPU driver
Check the GPU model:
lshw -c video
yum install epel-release # install the EPEL repository
yum -y install gcc kernel-devel dkms
#yum -y install gcc kernel-devel "kernel-devel-uname-r == $(uname -r)" dkms
yum install libglvnd-devel.x86_64
Installed:
dkms-3.0.3-1.el8.noarch
elfutils-libelf-devel-0.185-1.el8.x86_64
kernel-devel-4.18.0-348.20.1.el8_5.x86_64
Install the GPU driver:
./NVIDIA-Linux-x86_64-460.84.run
1. Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. [No]
2. Install NVIDIA's 32-bit compatibility libraries? [No]
3. Would you like to run the nvidia-xconfig utility to automatically update your X configuration so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up. [Yes]
Verify that the installation succeeded:
nvidia-smi
Mon Apr 18 15:19:50 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84       Driver Version: 460.84       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:03:00.0 Off |                  N/A |
| 28%   34C    P0    28W / 120W |      0MiB /  6078MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 00000000:82:00.0 Off |                  N/A |
| 39%   39C    P0    30W / 120W |      0MiB /  6078MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
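If you want to check the result from a script instead of reading the table, the driver version can be pulled out of the banner line. A minimal sketch, assuming the banner keeps the "Driver Version: X" wording shown above; the function name driver_version_from_smi is made up for illustration.

```shell
# Read nvidia-smi output on stdin and print the first driver version found
# in the banner line, for example "460.84".
driver_version_from_smi() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

Usage: nvidia-smi | driver_version_from_smi. The same value is also available without parsing via nvidia-smi --query-gpu=driver_version --format=csv,noheader, which prints one line per GPU.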
II. Installing CUDA
1. Install cuda_11.2.0
./cuda_11.2.0_460.27.04_linux.run
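If the system was installed in UEFI boot mode, disable the Secure Boot option in the BIOS.
The biggest pitfall: the installation fails. Check the logs for the cause: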
cat /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/local/bin/gcc
[INFO]: gcc version: gcc 版本 9.3.0 (GCC)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 460.27.04
[INFO]: Executing NVIDIA-Linux-x86_64-460.27.04.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 460.27.04 failed, quitting
cat /var/log/nvidia-installer.log
ERROR: The nvidia-drm kernel module was not created.
ERROR: The nvidia-drm kernel module failed to build. This kernel module is required for the proper operation of DRM-KMS. If you do not need to use DRM-KMS, you can try to install this driver package again with the '--no-drm' option.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
uname -r
ll /usr/src/kernels/
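If the two results do not match (there is no kernel source directory for the running kernel), fix it by updating the system so that the kernel and kernel-devel versions agree, then reboot into the new kernel: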
yum -y update
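The comparison between uname -r and /usr/src/kernels can be turned into a small check. This is a sketch under the assumption that kernel-devel places its tree in a directory named after the kernel release, as the listing above suggests; the function name has_matching_kernel_devel is made up for illustration.

```shell
# Succeed when a kernel source directory exists for the given release.
# $1 = kernel release (default: the running kernel)
# $2 = directory holding the kernel-devel trees (default: /usr/src/kernels)
has_matching_kernel_devel() {
    [ -d "${2:-/usr/src/kernels}/${1:-$(uname -r)}" ]
}
```

For example: has_matching_kernel_devel || echo "kernel-devel does not match $(uname -r)".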
Fix: deselect the Driver option during installation, because the driver was already installed separately above.
./cuda_11.2.0_460.27.04_linux.run
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.2/
Samples: Installed in /home/hhs-face/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.2/lib64, or, add /usr/local/cuda-11.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.2/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 460.00 is required for CUDA 11.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
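The warning in the summary says CUDA 11.2 needs a driver of at least version 460.00. A quick way to compare two version strings is sketched below; it relies on the -V (version sort) option of GNU sort, and the function name version_ge is made up for illustration.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Sorts the two strings in version order and checks that $2 comes first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

With the driver installed earlier, version_ge 460.84 460.00 succeeds, so the separately installed 460.84 driver satisfies the requirement.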
Configure the environment variables:
vim /etc/profile
# Append at the end of the file:
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# Save and exit, then apply the change to the current shell
source /etc/profile
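The ${VAR:+:${VAR}} form in the two export lines expands to a colon followed by the old value only when the variable is set and non-empty, so an empty LD_LIBRARY_PATH does not leave a stray trailing colon. The sketch below shows the same expansion on a positional parameter; the function name prepend_dir is made up for illustration.

```shell
# Print directory $1 prepended to the path-like value $2.
# "${2:+:$2}" yields ":$2" when $2 is non-empty and nothing otherwise.
prepend_dir() {
    printf '%s%s\n' "$1" "${2:+:$2}"
}
```

Calling prepend_dir /usr/local/cuda-11.2/lib64 "" prints only the directory, while passing an existing value such as /usr/bin:/bin joins the two with a single colon.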
Reboot, then check the version to verify the installation:
reboot
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
2. Install cuDNN 8.1.1.33
chmod +x cudnn-11.2-linux-x64-v8.1.1.33.tgz
tar -zxvf cudnn-11.2-linux-x64-v8.1.1.33.tgz
# Copy the extracted files into the matching include and lib64 directories of the CUDA installation.
# cuDNN 8 splits its API across several cudnn_*.h headers, so copy all of them; -P keeps the library symlinks.
cp cuda/include/cudnn*.h /usr/local/cuda-11.2/include
cp -P cuda/lib64/libcudnn* /usr/local/cuda-11.2/lib64
chmod a+r /usr/local/cuda-11.2/include/cudnn*.h /usr/local/cuda-11.2/lib64/libcudnn*
# Verify the cuDNN installation by checking its version; here it is 8.1.1.
cat /usr/local/cuda-11.2/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
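To get the version as a single string rather than three #define lines, the header can be parsed with awk. A minimal sketch, assuming the header keeps the "#define NAME value" layout shown above; the function name cudnn_version is made up for illustration, and the default path is the one used in this article.

```shell
# Print the cuDNN version as MAJOR.MINOR.PATCHLEVEL.
# $1 = path to cudnn_version.h (assumed default: the CUDA 11.2 include dir)
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") printf "%s.%s.%s\n", major, minor, patch }
    ' "${1:-/usr/local/cuda-11.2/include/cudnn_version.h}"
}
```

Run cudnn_version with no argument after the copy step; it prints nothing if the header contains no CUDNN_MAJOR definition.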