Check the graphics card model
lspci | grep -i vga
02:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
05:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
82:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
On a minimal CentOS install the lspci command is missing; installing pciutils provides it:
yum install pciutils
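On a server with several display controllers (here an onboard ASPEED chip plus two NVIDIA cards), it can help to filter the lspci output down to the NVIDIA entries. The following is a minimal sketch, assuming the output format shown above; the function name nvidia_gpus is made up for this example and is not part of pciutils. Cards that lspci reports as "3D controller" rather than "VGA compatible controller" would not be matched.

```shell
# nvidia_gpus: hypothetical helper, not part of pciutils.
# Reads `lspci` output on stdin and prints "<bus address> <model>" for each
# NVIDIA VGA controller, taking the model name from the square brackets.
nvidia_gpus() {
    grep -i 'vga' | grep -i 'nvidia' |
        sed -E 's/^([0-9a-fA-F:.]+) .*\[(.*)\].*$/\1 \2/'
}
```

Usage: lspci | nvidia_gpus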
My card is a GeForce RTX 2080 Ti. Find the matching driver on the NVIDIA website and download it: https://www.nvidia.cn/Download/index.aspx?lang=cn
Install dependencies (if yum cannot find dkms, enable the EPEL repository first)
yum -y install gcc dkms
Update the kernel (optional)
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --disablerepo=\* --enablerepo=elrepo-kernel list kernel\*
yum --disablerepo=\* --enablerepo=elrepo-kernel install -y kernel-lt.x86_64
yum --disablerepo=\* --enablerepo=elrepo-kernel install -y kernel-lt-tools.x86_64
yum --disablerepo=\* --enablerepo=elrepo-kernel install -y kernel-lt-devel.x86_64
Set the new kernel as the default GRUB2 entry
grub2-set-default 0
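grub2-set-default 0 assumes the newly installed kernel is the first menu entry. One way to check is to list the entries with their indexes. This is a sketch under assumptions: the function name list_grub_entries is invented here, and the default path /boot/grub2/grub.cfg assumes a BIOS-booted CentOS 7 system (UEFI systems keep the file elsewhere).

```shell
# list_grub_entries: hypothetical helper that prints "index : title" for each
# top-level menuentry line in a GRUB2 config file.
# Assumption: titles are single-quoted, as grub2-mkconfig writes them.
list_grub_entries() {
    local cfg="${1:-/boot/grub2/grub.cfg}"
    awk -F"'" '/^menuentry / { print n++ " : " $2 }' "$cfg"
}
```

Usage: run list_grub_entries to see the indexes, then grub2-editenv list to see which entry is currently saved as the default.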
Prevent the nouveau module from loading
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
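Rebuild the initramfs image ($(uname -r) is the running kernel; if you installed a new kernel above, repeat this step after booting into it)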
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
Reboot the machine
reboot
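After the reboot, two things are worth confirming before running the installer: nouveau is no longer loaded, and the running kernel has a build directory, since the installer compiles a kernel module against it. The sketch below is one way to check both; the function names are hypothetical, and the paths /proc/modules and /lib/modules/<version>/build are the usual Linux locations, taken as assumptions here.

```shell
# Hypothetical post-reboot checks; the function names are made up for this sketch.

# nouveau_loaded: succeeds if a module named "nouveau" appears in the given
# module list (default /proc/modules, where the first field is the module name).
nouveau_loaded() {
    local modules="${1:-/proc/modules}"
    awk '$1 == "nouveau" { found = 1 } END { exit !found }' "$modules"
}

# kernel_headers_present: succeeds if the build directory exists for the given
# kernel version (default: the running kernel) under the given module root.
kernel_headers_present() {
    local version="${1:-$(uname -r)}"
    local root="${2:-/lib/modules}"
    [ -d "$root/$version/build" ]
}
```

Usage:
nouveau_loaded && echo "nouveau is still loaded"
kernel_headers_present || echo "no kernel headers found for $(uname -r)"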
Make the driver installer executable
chmod u+x NVIDIA-Linux-x86_64-440.64.run
Run the installer
bash NVIDIA-Linux-x86_64-440.64.run
Verify that the driver installed successfully
nvidia-smi
Fri Mar 13 06:27:59 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |
| 17%   38C    P0    63W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:82:00.0 Off |                  N/A |
| 11%   39C    P0    31W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
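For a scripted check, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The sketch below compares the reported driver version of every GPU against an expected value; the function name driver_matches is hypothetical, and it assumes the three-column "index, name, driver_version" layout produced by the query shown in the usage line.

```shell
# driver_matches: hypothetical helper. Reads lines of
# "index, name, driver_version" on stdin and succeeds only if there is at
# least one GPU and every GPU reports the expected driver version.
driver_matches() {
    local expected="$1"
    awk -F', ' -v want="$expected" '
        { seen++ }
        ($3 "") != (want "") { bad++ }   # appending "" forces a string comparison
        END { exit (seen == 0 || bad > 0) }
    '
}
```

Usage:
nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader | driver_matches 440.64 && echo "driver OK"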
Along the way I ran into the following error
ERROR: An NVIDIA kernel module ‘nvidia-uvm’ appears to already be loaded in your kernel. This may be because it is in use (for example, by the X server), but may also happen if your kernel was configured
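The cause was that I had not stopped the programs using the GPU before uninstalling the NVIDIA driver. Run lsof | grep nvidia.uvm to see which processes are using the GPU, then kill them.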
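The raw lsof output repeats a process once per open file, so reducing it to a list of unique PIDs makes the cleanup easier to review. A minimal sketch, assuming the PID is the second column of lsof's default output; the function name uvm_pids is made up for this example.

```shell
# uvm_pids: hypothetical helper. Reads `lsof` output on stdin and prints the
# unique PIDs (second column) of lines that mention nvidia-uvm.
uvm_pids() {
    awk '/nvidia.uvm/ && $2 ~ /^[0-9]+$/ { print $2 }' | sort -un
}
```

Usage: run lsof | uvm_pids, check what each PID is (for example with ps -p PID), and only then kill the ones that should be stopped.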
Done.
If you have questions, join the QQ group: 526855734