1. Install the build tools needed to compile the GPU driver
yum install gcc make kernel-devel
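The driver installer builds a kernel module, so the kernel-devel package generally has to match the running kernel. The following is a minimal sketch of that check, assuming an RPM-based system; the helper name `devel_matches_kernel` is hypothetical and only serves this illustration.

```shell
#!/bin/sh
# Sketch: confirm that an installed kernel-devel matches the running kernel.

# devel_matches_kernel RUNNING INSTALLED
#   RUNNING:   kernel release as printed by `uname -r`
#   INSTALLED: newline-separated kernel-devel versions (VERSION-RELEASE.ARCH)
# Returns 0 only if one installed version equals the running release exactly.
devel_matches_kernel() {
    printf '%s\n' "$2" | grep -Fxq -- "$1"
}

running_kernel="$(uname -r)"
if command -v rpm >/dev/null 2>&1; then
    # Print one VERSION-RELEASE.ARCH line per installed kernel-devel package.
    installed_devel="$(rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel 2>/dev/null)"
else
    installed_devel=""
fi

if devel_matches_kernel "$running_kernel" "$installed_devel"; then
    echo "kernel-devel matches running kernel $running_kernel"
else
    echo "no kernel-devel for $running_kernel; try: yum install kernel-devel-$running_kernel"
fi
```

If the versions differ, either install the matching kernel-devel package or update the kernel and reboot before running the driver installer.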
2. Install the GPU driver dependency
yum install vulkan-loader
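This package is optional. Without it, the driver installer shows a warning, but installation and use are not affected.
3. Install the NVIDIA GPU driver
For production environments, the .run driver package is recommended. Download NVIDIA-Linux-x86_64-550.54.15.run from the official NVIDIA driver download page and upload it to every GPU node.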
- Download options
- GPUs supported by the 550 driver
chmod u+x NVIDIA-Linux-x86_64-550.54.15.run
./NVIDIA-Linux-x86_64-550.54.15.run
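Because the installer has to reach every GPU node, the upload can be scripted. The sketch below is one possible way to do it with scp and ssh; the node names `gpu-node-1` and `gpu-node-2`, the `root` login, and the `/root/` target directory are assumptions, not values from this article. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: copy the driver installer to each GPU node and make it executable.

INSTALLER="NVIDIA-Linux-x86_64-550.54.15.run"
GPU_NODES="gpu-node-1 gpu-node-2"   # hypothetical host names; replace with real ones
DRY_RUN="${DRY_RUN:-1}"             # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upload_installer() {
    for node in $GPU_NODES; do
        run scp "$INSTALLER" "root@$node:/root/"
        run ssh "root@$node" "chmod u+x /root/$INSTALLER"
    done
}

upload_installer
```

Run it with `DRY_RUN=0` once the printed commands look right. The installation itself is still started by hand on each node, since the first run is interactive.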
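On the first run, follow the prompts, then reboot the server.
Screenshots covering most of the installation process follow:
Select Abort installation, then reboot the server.
After the server has rebooted, run the installation command again; it then builds and installs the driver automatically (the screenshots are incomplete).
Rebooting the server once more after the driver installation completes is recommended.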
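The article does not say why the first run has to be aborted. A common cause is that the open-source nouveau driver is still loaded, and that is only an assumption here. The sketch below checks which GPU kernel module is loaded before and after the reboots; the helper name `module_loaded` is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether the nouveau or nvidia kernel module is loaded.

# module_loaded NAME LSMOD_OUTPUT
#   Returns 0 if NAME appears in the first column of the given lsmod output.
module_loaded() {
    printf '%s\n' "$2" | awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

if command -v lsmod >/dev/null 2>&1; then
    lsmod_out="$(lsmod)"
else
    lsmod_out=""
fi

if module_loaded nouveau "$lsmod_out"; then
    echo "nouveau is loaded; the NVIDIA installer may ask for a reboot first"
elif module_loaded nvidia "$lsmod_out"; then
    echo "nvidia kernel module is loaded"
else
    echo "neither nouveau nor nvidia is loaded"
fi
```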
4. Verify the GPU driver
- Run the following command:
nvidia-smi
On the Tesla M40 node, a successful run produces the following output:
$ nvidia-smi
Thu May 19 08:59:57 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla M40 24GB                 Off |   00000000:00:10.0 Off |                    0 |
| N/A   37C    P0             65W /  250W |       0MiB /  23040MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
On the Tesla P100 node, a successful run produces the following output:
$ nvidia-smi
Thu May 19 09:19:19 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P100-PCIE-16GB           Off |   00000000:00:10.0 Off |                    0 |
| N/A   40C    P0             31W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
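Reading the full table by eye on every node gets tedious. As a rough sketch, the same verification can be scripted with the CSV query mode of nvidia-smi, which prints one line per GPU. The expected version comes from the installer file name used above; the helper name `check_driver_version` is hypothetical.

```shell
#!/bin/sh
# Sketch: check that every GPU on this node reports the expected driver version.

EXPECTED_DRIVER="550.54.15"

# check_driver_version WANT
#   Reads CSV lines of the form "name, driver_version, memory.total" on stdin.
#   Returns non-zero if no GPU is listed or any GPU reports another version.
check_driver_version() {
    awk -F', ' -v want="$1" '
        { seen++; if ($2 != want) { bad++; print "mismatch: " $0 } }
        END { exit (seen == 0 || bad > 0) }
    '
}

if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader \
        | check_driver_version "$EXPECTED_DRIVER"; then
        echo "all GPUs report driver $EXPECTED_DRIVER"
    else
        echo "driver check failed"
    fi
else
    echo "nvidia-smi not found; the driver is probably not installed"
fi
```

On the two nodes shown above, the query would be expected to list the Tesla M40 24GB and the Tesla P100-PCIE-16GB, each with driver 550.54.15.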