Installing the NVIDIA Driver and CUDA on CentOS

Preliminary environment setup

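Check the NVIDIA device (the sample outputs in this article come from two servers, one with a Tesla P40 and one with a Tesla M6)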

[root@localhost ~]# lspci -nn | grep NV
1a:00.0 3D controller [0302]: NVIDIA Corporation GP102GL [Tesla P40] [10de:1b38] (rev a1)
[root@localhost ~]# lshw -numeric -C display
  *-display                 
       description: VGA compatible controller
       product: ASPEED Graphics Family [1A03:2000]
       vendor: ASPEED Technology, Inc. [1A03]
       physical id: 0
       bus info: pci@0000:03:00.0
       version: 41
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi vga_controller bus_master cap_list rom
       configuration: driver=ast latency=0
       resources: irq:17 memory:9c000000-9cffffff memory:9d000000-9d01ffff ioport:2000(size=128) memory:c0000-dffff
  *-display
       description: 3D controller
       product: GP102GL [Tesla P40] [10DE:1B38]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:1a:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list
       configuration: driver=nvidia latency=0
       resources: iomemory:38700-386ff iomemory:38780-3877f irq:392 memory:a9000000-a9ffffff memory:387000000000-3877ffffffff memory:387800000000-387801ffffff

Check whether an NVIDIA GPU is installed (hardware check)

[root@localhost local]# lspci | grep -i nvidia
09:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M6] (rev a1)
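The check above depends on reading the lspci output by eye. Below is a minimal sketch of how it could be scripted. It assumes POSIX sh or bash, and it assumes the input comes from `lspci -nn` so that vendor IDs are printed. It matches NVIDIA's PCI vendor ID 10de, which appears as [10de:1b38] in the first listing. The function names has_nvidia_gpu and nvidia_bus_ids are hypothetical.

```shell
# has_nvidia_gpu (hypothetical name): reads `lspci -nn` output on stdin and
# returns 0 if any device carries NVIDIA's PCI vendor ID 10de.
has_nvidia_gpu() {
  grep -qiE '\[10de:[0-9a-f]{4}\]'
}

# nvidia_bus_ids (hypothetical name): prints the PCI bus address
# (first column of `lspci -nn`) of every NVIDIA device.
nvidia_bus_ids() {
  grep -iE '\[10de:[0-9a-f]{4}\]' | awk '{ print $1 }'
}
```

Usage: lspci -nn | has_nvidia_gpu && lspci -nn | nvidia_bus_ids. Matching the numeric vendor ID avoids depending on how the PCI name database spells the vendor.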

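Install GCC, the kernel development package, and dkms (dkms is provided by the EPEL repository)

yum install -y gcc gcc-c++
yum install -y elfutils-libelf-devel
yum install -y kernel-devel-$(uname -r)
yum install -y epel-release
yum install -y dkms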

Check that the running kernel version matches the installed kernel source (kernel-devel) version. The two must be identical:

[root@localhost pkg]# ls /boot | grep vmlinu
vmlinuz-0-rescue-f89c734ac8c2471a948b2b8e7cea7df3
vmlinuz-3.10.0-957.el7.x86_64
[root@localhost pkg]# rpm -aq | grep kernel-devel
kernel-devel-3.10.0-957.el7.x86_64
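The same comparison can be scripted so that it fails loudly when the versions differ. This is a minimal sketch. It assumes the package list comes from rpm -qa and that the release string comes from uname -r, which produce the formats shown above. kernel_devel_matches is a hypothetical helper name.

```shell
# kernel_devel_matches (hypothetical name)
# $1    : running kernel release, normally "$(uname -r)"
# stdin : installed package names, normally from "rpm -qa"
# Returns 0 only if a kernel-devel package for exactly that release exists.
kernel_devel_matches() {
  # -x: whole-line match, -F: fixed string (dots are not regex wildcards)
  grep -qxF "kernel-devel-$1"
}
```

Usage: rpm -qa | kernel_devel_matches "$(uname -r)" || echo "kernel-devel does not match the running kernel"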

Check whether the open-source nouveau driver is currently loaded:

[root@localhost pkg]# lsmod | grep nouveau
nouveau              1869689  0 
video                  24538  1 nouveau
mxm_wmi                13021  1 nouveau
i2c_algo_bit           13413  2 ast,nouveau
drm_kms_helper        179394  2 ast,nouveau
ttm                   114635  2 ast,nouveau
drm                   429744  6 ast,ttm,drm_kms_helper,nouveau
wmi                    21636  2 mxm_wmi,nouveau

Disable (blacklist) the default nouveau driver

Edit /lib/modprobe.d/dist-blacklist.conf. Its default contents look like this:

[root@localhost ~]# vim /lib/modprobe.d/dist-blacklist.conf


# watchdog drivers
blacklist i8xx_tco

# framebuffer drivers
blacklist aty128fb
blacklist atyfb
blacklist radeonfb
blacklist i810fb
blacklist cirrusfb
blacklist intelfb
blacklist kyrofb
blacklist i2c-matroxfb
blacklist hgafb
#blacklist nvidiafb
blacklist rivafb
blacklist savagefb
blacklist sstfb
blacklist neofb
blacklist tridentfb
blacklist tdfxfb
blacklist virgefb
blacklist vga16fb
blacklist viafb

Append the following two lines:
blacklist nouveau
options nouveau modeset=0
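If this step is repeated by hand, the file can end up with duplicate lines. Below is a minimal sketch of an idempotent version. It assumes POSIX sh, and it needs root privileges when pointed at the real file. blacklist_nouveau is a hypothetical helper that takes the config file path as its argument.

```shell
# blacklist_nouveau (hypothetical name): appends the two nouveau lines to the
# given modprobe config file, skipping any line that is already present.
blacklist_nouveau() {
  local conf="$1" line
  # If the file does not end with a newline, add one so appended lines
  # do not get glued onto the last existing line.
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  for line in "blacklist nouveau" "options nouveau modeset=0"; do
    # -x: whole-line match, -F: fixed string, -q: quiet
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
      printf '%s\n' "$line" >> "$conf"
    fi
  done
}
```

Usage (as root): blacklist_nouveau /lib/modprobe.d/dist-blacklist.conf. Running it twice leaves exactly one copy of each line.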

Rebuild the initramfs image (the current image is backed up first)

[root@localhost pkg]# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@localhost pkg]# dracut /boot/initramfs-$(uname -r).img $(uname -r)

Set the default run level to text mode (multi-user.target)

[root@localhost pkg]# systemctl set-default multi-user.target 
Removed symlink /etc/systemd/system/default.target.
Created symlink from /etc/systemd/system/default.target to /usr/lib/systemd/system/multi-user.target.

Reboot the server: reboot

Check that nouveau is no longer loaded (the command below should print nothing):

[root@localhost ~]# lsmod | grep nouveau
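A bare grep nouveau also matches lines where nouveau only appears in the "Used by" column, such as the video line in the earlier lsmod listing. Below is a slightly stricter sketch. It assumes lsmod's usual layout, with the module name in the first column. nouveau_loaded is a hypothetical name.

```shell
# nouveau_loaded (hypothetical name): reads `lsmod` output on stdin and
# returns 0 only if a module named exactly "nouveau" is listed in column 1.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

Usage: lsmod | nouveau_loaded && echo "nouveau is still loaded; check the blacklist and initramfs steps"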

Install the NVIDIA driver and CUDA

Make the .run file executable

chmod a+x cuda_10.0.130_410.48_linux.run

Install CUDA

./cuda_10.0.130_410.48_linux.run --no-opengl-libs
The --no-opengl-libs option skips installing the OpenGL libraries. This prevents the problem of being unable to enter the graphical desktop afterwards.

Install the NVIDIA driver

To install the NVIDIA driver on its own, add the --no-opengl-files option. It likewise skips the OpenGL files and prevents the graphical desktop from breaking:
sudo ./NVIDIA.run --no-x-check --no-nouveau-check --no-opengl-files

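When the license agreement is displayed, type accept and press Enter. If the NVIDIA driver was already installed beforehand, press Enter on the Driver entry to deselect it, leave the other entries unchanged, and choose Install.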

Set the environment variables

Add the following lines to ~/.bashrc:

export PATH="/usr/local/cuda-10.0/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH"
export CUDACXX="/usr/local/cuda-10.0/bin/nvcc"

Run source ~/.bashrc to apply the environment variables in the current shell.
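Each time ~/.bashrc is sourced, the export lines prepend the CUDA directories again. When LD_LIBRARY_PATH starts out empty, they also leave a trailing colon, and the dynamic loader treats an empty entry as the current directory. Below is a minimal sketch of a guard against both problems. It assumes POSIX sh or bash. path_prepend is a hypothetical helper, not part of CUDA.

```shell
# path_prepend (hypothetical name)
# $1: name of a colon-separated variable (e.g. PATH), $2: directory to put first
# Adds the directory only if it is not already listed, and never leaves an
# empty entry when the variable was unset or empty.
path_prepend() {
  local var="$1" dir="$2" cur
  eval "cur=\${$var:-}"              # current value, empty if unset
  case ":$cur:" in
    *":$dir:"*) ;;                   # already present: leave it unchanged
    *)
      if [ -n "$cur" ]; then
        eval "$var=\"\$dir:\$cur\""
      else
        eval "$var=\"\$dir\""
      fi
      ;;
  esac
  export "$var"
}
```

Usage in ~/.bashrc, in place of the first two export lines:
path_prepend PATH /usr/local/cuda-10.0/bin
path_prepend LD_LIBRARY_PATH /usr/local/cuda-10.0/lib64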

Check that CUDA installed successfully

[root@localhost local]# nvcc -V
nvcc: NVIDIA ® Cuda compiler driver
Copyright © 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
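If another nvcc appears earlier in PATH, nvcc -V can report a different release than the one just installed. Below is a minimal sketch that extracts the release number so that a script can compare it with the expected 10.0. It assumes the nvcc -V output format shown above. nvcc_release is a hypothetical name.

```shell
# nvcc_release (hypothetical name): reads `nvcc -V` output on stdin and
# prints the release number from the "Cuda compilation tools, release X.Y" line.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}
```

Usage: [ "$(nvcc -V | nvcc_release)" = "10.0" ] || echo "unexpected nvcc: $(command -v nvcc)"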

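Verify that the NVIDIA driver installed successfully

Note that the "CUDA Version" field printed by nvidia-smi (11.0 here) is the highest CUDA version the installed driver supports. It is not the toolkit version, which nvcc reports as 10.0 above.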

[root@localhost local]# nvidia-smi
Fri Jun 22 08:07:11 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M6            Off  | 00000000:09:00.0 Off |                  Off |
| N/A   43C    P0    25W / 100W |      0MiB /  8129MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
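The CUDA 10.0 runfile bundles driver 410.48, as its file name cuda_10.0.130_410.48_linux.run shows. NVIDIA's CUDA release notes list 410.48 as the minimum Linux driver for CUDA 10.0.130. Below is a minimal sketch that checks the installed driver against that minimum. It assumes GNU sort with -V, which CentOS provides. version_ge is a hypothetical helper.

```shell
# version_ge (hypothetical name): returns 0 if dotted version $1 >= $2.
# sort -V orders version strings numerically field by field, so the smaller
# one comes first; if that is $2, then $1 is greater than or equal to it.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage:
drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
version_ge "$drv" 410.48 || echo "driver $drv is older than 410.48"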

Verify the installation with a CUDA sample program

[root@localhost cuda-10.0]# cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
[root@localhost deviceQuery]# sudo make
[root@localhost deviceQuery]# ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla M6"
  CUDA Driver Version / Runtime Version          11.0 / 10.0
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 8129 MBytes (8524136448 bytes)
  (12) Multiprocessors, (128) CUDA Cores/MP:     1536 CUDA Cores
  GPU Max Clock rate:                            1050 MHz (1.05 GHz)
  Memory Clock rate:                             2300 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 9 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS

If the output ends with Result = PASS and every GPU is detected, the graphics driver and CUDA are installed correctly.
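The two success criteria (Result = PASS, and every GPU detected) can be checked together from the deviceQuery output. Below is a minimal sketch. It assumes the summary-line format shown above, in which NumDevs carries the device count, and it assumes the expected number of GPUs is passed in. devicequery_ok is a hypothetical name.

```shell
# devicequery_ok (hypothetical name): reads deviceQuery output on stdin and
# returns 0 only if it reports "Result = PASS" and NumDevs equals $1.
devicequery_ok() {
  awk -v want="$1" '
    /^Result = PASS/ { pass = 1 }
    /NumDevs = / {
      n = $0
      sub(/.*NumDevs = /, "", n)   # keep the text after "NumDevs = "
      sub(/[^0-9].*/, "", n)       # drop anything after the number
      devs = n + 0
    }
    END { exit !(pass && devs == want) }'
}
```

Usage (one GPU expected): ./deviceQuery | devicequery_ok 1 && echo "driver and CUDA OK"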

Troubleshooting

Conflict between vncserver and the NVIDIA driver

Symptoms

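1. If vncserver is running while the NVIDIA driver is being installed, the installation fails.
2. If vncserver is started after the NVIDIA driver has been installed, the remote desktop shows a black screen.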

Cause

The OpenGL libraries bundled with the NVIDIA driver conflict with the system's OpenGL libraries, which breaks the graphical desktop.

Solution

Uninstall the NVIDIA driver, then reinstall it without its OpenGL files.

Commands:
systemctl set-default multi-user.target
reboot

nvidia-uninstall
./NVIDIA-Linux-x86_64-510.47.03.run --no-opengl-files
systemctl set-default graphical.target
reboot

Query the default boot target

systemctl get-default

Set text (command-line) mode:

systemctl set-default multi-user.target

Set graphical mode:

systemctl set-default graphical.target

Contents of /etc/inittab

[root@localhost etc]# cat inittab 
# inittab is no longer used when using systemd.
#
# ADDING CONFIGURATION HERE WILL HAVE NO EFFECT ON YOUR SYSTEM.
#
# Ctrl-Alt-Delete is handled by /usr/lib/systemd/system/ctrl-alt-del.target
#
# systemd uses 'targets' instead of runlevels. By default, there are two main targets:
#
# multi-user.target: analogous to runlevel 3
# graphical.target: analogous to runlevel 5
#
# To view current default target, run:
# systemctl get-default
#
# To set a default target, run:
# systemctl set-default TARGET.target
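The inittab comments above document how the old run levels map onto systemd targets. Below is a minimal sketch that encodes that mapping, so that scripts written with run levels in mind can call systemctl set-default. It only covers the two levels listed above. runlevel_target is a hypothetical name.

```shell
# runlevel_target (hypothetical name): prints the systemd target that
# /etc/inittab lists as the equivalent of SysV run level $1 (3 or 5).
runlevel_target() {
  case "$1" in
    3) echo multi-user.target ;;   # text mode
    5) echo graphical.target ;;    # graphical desktop
    *) echo "unsupported run level: $1" >&2; return 1 ;;
  esac
}
```

Usage: systemctl set-default "$(runlevel_target 3)"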

Failure installing the CUDA 10.2 driver on CentOS 8

Symptoms

After running the CUDA 10.2 .run file, the log reports that installing NVIDIA driver 440.33 failed and that the CUDA installation failed.
To view the log: cat /var/log/nvidia-install.log

Solution

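Testing showed that installing the NVIDIA driver on its own also fails, even though CUDA 10.2 is confirmed to support CentOS 8 (GPU model: NVIDIA P40). A missing dependency was therefore suspected, and troubleshooting showed that the elfutils-libelf-devel package was missing.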

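Check whether it is installed:

rpm -qa | grep elfutils-libelf-devel

If nothing is printed, install it and then rerun the installer:

yum install -y elfutils-libelf-devel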