Configuring the Nvidia P100 Driver on Ubuntu 18.04

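Two driver installation methods are commonly found online.
The first is to search for the available driver packages directly:
apt-cache search nvidia-driver
and then install the newest driver:
sudo apt-get install nvidia-driver-530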

The other method is to let Ubuntu install the driver automatically:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
ubuntu-drivers devices

xxxxx@xxx:~$ ubuntu-drivers devices
WARNING:root:_pkg_get_support nvidia-driver-525: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-510: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-515-server: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-525-server: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-530: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-515: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-390: package has invalid Support Legacyheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0 ==
modalias : pci:v000010DEd000015F8sv000010DEsd0000118Fbc03sc02i00
vendor   : NVIDIA Corporation
model    : GP100GL [Tesla P100 PCIe 16GB]
manual_install: True
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-525 - distro non-free
driver   : nvidia-driver-470 - distro non-free recommended
driver   : nvidia-driver-510 - distro non-free
driver   : nvidia-driver-515-server - distro non-free
driver   : nvidia-driver-525-server - distro non-free
driver   : nvidia-driver-530 - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-515 - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
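
The listing above marks one package as "recommended" (nvidia-driver-470 here). As a minimal sketch, assuming the output keeps the "driver : <package> - <tags>" layout shown above, the recommended package can be picked out programmatically. The function name recommended_driver is my own and not part of any Ubuntu tool.

```python
import re


def recommended_driver(devices_output):
    """Return the package tagged 'recommended' in `ubuntu-drivers devices` output, or None."""
    for line in devices_output.splitlines():
        # Expected layout: "driver   : <package> - <tag> <tag> ..."
        match = re.match(r"\s*driver\s*:\s*(\S+)\s+-\s+(.*)$", line)
        if match and "recommended" in match.group(2).split():
            return match.group(1)
    return None


# Three lines copied from the listing above.
sample = """\
driver   : nvidia-driver-525 - distro non-free
driver   : nvidia-driver-470 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
"""
print(recommended_driver(sample))  # prints nvidia-driver-470
```

In practice the text would come from running ubuntu-drivers devices and capturing its standard output.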

Then run the automatic installation:
sudo ubuntu-drivers autoinstall


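After either method the machine has to be restarted; a full power-off followed by power-on is preferable.
If it boots into the graphical desktop,
open a terminal and run: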
nvidia-smi

xxxxx@xx:~$ nvidia-smi
Sun Feb 25 13:56:38 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:03:00.0 Off |                    2 |
| N/A   43C    P0   ERR! / 250W |      4MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:82:00.0 Off |                    9 |
| N/A   39C    P0   ERR! / 250W |      4MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
If everything above went smoothly, the GPU driver is configured correctly and you are in luck.
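
For a scripted check, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The sketch below wraps that call. The helper names parse_gpu_csv and list_gpus are hypothetical, and the choice of fields is only an example.

```python
import csv
import io
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_csv(text):
    """Turn the CSV text produced by the QUERY command into a list of dicts."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        {"index": int(r[0]), "name": r[1].strip(), "driver": r[2].strip(), "memory": r[3].strip()}
        for r in rows
        if len(r) == 4  # skip blank or malformed lines
    ]


def list_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


for gpu in list_gpus():
    print(gpu)
```

An empty result means either that nvidia-smi is not installed or that it could not talk to the driver, which is the same situation the troubleshooting steps below deal with.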

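--------------------------------------------------end of the lucky path-----------------------------------------------------------
But if, like me, the machine no longer boots after the restart, or it reaches the login screen and then hangs at the desktop after logging in, follow the steps below.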

Once the problem has appeared, the only option is to uninstall the current GPU driver first:
sudo apt-get purge 'nvidia*'
sudo apt-get autoremove

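If the installation was done with the linux.run installer, uninstall with the following command instead (it removes the CUDA toolkit; the driver installed by a runfile normally ships its own uninstaller, sudo /usr/bin/nvidia-uninstall):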
sudo /usr/local/cuda-11.3/bin/cuda-uninstaller

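Afterwards, run
nvidia-smi
again. If it no longer prints the GPU table (usually the command is simply not found), the driver has been removed cleanly.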
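
Besides nvidia-smi, the package database can be checked for leftovers. This is a rough sketch that assumes the usual dpkg -l column layout (status, package name, version, ...). The helper name leftover_packages is my own.

```python
import shutil
import subprocess


def leftover_packages(dpkg_output):
    """Names of still-installed ('ii') packages whose name starts with 'nvidia'."""
    names = []
    for line in dpkg_output.splitlines():
        parts = line.split()
        # Column 1 is the status flag, column 2 the package name.
        if len(parts) >= 2 and parts[0] == "ii" and parts[1].startswith("nvidia"):
            names.append(parts[1])
    return names


if shutil.which("dpkg") is not None:
    listing = subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout
    print(leftover_packages(listing) or "no installed nvidia* packages found")
```

Packages in the "rc" state (removed, configuration files remaining) are deliberately ignored here because they no longer provide a driver.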


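Now reinstall and reconfigure.
Preparation
Download the combined CUDA runfile (toolkit plus driver) that matches your Ubuntu release, either through a Baidu search or from the official NVIDIA site. I found the link for Ubuntu 18.04 through Baidu.
Newer is not necessarily better; matching the Ubuntu version is what matters.
For my Ubuntu 18.04 I downloaded two versions:
cuda_10.0.130_410.48_linux.run  did not match after installation
cuda_11.3.1_465.19.01_linux.run  the version I ended up using
The file is about 2 GB. After downloading, copy it to the Home directory so that it is easy to reach from the command line.
Make sure the Nvidia card is powered properly.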

1. Disable the built-in nouveau driver
sudo vi /etc/modprobe.d/blacklist.conf
# Append at the end of the file (adding only the first line is also enough)
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
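
Before moving on it is worth confirming that nouveau is really no longer loaded. On Ubuntu a modprobe blacklist generally takes effect only after the initramfs has been regenerated (sudo update-initramfs -u) and the machine rebooted. That is general Ubuntu practice rather than a step recorded in this post. The sketch below reads /proc/modules, the kernel's list of loaded modules. The helper names are hypothetical.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text (name is the first column)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nouveau_is_loaded(path="/proc/modules"):
    """True or False if the module list is readable, None if it cannot be read (e.g. not Linux)."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    return "nouveau" in loaded_modules(text)


print("nouveau loaded:", nouveau_is_loaded())
```

If this still reports True after a reboot, the blacklist has not been applied and the NVIDIA installer is likely to complain about nouveau.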


2. Install lightdm
It manages the 2D desktop; without it the installation can fail.
sudo apt-get install lightdm
sudo service lightdm start

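3. Switch to command-line mode
Press Ctrl+Alt+F6 to switch to a text console and log in with your username and password.
Stop the display manager service: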
sudo service lightdm stop

4. Install
sudo chmod a+x cuda_11.3.1_465.19.01_linux.run
sudo ./cuda_11.3.1_465.19.01_linux.run --no-opengl-libs

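After a short wait, a prompt warns that a previous installation was detected. If you are sure everything was uninstalled, ignore it and
choose continue

Next you are asked to confirm the license agreement. If you have no objection, type
accept

Older versions force you to read the whole agreement, scrolling down a little at a time with Enter, which is very tedious. Press D to page through it quickly.
Then keep answering
Y
until the last step, where you choose
Install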

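If the installation completes successfully, test it:
nvidia-smi
Normally it prints the GPU table shown earlier.
If it reports that no driver was detected, go back over the steps and check for mistakes.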

If the installation fails, work out the cause from the logs. There are two log files to analyze:
/var/log/cuda-installer.log
/var/log/nvidia-installer.log
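
To get a quick first look at the two logs, a small script can pull out the lines that mention an error or failure. This is only a sketch: the keyword pattern is my own guess at what is worth reading first, not a format defined by the installers, and reading /var/log may require root.

```python
import re
from pathlib import Path

LOG_FILES = ["/var/log/cuda-installer.log", "/var/log/nvidia-installer.log"]
KEYWORDS = re.compile(r"error|fail", re.IGNORECASE)  # assumed keywords, adjust as needed


def suspicious_lines(text):
    """Return the lines of a log that contain one of the keywords."""
    return [line for line in text.splitlines() if KEYWORDS.search(line)]


def scan_logs(paths=LOG_FILES):
    """Map each readable log file to its suspicious lines; unreadable files are skipped."""
    findings = {}
    for path in paths:
        try:
            text = Path(path).read_text(errors="replace")
        except OSError:
            continue
        findings[path] = suspicious_lines(text)
    return findings


for path, lines in scan_logs().items():
    print(f"{path}: {len(lines)} suspicious line(s)")
    for line in lines:
        print("   ", line)
```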


After a successful installation, set the environment variables as the installer suggests:
vi ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.3/lib64
export PATH=$PATH:/usr/local/cuda-11.3/bin
export CUDA_HOME=/usr/local/cuda-11.3

PATH=$PATH:/usr/local/cuda/bin
export PATH

source ~/.bashrc
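
A quick way to confirm that the variables took effect in a new shell is to inspect them from Python. The sketch assumes the /usr/local/cuda-11.3 prefix used in this post and treats CUDA_HOME as a single directory rather than a colon-separated list. The function name check_cuda_env is hypothetical.

```python
import os

CUDA_ROOT = "/usr/local/cuda-11.3"  # install prefix used in this post


def check_cuda_env(env=None, cuda_root=CUDA_ROOT):
    """Return a list of problems found; an empty list means the variables look right."""
    env = os.environ if env is None else env
    problems = []
    # CUDA_HOME must be one directory, with no leading colon or extra entries.
    if env.get("CUDA_HOME") != cuda_root:
        problems.append(f"CUDA_HOME should be exactly {cuda_root}, got {env.get('CUDA_HOME')!r}")
    if f"{cuda_root}/bin" not in env.get("PATH", "").split(":"):
        problems.append(f"{cuda_root}/bin is missing from PATH")
    if f"{cuda_root}/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append(f"{cuda_root}/lib64 is missing from LD_LIBRARY_PATH")
    return problems


for problem in check_cuda_env():
    print(problem)
```

No output means all three variables are set as expected.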

5. Restore the desktop
Once everything above has succeeded, start the desktop again:
sudo service lightdm start

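The graphical login screen appears automatically; enter your password to log in.
Open a terminal and confirm once more that the driver works: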
nvidia-smi

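6. Environment setup and final test
Install a TensorFlow environment. A conda environment is recommended so that the system environment is not damaged.
Look up the detailed steps online, then activate the environment:
conda activate tensorflow_2.0
Use TensorFlow to check that the GPU is set up correctly,
with the following test script, test.py: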
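# test.py: run a trivial graph in a TF1-style session; creating the session initializes the GPU
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # TF 2.x is eager by default; Session.run needs graph mode
hello = tf.constant("hello")
sess = tf.Session()
print(sess.run(hello))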

Run the test:
python test.py
The output includes the GPU information, so it appears to be working:
2024-06-17 22:22:16.023202: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-06-17 22:22:16.023545: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-06-17 22:22:16.023874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15191 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla P100-PCIE-16GB, pci bus id: 0000:03:00.0, compute capability: 6.0)
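
Reading the startup log works, but the same question can be asked directly. As a hedged alternative, assuming TensorFlow 2.1 or newer (where tf.config.list_physical_devices is available), the following sketch lists the GPUs TensorFlow can see. The function name visible_gpus is my own.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(visible_gpus())
```

An empty list means TensorFlow imported but found no usable GPU, which usually points back to the driver or the CUDA library paths configured earlier.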
