Ubuntu 16 and NVIDIA: Installation Summary

Matching the versions of TensorFlow, the NVIDIA driver, CUDA, and cuDNN:

https://www.tensorflow.org/install/source#tested_build_configurations
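As a rough illustration of how the table on that page can be used, the sketch below stores three of its Linux GPU rows in a dictionary and looks up the CUDA and cuDNN versions for a given TensorFlow release. The names TESTED_CONFIGS and required_versions are hypothetical, and only a small excerpt of the table is reproduced, so the linked page remains the authoritative source.

```python
# Excerpt of TensorFlow's "tested build configurations" table (Linux, GPU builds).
# Only three rows are copied here; consult the linked page for the full table.
TESTED_CONFIGS = {
    "1.12.0": {"cuda": "9", "cudnn": "7"},
    "1.13.1": {"cuda": "10.0", "cudnn": "7.4"},
    "1.14.0": {"cuda": "10.0", "cudnn": "7.4"},
}


def required_versions(tf_version):
    """Return the CUDA/cuDNN pair tested with tf_version, or None if not in the excerpt."""
    return TESTED_CONFIGS.get(tf_version)


print(required_versions("1.14.0"))
```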

 

I. Installing the driver

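Ubuntu ships with nouveau, an open-source driver for NVIDIA graphics cards. Blacklist nouveau first, then install the official NVIDIA driver.
Check the permissions of the blacklist file:
ls -lh /etc/modprobe.d/blacklist.conf

Check whether nouveau is still loaded (no output means it is disabled):
lsmod | grep nouveau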

 

How to blacklist nouveau:

1) sudo gedit /etc/modprobe.d/blacklist.conf
2) Append these lines at the end of the file:

blacklist nouveau
options nouveau modeset=0
3) Run: sudo update-initramfs -u
4) Reboot for the change to take effect: reboot
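The edit in step 2 can be double-checked before rebooting. The following is a minimal sketch, not part of the original procedure: the function name nouveau_blacklisted is hypothetical, and it only confirms that both lines are present and not commented out.

```python
def nouveau_blacklisted(conf_text):
    """Return True if conf_text contains both lines added in step 2."""
    lines = set()
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0]        # drop comments
        line = " ".join(line.split())      # normalise whitespace
        if line:
            lines.add(line)
    return "blacklist nouveau" in lines and "options nouveau modeset=0" in lines


def check_file(path="/etc/modprobe.d/blacklist.conf"):
    """Read the blacklist file and report whether nouveau is blacklisted in it."""
    with open(path) as f:
        return nouveau_blacklisted(f.read())
```

Calling check_file() on the machine being configured should return True once step 2 is done; after the reboot, lsmod | grep nouveau remains the real test.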

 

Check the GPU model:
lspci | grep -i nvidia

 

Stop the X Window service:

sudo /etc/init.d/lightdm stop (or: sudo service lightdm stop)


Check the installed NVIDIA driver packages and their versions:
dpkg --list | grep nvidia

Remove the existing NVIDIA driver:
sudo apt-get remove --purge 'nvidia*'

 

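Install command (run as root; use one of the two forms):
sudo ./NVIDIA-Linux-x86_64-390.77.run --no-opengl-files --no-nouveau-check --no-x-check
sudo ./NVIDIA-Linux-x86_64-390.77.run --no-opengl-files

(--no-opengl-files installs only the driver files and skips the OpenGL files; --no-nouveau-check and --no-x-check skip the installer's checks for a loaded nouveau module and a running X server.)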

Start the X service:

sudo /etc/init.d/lightdm start (or: sudo service lightdm start)

 

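II. Installing CUDA (patches are installed the same way)

What is CUDA, and how do the NVIDIA driver, CUDA, and cuDNN relate to one another?

1) CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and programming model for general-purpose computing on its GPUs. The driver lets the operating system talk to the GPU, CUDA runs on top of the driver, and cuDNN is NVIDIA's library of deep neural network primitives built on CUDA.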

 

Download page:

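step1:
cd /data/bigData/nvidia_driver_390.77   # your own directory holding the installer files
chmod +x ./cuda_9.0.176_384.81_linux.run
sudo sh ./cuda_9.0.176_384.81_linux.run

step2: append the two export lines below to ~/.bashrc, then reload it with source. The directory must match the CUDA version actually installed: the paths below are for CUDA 10.0, while the 9.0 installer from step1 installs to /usr/local/cuda-9.0 by default.

export PATH=/usr/local/cuda-10.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
source ~/.bashrc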
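To confirm that step2 took effect in the current shell, the environment can be inspected from Python. This is a small sketch under the assumption that CUDA lives in /usr/local/cuda-10.0 as in the export lines; the helper on_search_path is a hypothetical name.

```python
import os


def on_search_path(directory, path_value):
    """Return True if directory is one of the entries of a colon-separated path string."""
    wanted = os.path.normpath(directory)
    entries = [os.path.normpath(p) for p in path_value.split(":") if p]
    return wanted in entries


cuda_home = "/usr/local/cuda-10.0"  # assumption: same directory as in step2
print("bin on PATH:",
      on_search_path(cuda_home + "/bin", os.environ.get("PATH", "")))
print("lib64 on LD_LIBRARY_PATH:",
      on_search_path(cuda_home + "/lib64", os.environ.get("LD_LIBRARY_PATH", "")))
```

Both lines should print True in a shell where ~/.bashrc has been reloaded; if CUDA 9.0 was installed instead, change cuda_home accordingly.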

step3: verify that CUDA was installed correctly

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

make 

./deviceQuery
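The deviceQuery sample ends its output with a line reading Result = PASS when it can see the GPU. The sketch below shows one way to check for that line automatically; the function names are hypothetical and the binary path is assumed to be the one built in step3.

```python
import subprocess


def device_query_passed(output):
    """Return True if deviceQuery's output contains its final 'Result = PASS' line."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())


def run_device_query(binary="./deviceQuery"):
    """Run the sample binary (assumed to be in the current directory) and check it."""
    proc = subprocess.run([binary], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode == 0 and device_query_passed(proc.stdout)
```

Calling run_device_query() from the deviceQuery directory returns True only when the sample ran and reported PASS.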

 

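Notes:

1. When downloading CUDA from the NVIDIA website, choose the runfile (local) installer type.

2. During installation, the CUDA installer asks whether to install the driver bundled with it. Be sure to answer no, since the driver is already installed!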


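III. Installing cuDNN (assuming CUDA is already installed under /usr/local/cuda/)

Download page:

Commands:

cp cudnn-9.0-linux-x64-v7.solitairetheme8  cudnn-9.0-linux-x64-v7.tgz   # the download has a .solitairetheme8 extension; copy it to a .tgz name
tar -xvf cudnn-9.0-linux-x64-v7.tgz   # extracts into a cuda directory under the current directory

From the directory that contains the extracted cuda directory:
sudo cp cuda/include/*.h /usr/local/cuda/include/
sudo cp cuda/lib64/lib* /usr/local/cuda/lib64/

(The following steps can be skipped when reinstalling cuDNN.) [Open question: why do the .so files need symbolic links?]
cd /usr/local/cuda/lib64
sudo chmod +r libcudnn.so.7.0.5
sudo ln -sf libcudnn.so.7.0.5 libcudnn.so.7
sudo ln -sf libcudnn.so.7 libcudnn.so
sudo ldconfig   # refresh the dynamic linker cache so the libraries are picked up immediately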

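Note:
ldconfig manages shared libraries for the dynamic linker: it creates the necessary links and cache so that the libraries are available system-wide.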
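On the question of why the .so files need symbolic links: shared libraries on Linux conventionally have three names. The real file carries the full version (libcudnn.so.7.0.5), the soname (libcudnn.so.7) is what programs look up at run time, and the unversioned name (libcudnn.so) is what the linker uses for -lcudnn at build time. The sketch below recreates that chain in a temporary directory with an empty stand-in file, so it does not touch /usr/local/cuda; the function name link_chain is hypothetical.

```python
import os
import tempfile


def link_chain(lib_dir, real_name="libcudnn.so.7.0.5"):
    """Create libcudnn.so.7 -> real_name and libcudnn.so -> libcudnn.so.7 in lib_dir."""
    parts = real_name.split(".")           # ['libcudnn', 'so', '7', '0', '5']
    soname = ".".join(parts[:3])           # libcudnn.so.7
    linker_name = ".".join(parts[:2])      # libcudnn.so
    for link, target in ((soname, real_name), (linker_name, soname)):
        link_path = os.path.join(lib_dir, link)
        if os.path.lexists(link_path):     # replace an existing entry, like ln -sf
            os.remove(link_path)
        os.symlink(target, link_path)      # relative link, like ln -s
    return soname, linker_name


with tempfile.TemporaryDirectory() as d:
    # Empty stand-in for the real library file.
    open(os.path.join(d, "libcudnn.so.7.0.5"), "w").close()
    link_chain(d)
    resolved = os.path.realpath(os.path.join(d, "libcudnn.so"))
    print(os.path.basename(resolved))
```

Following the unversioned name through both links ends at the fully versioned file, which is why a later cuDNN 7.x patch release only requires repointing libcudnn.so.7.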

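Disabling Ubuntu automatic updates
View the current periodic-update settings (to disable automatic updates, edit this file and set the values to "0"):
less /etc/apt/apt.conf.d/10periodic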
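Instead of reading the file by eye, its settings can be parsed. The sketch below assumes the file uses the usual APT::Periodic::Name "value"; line format; the function names are hypothetical, and the two keys it checks (Update-Package-Lists and Unattended-Upgrade) are the ones that normally control automatic updates.

```python
import re

SETTING = re.compile(r'^\s*APT::Periodic::([\w-]+)\s+"([^"]*)"\s*;')


def periodic_settings(conf_text):
    """Parse APT::Periodic lines into a {name: value} dictionary."""
    settings = {}
    for line in conf_text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def auto_update_disabled(conf_text):
    """Return True if neither key is enabled (a missing key counts as disabled)."""
    settings = periodic_settings(conf_text)
    keys = ("Update-Package-Lists", "Unattended-Upgrade")
    return all(settings.get(key, "0") == "0" for key in keys)
```

Passing the contents of /etc/apt/apt.conf.d/10periodic to auto_update_disabled gives a quick yes/no answer.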
 

Check the kernel version:

uname -sr

 

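Problems encountered

1) After installing the NVIDIA driver, running nvidia-smi prints no information about the graphics card.

Solution: reinstall the NVIDIA driver; when one of the prompts asks whether to restart X, choose "yes".

2) The prompt "Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later.":

Choose NO!

3) A cuDNN version mismatch reported as "Loaded runtime CuDNN library: 7101 (compatibility version 7100)":

Solution: reinstall cuDNN, downloading the matching v7.0 release from the official website (I installed 7.04); this resolves the problem.

4) Frequently starting and stopping use of the graphics card, for example by running nvidia-smi repeatedly, leads to an rpa-** problem.

Solution: none found; according to what I have read, it may be a hardware problem with the graphics card itself.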
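For problem 3, the numbers in the message are cuDNN version codes. In cuDNN 7.x the header defines the code as major * 1000 + minor * 100 + patch level, so 7101 means 7.1.1. A minimal sketch for decoding such codes (the function name is hypothetical):

```python
def decode_cudnn_version(code):
    """Split a cuDNN 7.x integer version code into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(7101))  # (7, 1, 1)
print(decode_cudnn_version(7100))  # (7, 1, 0)
```

Decoding both numbers in the error message shows which cuDNN release was loaded and which one TensorFlow expects.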

 

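Check whether TensorFlow can use the graphics card:

import os

# Expose only GPU 0; set this before TensorFlow initializes CUDA
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf

# TensorFlow 1.x API: log which device each operation is placed on
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))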

2019-08-29 09:51:46.603464: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-29 09:51:46.603878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
2019-08-29 09:51:46.603923: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-29 09:51:46.603935: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-29 09:51:46.603944: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-08-29 09:51:46.603954: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-08-29 09:51:46.603963: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-08-29 09:51:46.603973: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-08-29 09:51:46.603983: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-29 09:51:46.604021: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-29 09:51:46.604458: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-29 09:51:46.604842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-29 09:51:46.604861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-29 09:51:46.604866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-08-29 09:51:46.604873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-08-29 09:51:46.605019: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-29 09:51:46.605430: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-29 09:51:46.605838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10468 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2019-08-29 09:51:46.605881: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.

Output like the above shows that the graphics card is working properly.
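When the log is long, the relevant lines can be picked out programmatically. The sketch below scans log text for the device-mapping lines shown above and returns the physical GPUs it finds; the function name and the regular expression are illustrative and only assume the line format seen in this log.

```python
import re

# Matches ".../device:GPU:0 -> device: 0, name: <model>," but not the XLA_GPU lines.
GPU_DEVICE = re.compile(r"/device:GPU:(\d+) -> device: \d+, name: ([^,]+),")


def mapped_gpus(log_text):
    """Return {index: name} for each physical GPU listed in the device mapping."""
    return {int(m.group(1)): m.group(2) for m in GPU_DEVICE.finditer(log_text)}


sample = ("/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device\n"
          "/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, "
          "name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1\n")
print(mapped_gpus(sample))  # {0: 'GeForce GTX 1080 Ti'}
```

An empty dictionary would mean TensorFlow created no physical GPU device.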

 

-- over --

 
