Installing CUDA and cuDNN for TensorFlow 2.0 and Getting It Running

楓尘林间

Original article: https://blog.csdn.net/bowenlaw/article/details/108428326

1. Check whether the NVIDIA driver is installed

nvidia-smi

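If this fails with -bash: nvidia-smi: command not found, the driver is not installed. Download it from the official NVIDIA driver download page.

Once you have the installer that matches your GPU, make it executable and run it:

sudo chmod +x NVIDIA-Linux-x86_64-470.63.01.run
sudo sh NVIDIA-Linux-x86_64-470.63.01.run

The installer may fail with: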

ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution’s documentation for details on how to correctly disable the Nouveau kernel driver.

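To fix this, see the referenced blog post. The usual approach is to blacklist the nouveau module, rebuild the initramfs, and reboot.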
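
Before rerunning the installer, it can help to confirm whether nouveau is still loaded. The following is a minimal sketch and not part of the original post: the helper names are hypothetical, and it assumes the standard Linux /proc/modules layout, in which the module name is the first field on each line.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first field is the module name
    return names


def nouveau_is_loaded(proc_modules_path="/proc/modules"):
    """Return True if nouveau is loaded, False if not, None if the file is absent."""
    path = Path(proc_modules_path)
    if not path.exists():
        return None
    return "nouveau" in loaded_modules(path.read_text())


if __name__ == "__main__":
    print("nouveau loaded:", nouveau_is_loaded())
```

If this still reports True after a reboot, the blacklist has probably not taken effect yet.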

The installer may then fail with:

Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the ‘kernel-source’ or ‘kernel-devel’ RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the ‘--kernel-source-path’ command line option.

See "NVIDIA driver issues on CentOS 7.5". Further references:
1. CentOS7: manually upgrade the kernel to the specified version
2. kernel-devel download
3. ERROR: Unable to find the kernel source tree
4. Installing and removing kernel-devel

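Check which kernel-devel and kernel-headers packages are installed:

rpm -qa | grep -E "kernel-devel|kernel-headers"

The kernel-devel and kernel-headers versions did not match. Following references 1 and 2, download the kernel-devel package for the required version, copy it to the server, and install it:

yum localinstall kernel-devel-3.10.0-957.27.2.el7.x86_64.rpm

Remove the extra kernel-devel package:

yum remove kernel-devel-3.10.0-1160.41.1.el7.x86_64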
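
The NVIDIA installer builds its kernel module against the sources of the running kernel, so the kernel-devel package has to match the output of uname -r. A minimal sketch of that comparison follows. The function name is hypothetical, the kernel-headers entry is an illustrative placeholder, and treating 3.10.0-957.27.2.el7.x86_64 as the running kernel is an assumption based on which package this post keeps.

```python
import platform

PREFIX = "kernel-devel-"


def split_kernel_devel(packages, running_kernel):
    """Split kernel-devel package names into (matching, extra) lists."""
    matching, extra = [], []
    for name in packages:
        if not name.startswith(PREFIX):
            continue  # ignore kernel-headers and unrelated packages
        version = name[len(PREFIX):]
        (matching if version == running_kernel else extra).append(name)
    return matching, extra


if __name__ == "__main__":
    installed = [
        "kernel-devel-3.10.0-957.27.2.el7.x86_64",
        "kernel-devel-3.10.0-1160.41.1.el7.x86_64",
        "kernel-headers-3.10.0-1160.41.1.el7.x86_64",  # illustrative placeholder
    ]
    assumed_running = "3.10.0-957.27.2.el7.x86_64"  # assumption, see lead-in
    print(split_kernel_devel(installed, assumed_running))
    print("this machine runs:", platform.release())
```

On a real host, pass platform.release() as running_kernel and the output of rpm -qa as packages.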

Run the installer again:

sudo sh NVIDIA-Linux-x86_64-470.63.01.run

This time it prints a warning:

WARNING: nvidia-installer was forced to guess the X library path ‘/usr/lib64’ and X module path ‘/usr/lib64/xorg/modules’; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the pkg-config utility and the X.Org SDK/development package for your distribution and reinstall the driver.

This warning can be ignored; answer yes at every prompt.

Finally, run nvidia-smi to confirm the driver is loaded.

Success!
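
The driver check from this step can also be scripted, for example at the top of a notebook. This is a sketch under assumptions: driver_status is a hypothetical helper, and it relies only on nvidia-smi being on the PATH once the driver is installed.

```python
import shutil
import subprocess


def driver_status():
    """Return (ok, detail) describing whether nvidia-smi can be run."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH; the driver is probably not installed"
    try:
        result = subprocess.run(
            [exe],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, "nvidia-smi could not be run: {}".format(exc)
    # A non-zero exit code usually means the kernel module is not loaded.
    return result.returncode == 0, result.stdout.strip()


if __name__ == "__main__":
    ok, detail = driver_status()
    print("driver OK" if ok else "driver problem")
    print(detail)
```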

2. Check the CUDA version:

cat /usr/local/cuda/version.txt

Or:

nvcc -V

Check the cuDNN version:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
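
The grep above prints the three version macros from cudnn.h. As a sketch, the same values can be extracted in Python. parse_cudnn_version is a hypothetical helper, and it assumes the cuDNN 7.x header layout, in which CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL are plain #define lines in cudnn.h.

```python
import re

# Matches lines such as "#define CUDNN_MAJOR 7".
_DEFINE = re.compile(
    r"^\s*#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)\s*$",
    re.MULTILINE,
)


def parse_cudnn_version(header_text):
    """Return 'major.minor.patch' from cudnn.h text, or None if a macro is missing."""
    found = dict(_DEFINE.findall(header_text))
    try:
        return "{}.{}.{}".format(found["MAJOR"], found["MINOR"], found["PATCHLEVEL"])
    except KeyError:
        return None


if __name__ == "__main__":
    sample = (
        "#define CUDNN_MAJOR 7\n"
        "#define CUDNN_MINOR 6\n"
        "#define CUDNN_PATCHLEVEL 5\n"
    )
    print(parse_cudnn_version(sample))
```

To use it on a real installation, pass the contents of /usr/local/cuda/include/cudnn.h as header_text.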

Check whether TensorFlow can use the GPU.

In Jupyter, run:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

If the installed versions do not match, the output is:

Num GPUs Available: 0
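
When the count is 0, it helps to know whether the installed TensorFlow build has CUDA support at all before touching the CUDA installation. A hedged sketch: gpu_report is a hypothetical helper, while tf.test.is_built_with_cuda() and tf.config.experimental.list_physical_devices are TensorFlow calls. The ImportError fallback is there only so the snippet also runs on a machine without TensorFlow.

```python
def gpu_report():
    """Return a dict describing TensorFlow's view of the GPU, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "num_gpus": len(gpus),
    }


if __name__ == "__main__":
    print(gpu_report())
```

If built_with_cuda is False, the installed package is a CPU-only build and no CUDA version will make the GPU appear. If it is True and num_gpus is 0, a CUDA or cuDNN version mismatch is the more likely cause.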

Look up the CUDA and cuDNN versions that match your TensorFlow release in the official TensorFlow documentation:

https://tensorflow.google.cn/install/source

According to that page:

TensorFlow 2.0.0 needs CUDA 10.0 and cuDNN 7.6.
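
The requirement above can be written down as a small check. This is a sketch, not an official compatibility table: REQUIRED contains only the single pairing stated in this post, and versions_ok is a hypothetical helper that compares major.minor prefixes.

```python
# Only the pairing stated in this post; extend it from the official table as needed.
REQUIRED = {"2.0.0": {"cuda": "10.0", "cudnn": "7.6"}}


def major_minor(version):
    """Reduce a version such as '10.0.130' to its 'major.minor' prefix."""
    return ".".join(version.split(".")[:2])


def versions_ok(tf_version, cuda_version, cudnn_version):
    """Return True if the installed CUDA and cuDNN match the pairing for tf_version."""
    wanted = REQUIRED.get(tf_version)
    if wanted is None:
        raise KeyError("no pairing recorded for TensorFlow " + tf_version)
    return (
        major_minor(cuda_version) == wanted["cuda"]
        and major_minor(cudnn_version) == wanted["cudnn"]
    )


if __name__ == "__main__":
    print(versions_ok("2.0.0", "10.0.130", "7.6.5"))
    print(versions_ok("2.0.0", "10.1.105", "7.6.5"))
```

The first call uses the versions this post installs; the second shows how a CUDA 10.1 toolkit would be rejected.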

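2. Install CUDA 10.0

Uninstall the previous CUDA version (optional):

cd /usr/local/cuda-XX/bin
sudo ./uninstall_cuda_toolkit_XX.pl

Download CUDA 10.0:

Download page: https://developer.nvidia.com/cuda-toolkit-archive

First check the Linux distribution version:

cat /etc/redhat-release

Output: CentOS Linux release 7.7

Then check the architecture:

uname -a

The output includes x86_64.

Download the installer for that distribution and architecture, copy it to the server, and install it. The file name below is the CUDA 10.1 runfile; substitute the name of the CUDA 10.0 runfile you downloaded:

sudo chmod +x cuda_10.1.105_418.39_linux.run
sudo sh cuda_10.1.105_418.39_linux.run

For the installer options, see:
https://www.freesion.com/article/6641492348/

Error:
An NVIDIA kernel module ‘nvidia-drm’ appears to already be loaded in your kernel…

This error is raised by the driver installation step.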

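Fix:

# Stop the display manager (lightdm here; use whichever one your system runs)
sudo service lightdm stop
# Disable the graphical target
sudo systemctl isolate multi-user.target
# Unload the NVIDIA driver kernel module
sudo modprobe -r nvidia-drm

After the installation finishes, check the version:

cat /usr/local/cuda/version.txt

Output: CUDA Version 10.0.130

Add CUDA to the environment variables:

vim ~/.bashrc

Add these lines: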

export PATH="/usr/local/cuda-10.0/bin:$PATH" 
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH"

source ~/.bashrc
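
A quick way to confirm that the new shell actually picked up both variables is to inspect them from Python. This is a minimal sketch; missing_cuda_paths is a hypothetical helper, and CUDA_HOME is the cuda-10.0 path used in this post.

```python
import os

CUDA_HOME = "/usr/local/cuda-10.0"  # path used in this post


def missing_cuda_paths(environ, cuda_home=CUDA_HOME):
    """Return (variable, directory) pairs that are absent from the environment."""
    wanted = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for var, directory in wanted.items():
        # Tolerate a trailing slash on entries such as ".../bin/".
        entries = [e.rstrip("/") for e in environ.get(var, "").split(os.pathsep)]
        if directory not in entries:
            missing.append((var, directory))
    return missing


if __name__ == "__main__":
    print(missing_cuda_paths(os.environ))
```

An empty list means both directories are present. Note that a Jupyter server started before ~/.bashrc was edited keeps its old environment and has to be restarted.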

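Check:

nvcc -V

Result:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

This captured output reports release 10.1. With CUDA 10.0 first on the PATH as configured above, nvcc should report release 10.0, matching version.txt.

CUDA installation is complete.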
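
The two checks in this section read different things: version.txt belongs to whatever /usr/local/cuda points at, while nvcc -V reports the first nvcc found on the PATH. When more than one toolkit is installed they can disagree, as the two outputs quoted in this post do (10.0.130 versus 10.1). A minimal sketch for comparing them, with hypothetical helper names and the two strings copied from the post:

```python
import re


def cuda_release_from_version_txt(text):
    """Extract 'major.minor' from a line such as 'CUDA Version 10.0.130'."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)", text)
    return "{}.{}".format(*match.groups()) if match else None


def cuda_release_from_nvcc(text):
    """Extract 'major.minor' from nvcc -V output ('... release 10.1, V10.1.105')."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return "{}.{}".format(*match.groups()) if match else None


if __name__ == "__main__":
    version_txt = "CUDA Version 10.0.130"
    nvcc_output = "Cuda compilation tools, release 10.1, V10.1.105"
    a = cuda_release_from_version_txt(version_txt)
    b = cuda_release_from_nvcc(nvcc_output)
    verdict = "consistent" if a == b else "mismatch: check which nvcc is first on PATH"
    print(a, b, verdict)
```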

3. Install cuDNN for CUDA 10.0

Download from the official site: https://developer.nvidia.com/rdp/cudnn-archive

Choose the version that matches your CUDA version.

I chose cudnn-10.0-linux-x64-v7.6.5.32.tgz.

Copy it to the server and extract it:

tar -xvf cudnn-10.0-linux-x64-v7.6.5.32.tgz

Extraction produces a cuda directory.

Copy the header and libraries into the CUDA installation:

sudo cp cuda/include/cudnn.h /usr/local/cuda-10.0/include # use the path of your CUDA version
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64 # use the path of your CUDA version
sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*

Check: cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

cuDNN installation is complete!

4. Check in Jupyter whether the GPU is available

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices("GPU")))

4. Using the GPU in TensorFlow 2.0

import tensorflow as tf
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))

If the printed device list contains both GPU and CPU entries, the GPU is in use.
If only CPU entries appear, the GPU is not enabled.
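
The session-based check above goes through the TF1 compatibility API. In TensorFlow 2.x eager mode, a similar check can be sketched with tf.debugging.set_log_device_placement, which logs the device chosen for each op. The helper name matmul_device is hypothetical, and the ImportError fallback is there only so the snippet also runs where TensorFlow is absent.

```python
def matmul_device():
    """Run a small matmul and return the device string it executed on."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    tf.debugging.set_log_device_placement(True)  # log the device of each op
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    c = tf.matmul(a, b)
    return c.device


if __name__ == "__main__":
    print(matmul_device())
```

A returned string ending in GPU:0 indicates the op ran on the GPU; one ending in CPU:0 indicates it fell back to the CPU.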

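5. Check the Keras/TensorFlow version mapping and the installed Keras version

The version mapping can be looked up online.
TensorFlow 2.0 corresponds to Keras 2.3.1.

import keras

print(keras.__version__)

The version shown is 2.2.5.

Reinstall:

cd xxx/xxx/anaconda3/bin

./pip install keras==2.3.1

Installation complete!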
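
The version comparison in this step can be sketched as a small helper. The function names are hypothetical, and the default of 2.3.1 is the pairing stated in this post for TensorFlow 2.0.

```python
import re


def version_tuple(version):
    """Turn '2.3.1' into (2, 3, 1) so versions compare numerically."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def needs_reinstall(installed, required="2.3.1"):
    """Return True when the installed Keras version differs from the required one."""
    return version_tuple(installed) != version_tuple(required)


if __name__ == "__main__":
    # 2.2.5 is the version this post found before reinstalling.
    print(needs_reinstall("2.2.5"))
    print(needs_reinstall("2.3.1"))
```

In a notebook, pass keras.__version__ as the installed argument after restarting the kernel, so the freshly installed package is the one that gets imported.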

References:
https://www.freesion.com/article/9245510937/
https://blog.csdn.net/sinat_23619409/article/details/84202651
https://blog.csdn.net/kingfoulin/article/details/98872965
