cuda11: tensorflow is_gpu_available always returns False, Could not load dynamic library libcudart.so.10.0


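Writing this down to save what is left of my hair. Otherwise, the next time some update breaks everything again, I will be in for another round of pain.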


Problem description:

I need TensorRT, so I had no choice but to upgrade to the latest CUDA 11.0. After that, the TensorFlow function that checks whether a GPU is usable,

import tensorflow as tf
tf.test.is_gpu_available()

always returns False, and it also logs the following warnings:

2020-08-31 17:17:04.380095: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.380368: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.398758: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.399120: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.399388: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.399645: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:05.132994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-31 17:17:05.133087: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
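
The warnings above already say which CUDA version this TensorFlow build is looking for. Here is a minimal sketch that pulls the missing library names out of such a log. The function names `missing_libraries` and `wanted_version` and the regular expressions are my own illustration, not part of TensorFlow:

```python
import re

# Matches the library name quoted in the dso_loader warning shown above.
_MISSING = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library file names that the log reports as not loadable."""
    return _MISSING.findall(log_text)


def wanted_version(lib_name):
    """Return the version suffix after '.so.', e.g. '10.0', or None."""
    match = re.search(r"\.so\.(.+)$", lib_name)
    return match.group(1) if match else None


# Two shortened lines from the log above, used as sample input.
sample = (
    "W dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; "
    "dlerror: libcudart.so.10.0: cannot open shared object file\n"
    "W dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; "
    "dlerror: libcublas.so.10.0: cannot open shared object file\n"
)
libs = missing_libraries(sample)
print(libs)
print({wanted_version(name) for name in libs})
```

Every missing library ends in .so.10.0, so this TensorFlow build wants the CUDA 10.0 runtime, while LD_LIBRARY_PATH only contains /usr/local/cuda-11.0/lib64.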

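Possible causes:

  1. CUDA or cuDNN is not installed correctly
  2. The environment variables are not set up correctly
  3. The TensorFlow version does not support CUDA 11.0 + cuDNN v8 (this is the cause in this post)

No patience for the long version? The fix: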

conda install cudatoolkit
conda install cudnn

Done.

Got the patience for the long version? The fix:

For cause 1

Check CUDA from the command line:

nvcc --version
nvidia-smi

If both run without complaints, CUDA is basically OK.

Checking cuDNN is a bit more fiddly. From the command line:
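
If you would rather do the CUDA check from a script, here is a rough sketch. It assumes `nvcc --version` prints a line containing `release X.Y`, which is what the toolkits I have seen do. The helper names are made up for this example:

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release string, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


# Prints the release string, or None when nvcc is not on PATH.
print(installed_cuda_release())
```

A result of None usually means nvcc is not on PATH, which points at cause 2 rather than a broken install.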

cd /usr/local/cuda/include/

If there is a `cudnn.h` file there, you are basically OK. If not, look here:

ls /usr/include/cudnn*

You will typically see files like these:

cudnn_adv_infer.h  cudnn_adv_train.h  cudnn_backend.h    cudnn_cnn_infer.h  cudnn_cnn_train.h  cudnn.h            cudnn_ops_infer.h  cudnn_ops_train.h  cudnn_version.h 

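If it still is not there, look in /usr/include/x86_64-linux-gnu as well, because you have probably installed several versions and the installer quietly overwrote things for you: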

ls /usr/include/x86_64-linux-gnu/cudnn*

If they are there, you will see files like these:

cudnn_adv_infer_v8.h  cudnn_adv_train_v8.h  cudnn_backend_v8.h  cudnn_cnn_infer_v8.h  cudnn_cnn_train_v8.h  cudnn_ops_infer_v8.h  cudnn_ops_train_v8.h  cudnn_v7.h  cudnn_v8.h  cudnn_version_v8.h

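In my case I had installed cuDNN three times in a row (v7, v8, v8). The cudnn_version_v8.h files in these locations show the exact version. Pick one cudnn.h and copy it into /usr/local/cuda/include/. For cuDNN v8, copy the other cudnn_*.h headers along with it, since cudnn.h includes them.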
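
Reading the version out of the header can also be scripted. This is a small sketch that assumes the header contains `#define CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` lines (as far as I know they sit in cudnn_version.h for v8 and in cudnn.h for v7). The function name and the sample values are hypothetical:

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the CUDNN_* #define lines, or None."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # not a version header, or an unexpected layout
        values.append(int(match.group(1)))
    return tuple(values)


# Hypothetical header excerpt for illustration; real values come from your file.
sample_header = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 2
"""
print(parse_cudnn_version(sample_header))
```

To check a real install, pass in the contents of one of the files listed above, for example the text read from /usr/include/x86_64-linux-gnu/cudnn_version_v8.h.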

Cause 2: environment variables

From the command line:

gedit ~/.bashrc

Check whether the following lines are present and add them if they are not. The cuda-11.0 part can be whichever version you have under /usr/local.

export PATH=$PATH:/snap/bin
#export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}
export PATH=$PATH:~/.local/bin
export PATH=/usr/local/cuda-11.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then

sudo gedit /etc/profile

and add these lines:

export PATH=/usr/local/cuda-11.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Finally

source /etc/profile
source ~/.bashrc
sudo ldconfig
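
Once the variables are set, it is easy to see which directories on LD_LIBRARY_PATH actually contain a given library file. A minimal sketch follows. The function name `dirs_containing` is my own, and it only looks at LD_LIBRARY_PATH, not at the system loader cache:

```python
import os


def dirs_containing(lib_name, search_path):
    """Return the directories in a colon-separated path that contain lib_name."""
    hits = []
    for directory in search_path.split(":"):
        # Skip empty entries and directories that were already reported.
        if not directory or directory in hits:
            continue
        if os.path.exists(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


ld_path = os.environ.get("LD_LIBRARY_PATH", "")
for name in ("libcudart.so.10.0", "libcudart.so.11.0"):
    print(name, dirs_containing(name, ld_path))
```

An empty list for libcudart.so.10.0 next to a hit for libcudart.so.11.0 would match the warnings at the top: the paths are fine, the requested version is simply not installed.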

Cause 3: the situation in this post

This is how I checked, in a virtual environment that has PyTorch installed:

conda activate <your-pytorch-env>
python
import torch 
torch.cuda.is_available()

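It returned True, which rules out the earlier causes. The problem lies entirely with TensorFlow.

An unsupported version is no big deal: just install one that is supported, and conda will even pick it for you.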
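
The same question can be asked without TensorFlow or PyTorch at all. The `dlerror` text in the warnings suggests TensorFlow opens these libraries with dlopen, and `ctypes.CDLL` does the same on Linux, so a rough sketch of the check looks like this (`can_dlopen` is a name I made up):

```python
import ctypes


def can_dlopen(lib_name):
    """Return True if the dynamic loader can open lib_name, else False."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False


for name in ("libcudart.so.10.0", "libcudart.so.11.0"):
    print(name, can_dlopen(name))
```

If the 11.0 library loads and the 10.0 one does not, the CUDA install and the environment variables are working, and the mismatch is on the TensorFlow side.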

conda install cudatoolkit
conda install cudnn

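What gets loaded afterwards is the matching version, and the dependent library files come along with it. If conda's pick still does not match the .so.10.0 suffix in the warnings, pin the version explicitly, for example conda install cudatoolkit=10.0.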

References

Believe it or not, the answer was actually in here:
https://blog.csdn.net/u012388993/article/details/102573117
https://www.cnblogs.com/sddai/p/11135941.html
https://blog.csdn.net/roxxo/article/details/105138007
https://github.com/tensorflow/tensorflow/issues/26182
