(4) GPU Not Working During Training on Ubuntu: Problem and Solution


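Problem description:

Training was extremely slow, and GPU utilization did not change at all while it ran. Yet CUDA, cuDNN, TensorFlow, and TensorRT had all been installed successfully and, as far as I could tell, with mutually compatible versions. So what went wrong?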

Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/xxx/TensorRT-7.0.0.11/lib:/usr/local/cuda-10.2/lib64


Refresh the GPU usage display every second:

watch -n 1 nvidia-smi
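
To record utilization from a script instead of watching the terminal, a small polling loop is one option. The sketch below is not from the original post: it assumes `nvidia-smi` is on the PATH and relies on its `--query-gpu` CSV output, and the function names are made up for illustration.

```python
import subprocess
import time

# Fields requested from nvidia-smi, in this order.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`.

    Each line describes one GPU: utilization in percent, then memory in MiB.
    Assumes every field is numeric; some cards report "[N/A]" for a field,
    which this sketch does not handle.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append(
            {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats for all GPUs."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)


def poll(interval_s=1.0, samples=5):
    """Print one line per GPU every `interval_s` seconds, `samples` times."""
    for _ in range(samples):
        for index, gpu in enumerate(read_gpu_stats()):
            print(
                f"GPU {index}: {gpu['util_percent']}% busy, "
                f"{gpu['mem_used_mib']}/{gpu['mem_total_mib']} MiB"
            )
        time.sleep(interval_s)
```

Calling `poll()` while training runs in another terminal should show utilization rising if the GPU is actually in use; a value stuck near zero matches the symptom described here.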


Take a closer look at the log output produced while the CUDA libraries are being loaded:

Using TensorFlow backend.
2020-07-20 22:04:01.107109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-20 22:04:01.143741: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-20 22:04:01.144279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 Ti computeCapability: 6.1
coreClock: 1.683GHz coreCount: 19 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
2020-07-20 22:04:01.144466: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/xxx/TensorRT-7.0.0.11/lib:/usr/local/cuda-10.2/lib64
2020-07-20 22:04:01.346245: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-20 22:04:01.459871: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-20 22:04:01.495662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-20 22:04:01.717196: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-20 22:04:01.747251: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-20 22:04:02.147318: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-20 22:04:02.147411: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-07-20 22:04:02.149165: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-20 22:04:02.267155: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3199980000 Hz
2020-07-20 22:04:02.268828: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fa514000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-20 22:04:02.268905: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-20 22:04:02.286339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-20 22:04:02.286375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      
tracking <tf.Variable 'conv4_3_norm/conv4_3_norm_gamma:0' shape=(512,) dtype=float32> gamma
WARNING:tensorflow:From /home/xxx/ssd_keras/keras_loss_function/keras_ssd_loss.py:133: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /home/xxx/ssd_keras/keras_loss_function/keras_ssd_loss.py:168: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
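
The warnings that matter can be picked out of a log like this mechanically. The following is a minimal sketch, assuming the message format shown above (the wording may differ between TensorFlow versions); `missing_libraries` is a hypothetical helper name, not part of TensorFlow.

```python
import re

# Matches the dso_loader warning and, when present, the search path it reports.
MISSING_LIB = re.compile(
    r"Could not load dynamic library '(?P<lib>[^']+)';"
    r"(?:.*?LD_LIBRARY_PATH: (?P<path>\S+))?"
)


def missing_libraries(log_text):
    """Return (library_name, search_directories) for each load failure."""
    found = []
    for line in log_text.splitlines():
        match = MISSING_LIB.search(line)
        if match:
            path = match.group("path")
            directories = path.split(":") if path else []
            found.append((match.group("lib"), directories))
    return found
```

Applied to the log above, this would report `libcudart.so.10.1` together with the two directories on `LD_LIBRARY_PATH`, which makes it easy to see that only a CUDA 10.2 directory is being searched.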

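As the log shows, the GPU was never registered ("Skipping registering GPU devices..."). To be sure, let's also check whether the graphics driver is installed correctly.

First install mesa-utils by entering this command in a terminal: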

sudo apt-get install mesa-utils 

Then run:

glxinfo | grep rendering 

If the output reads "direct rendering: Yes", the graphics driver has been installed successfully.


Check whether TensorFlow can use the GPU:

python
import tensorflow as tf
tf.test.is_gpu_available()
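
In newer TensorFlow 2.x releases `tf.test.is_gpu_available()` is marked deprecated in favor of `tf.config.list_physical_devices('GPU')`. A minimal sketch of the same check, assuming TensorFlow 2.1 or later; `visible_gpus` is a hypothetical helper name.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TF is absent.

    Assumes TensorFlow 2.1+, where tf.config.list_physical_devices exists.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("No GPU visible; TensorFlow will fall back to the CPU")
```

An empty list here corresponds to the "Skipping registering GPU devices..." line in the log.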


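After many detours chasing version mismatches, incomplete packages, and the like, I finally found a fix!

The fix is to create a symbolic link named libcudart.so.10.1 that points at the installed libcudart.so.10.2. The log shows TensorFlow looking for the 10.1 runtime while only the CUDA 10.2 directory is on LD_LIBRARY_PATH, so the link satisfies the loader without reinstalling anything to match versions.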

sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 /usr/lib/x86_64-linux-gnu/libcudart.so.10.1
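
Before rerunning training, it can help to confirm which runtime versions are on disk and whether the dynamic loader now resolves the name TensorFlow asks for. The sketch below is an illustration under assumptions, not part of the original fix: the directories are taken from the command and log above, and `installed_versions`, `can_dlopen`, and `report` are hypothetical helper names.

```python
import ctypes
import glob
import os


def installed_versions(lib_prefix, directories):
    """List files whose names start with lib_prefix in each directory."""
    matches = []
    for directory in directories:
        pattern = os.path.join(directory, lib_prefix + "*")
        matches.extend(sorted(glob.glob(pattern)))
    return matches


def can_dlopen(soname):
    """Return True if the dynamic loader can open soname, else False."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def report():
    """Print each libcudart file found, its link target, and the load result."""
    # Directories assumed from the ln command and LD_LIBRARY_PATH in this post.
    directories = ["/usr/local/cuda-10.2/lib64", "/usr/lib/x86_64-linux-gnu"]
    for path in installed_versions("libcudart.so", directories):
        print(path, "->", os.path.realpath(path))
    print("libcudart.so.10.1 loadable:", can_dlopen("libcudart.so.10.1"))
```

Running `report()` after creating the link should list the new `libcudart.so.10.1` entry resolving to the 10.2 file. Note that `ctypes.CDLL` only checks that the file can be opened; it says nothing about whether the 10.2 runtime behaves identically to 10.1.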


Now rerun the GPU availability check from above; this time the GPU is detected.

Retraining now runs normally. All done!
