Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found, or similar missing-DLL problems

Project scenario:

       TensorFlow's GPU support, especially for NVIDIA GPUs, takes more than installing tensorflow-gpu. A whole set of GPU-related software has to be at matching versions, the most common case being the CUDA version. Even after the versions match, plenty of odd problems can still show up. The one I cover here is this set of warnings:
Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found


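       (If the message you get says a cuDNN DLL is missing instead, see the appendix. The errors covered here are for a setup whose versions already match.)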

Problem description:

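       Many readers set up the CUDA environment by following instructions found online, only to find that a run still prints a pile of strange messages. Training does not seem to be affected, so many do not think twice: it is a wall of jargon, and no error might as well mean nothing is wrong.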

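But a closer look shows the problem:
Skipping registering GPU devices…
In other words, the GPU is skipped. TensorFlow never used your GPU at all, and nothing raised an error only because the CPU was there to fall back on. So you never notice, and even if training is slow you have no baseline and assume that this speed is probably normal.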

       But nobody buys an expensive GPU for deep learning and then leaves it unused; that would be a waste. So let's work out how to debug this.
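
Before reading the log line by line, a quick way to confirm whether TensorFlow sees the GPU at all is to ask it directly. This is a minimal sketch, assuming TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available; the helper name `visible_gpus` is made up for illustration.

```python
import importlib.util


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None if TensorFlow is not installed in this environment.
    Assumes TensorFlow 2.1+ for tf.config.list_physical_devices.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    return [device.name for device in tf.config.list_physical_devices("GPU")]
```

Calling `print(visible_gpus())` and getting an empty list back matches the "Skipping registering GPU devices" situation described above: TensorFlow imported fine but registered no GPU.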


Cause analysis:

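Look at the messages printed in the log. The lines that say "successfully" are fine, but the ones that say "Could not load" clearly point to a problem. Yet the versions match, so why can these libraries not be loaded?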

Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
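
When the log is long, it can help to pull out just the library names instead of reading every line. The following is a small sketch that scans captured log text for the warning format quoted above; the function names `missing_libraries` and `gpu_was_skipped` are hypothetical helpers, not part of TensorFlow.

```python
import re

# Matches the warning format quoted above, for example:
# Could not load dynamic library 'cublas64_10.dll'; dlerror: ...
_MISSING = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names that failed to load, in order, without duplicates."""
    seen = []
    for name in _MISSING.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def gpu_was_skipped(log_text):
    """Return True if the log says GPU registration was skipped."""
    return "Skipping registering GPU devices" in log_text
```

The resulting list is exactly the set of file names to look for in the CUDA bin folder in the next step.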

Solution:

The fix is actually simple. Go to the CUDA installation path, which is usually

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin

There we find the files that supposedly could not be loaded, so we do have all of them.

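So why can't they be loaded? Look closely: the names TensorFlow asks for end in 10, while our files end in 100.
So let's try changing that suffix to 10 (for example, cublas64_100.dll becomes cublas64_10.dll),
put the files back into that folder,
and restart the program.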
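       Now the "Skipping registering GPU devices…" message is gone as well, which means the GPU started successfully. The most direct proof, though, is training speed, which should now be several times faster. In my earlier example, a simple handwritten-digit recognition task, 10 epochs took 5 minutes 20 seconds while the GPU was failing to start, and 53 seconds once the discrete GPU was in use, roughly a sixfold speedup. That is what that expensive graphics card is for.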
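
The rename step above can be sketched in a few lines of Python. Note one deliberate difference from the manual steps: this version copies each file to the new name instead of renaming it, so the original `_100` files stay in place. The function name `copy_with_short_suffix` and its default arguments are assumptions made for this illustration.

```python
import shutil
from pathlib import Path


def copy_with_short_suffix(bin_dir, old="_100.dll", new="_10.dll"):
    """Copy every file ending in `old` inside bin_dir to the same name ending in `new`.

    Targets that already exist are left untouched.
    Returns the names of the files that were created.
    """
    created = []
    for src in sorted(Path(bin_dir).glob("*" + old)):
        dst = src.with_name(src.name[: -len(old)] + new)
        if not dst.exists():
            shutil.copy2(src, dst)
            created.append(dst.name)
    return created


# Example call, using the path from this article (uncomment to run):
# copy_with_short_suffix(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin")
```

Writing into a folder under Program Files may require running the script from an elevated prompt. Keeping the originals also makes the change easy to undo: delete the files the function reports as created.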

Appendix:

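Some readers may not have the right CUDA version installed in the first place, in which case this fix will not work. Here is a chart of matching tensorflow-gpu and CUDA versions:
[Image: tensorflow-gpu and CUDA version compatibility chart]
and the matching cuDNN versions:
[Image: cuDNN version compatibility chart]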
