TensorFlow and cuDNN configuration errors: "Failed to get convolution algorithm." or "cudnn64_7.dll not found"

一个做图像的人

Category: Environment and dependency configuration | Tags: bug, cuda, tensorflow, cudnn, python

Article link: https://blog.csdn.net/Sau_Hit/article/details/113094111

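For various reasons I moved from Linux to Windows, so TensorFlow and CUDA/cuDNN had to be configured again (the matching version combinations are already well documented online).

While debugging, I ran into this problem: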

2021-01-24 17:50:45.163408: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2021-01-24 17:50:45.418034: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2021-01-24 17:50:45.418182: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-01-24 17:50:45.418408: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-01-24 17:50:45.418507: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node res_net/conv2d/Conv2D}}]]
Traceback (most recent call last):
  File "e:/Python_Project/xxxx_project/ver3.0/py2xsl.py", line 123, in <module>
    predict_and_xlsx(file_path)
  File "e:/Python_Project/xxxx_project/ver3.0/py2xsl.py", line 113, in predict_and_xlsx
    model_pre_result =  predict(img_path)
  File "e:\Python_Project\xxxx_project\ver3.0\predict.py", line 211, in predict
    set_name_list, all_name_list,loc_list = detect_img(img_path)
  File "e:\Python_Project\xxxx_project\ver3.0\predict.py", line 144, in detect_img
    detect_result = detect_model.predict(img_send_array_)
  File "C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 909, in predict
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 722, in predict
    callbacks=callbacks)
  File "C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 393, in model_iteration
    batch_outs = f(ins_batch)
  File "C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3740, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\eager\function.py", line 1081, in __call__
    return self._call_impl(args, kwargs)
  File "C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\eager\function.py", line 1121, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\eager\function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\eager\function.py", line 511, in call
    ctx=ctx)
  File "C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node res_net/conv2d/Conv2D (defined at C:\Users\Administrator\envs\env1\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_keras_scratch_graph_3404]

Function call stack:
keras_scratch_graph

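Looking only at the last two lines of the error: I had hit this on Linux before, and the cause there seemed to be what others online suggested, namely insufficient GPU memory. Closing the IDE or other programs did resolve it. (This applies only to those last two lines and to the Linux system I was using at the time; I can't speak for other systems.)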

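Today I hit the same problem on Windows, and after a lot of searching I found a grab bag of suggested fixes, including:

1. Add code to configure GPU memory usage

2. Configure environment variables

3. Change the cuDNN version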
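For reference, fix 1 is usually written along the following lines. This is a minimal sketch, assuming a TensorFlow 2.x install; the function name `enable_gpu_memory_growth` is hypothetical, and the call has to run before any model or tensor touches the GPU. It only addresses the out-of-memory variant of the error, not a missing DLL.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of all at once.

    Returns the number of GPUs configured, or None if TensorFlow is not
    installed. Must be called before the GPUs are initialized.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Grow the allocation as needed rather than reserving the whole card.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured:", enable_gpu_memory_growth())
```

If this returns 0 on a machine that does have a GPU, TensorFlow is not seeing the device at all, which points toward a driver or library loading problem rather than a memory limit.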

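The one I found most plausible was changing the cuDNN version. However, I had used exactly the same versions as in my earlier setup, CUDA 10 + cuDNN 7.6.x, which had worked without any problem. Still, out of doubt, I swapped in quite a few other versions.

None of these worked, so in the end I went through the output line by line to find the cause myself.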

Reading through the error output, I noticed that the log kept looking for DLL files, and one line reported: cudnn64_7.dll not found
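The same line-by-line check can be automated. The sketch below pulls the names of libraries TensorFlow failed to load out of a saved log and then looks for them in the directories listed in PATH. The helper names `missing_libraries` and `find_on_path` are hypothetical, and PATH is only one of the places Windows searches for DLLs, so an empty result is a strong hint rather than proof.

```python
import os
import re

# Matches TensorFlow's dso_loader warning, for example:
#   Could not load dynamic library 'cudnn64_7.dll'; dlerror: ...
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return library names reported as not loadable, in order, without duplicates."""
    seen = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def find_on_path(filename, path=None):
    """Return every directory in PATH (or in `path`) that contains `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    sample = ("W tensorflow/stream_executor/platform/default/dso_loader.cc:55] "
              "Could not load dynamic library 'cudnn64_7.dll'; "
              "dlerror: cudnn64_7.dll not found")
    for lib in missing_libraries(sample):
        print(lib, find_on_path(lib) or "not found in PATH")
```

In a log like the one above, the missing-library warning appears before the "Could not create cudnn handle" errors, so checking for it first avoids chasing the later, more generic message.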

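I had seen this file before: it sits in the bin folder of the extracted cuDNN archive. That reminded me of the two installation methods described online: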

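1. Copy the extracted folder into the CUDA directory, keeping it as a folder named cudnn or cuda: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\cudnn

2. Copy every file from the extracted folders into the matching folders under CUDA (bin to bin, include to include, lib to lib), in other words merge cuDNN into the CUDA installation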
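Method 2 is a plain folder merge, which can be sketched in a few lines. The function name `merge_cudnn_into_cuda` and the example archive location are hypothetical; the CUDA root is the v10.0 directory mentioned above. Writing under Program Files usually requires an elevated prompt.

```python
import os
import shutil


def merge_cudnn_into_cuda(cudnn_root, cuda_root, subdirs=("bin", "include", "lib")):
    """Copy cudnn_root/<subdir>/... into cuda_root/<subdir>/..., keeping the layout.

    Returns the list of destination files written. Existing files with the
    same name are overwritten.
    """
    copied = []
    for sub in subdirs:
        src_base = os.path.join(cudnn_root, sub)
        if not os.path.isdir(src_base):
            continue  # this archive does not contain that folder
        for dirpath, _dirnames, filenames in os.walk(src_base):
            # Preserve nested folders such as lib/x64.
            rel = os.path.relpath(dirpath, cudnn_root)
            dest_dir = os.path.join(cuda_root, rel)
            os.makedirs(dest_dir, exist_ok=True)
            for name in filenames:
                dest = os.path.join(dest_dir, name)
                shutil.copy2(os.path.join(dirpath, name), dest)
                copied.append(dest)
    return copied


if __name__ == "__main__":
    # Hypothetical location of the extracted cuDNN archive; adjust to your machine.
    cudnn_dir = r"D:\downloads\cudnn\cuda"
    cuda_dir = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"
    if os.path.isdir(cudnn_dir) and os.path.isdir(cuda_dir):
        for path in merge_cudnn_into_cuda(cudnn_dir, cuda_dir):
            print("copied", path)
```

Because the CUDA bin folder is normally already in PATH after the CUDA installer runs, merging this way puts cudnn64_7.dll somewhere the loader already looks, whereas method 1 needs the extra folder added to PATH by hand.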

I had been using method 1, which probably also requires setting environment variables and so on, but it still did not work,