Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

最新推荐文章于 2021-06-12 16:28:09 发布

Qin_xian_shen

最新推荐文章于 2021-06-12 16:28:09 发布

阅读量252

点赞数

本文链接：https://blog.csdn.net/Qin_xian_shen/article/details/103112568

版权

Here is a bit more info on how I temporarily resolved it. I believe these issues are all related to GPU memory allocation and have nothing to do with the errors being reported. There were other errors before this indicating some sort of memory allocation problem but the program continued to progress, eventually giving the cudnn errors that everyone is getting. The reason I believe it works sometimes is that if you use the gpu for other things besides tensorflow such as your primary display, the available memory fluctuates. Sometimes you can allocate what you need and other times it can't.

From the API
https://www.tensorflow.org/versions/r0.12/how_tos/using_gpu/
"By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation."

I think this default allocation is broken in some way that causes this erratic behavior and certain situations to work and others to fail.

I have resolved this issue by changing the default behavior of TF to allocate a minimum amount of memory and grow as needed as detailed in the webpage.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

I have also tried the alternate way and was able to get it to work and fail with experimentally choosing a percentage that worked. In my case it ended up being about .7.

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

Still no word from anyone on the TF team confirming this but it is worth a shot to see if others can confirm similar behavior.