Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED. Possibly insufficient driver version: 3

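1. CUDA 9.1 is installed (matching cuDNN package: 7.1.2.21-1+cuda9.1), but the installed cuDNN is too new (7.1.4.18-1+cuda9.2, a build for CUDA 9.2), so it has to be downgraded.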
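
The mismatch is visible in the package version strings themselves: the part after `+cuda` names the CUDA release the cuDNN build targets. A minimal sketch of that check follows; the helper names `cudnn_cuda_suffix` and `cudnn_matches_cuda` are made up for illustration and are not part of any NVIDIA tooling.

```shell
#!/bin/sh
# Extract the CUDA release a cuDNN Debian package version targets,
# e.g. "7.1.4.18-1+cuda9.2" -> "9.2". (Hypothetical helper.)
cudnn_cuda_suffix() {
    printf '%s\n' "${1##*+cuda}"
}

# Succeeds when the package version string targets the given CUDA release.
cudnn_matches_cuda() {
    [ "$(cudnn_cuda_suffix "$1")" = "$2" ]
}

# The two version strings that appear in this post.
for pkg in 7.1.4.18-1+cuda9.2 7.1.2.21-1+cuda9.1; do
    if cudnn_matches_cuda "$pkg" 9.1; then
        echo "libcudnn7 $pkg: built for CUDA 9.1"
    else
        echo "libcudnn7 $pkg: built for CUDA $(cudnn_cuda_suffix "$pkg"), not 9.1"
    fi
done
```

On a real system the installed version string can be read with `dpkg-query -W -f='${Version}' libcudnn7` and passed to the same helpers.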


2. The error:

root@0d4:~/net# ./run_and_time.sh 2 | tee benchmark-`date "+%F-%T"`.log
STARTING TIMING RUN AT 2018-06-21 03:47:10 AM
running benchmark with seed 2
INFO:tensorflow:Using config: {'_master': '', '_global_id_in_cluster': 0, '_log_step_count_steps': 100, '_num_worker_replicas': 1, '_is_chief': True, '_service': None, '_tf_random_seed': 2, '_save_checkpoints_secs': 600, '_train_distribute': <tensorflow.contrib.distribute.python.one_device_strategy.OneDeviceStrategy object at 0x7f0f934da080>, '_session_config': allow_soft_placement: true, '_task_type': 'worker', '_save_summary_steps': 100, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_keep_checkpoint_max': 5, '_num_ps_replicas': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0f7f8d7240>, '_save_checkpoints_steps': None, '_evaluation_master': '', '_device_fn': None, '_model_dir': '/tmp/imn_example'}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2018-06-21 03:47:25.068465: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-21 03:47:25.785164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.562
pciBusID: 0000:86:00.0
totalMemory: 11.17GiB freeMemory: 10.99GiB
2018-06-21 03:47:25.785244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-06-21 03:47:26.224662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-21 03:47:26.224748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2018-06-21 03:47:26.224776: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2018-06-21 03:47:26.225463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10647 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:86:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /tmp/imn_example/model.ckpt-2674
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 2674 into /tmp/imn_example/model.ckpt.
2018-06-21 03:47:35.910664: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-06-21 03:47:35.910895: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 387.26.0
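
The last log line hints that the driver (387.26.0) may be too old for the libraries being loaded. A rough sketch of such a comparison is shown here. The threshold `396.26` is an assumption based on my reading of NVIDIA's CUDA 9.2 release notes and should be verified there; `version_ge` is a hypothetical helper that relies on GNU `sort -V`.

```shell
#!/bin/sh
# Succeeds when dotted version $1 >= version $2.
# Relies on GNU sort's -V (version sort), which Ubuntu ships.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

DRIVER="387.26.0"         # driver version reported in the log above
MIN_FOR_CUDA92="396.26"   # ASSUMPTION: confirm against NVIDIA's release notes

if version_ge "$DRIVER" "$MIN_FOR_CUDA92"; then
    echo "driver $DRIVER looks new enough for a CUDA 9.2 build"
else
    echo "driver $DRIVER is older than $MIN_FOR_CUDA92"
fi
```

On a live machine, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the driver version to feed into the comparison.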


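Following the advice I found ("Doesn't work with cudnn v7.1.1.5" and "Upgrade to latest cuDNN v7 (7.1.3.16)"), I looked at this NVIDIA Dockerfile for CUDA 9.1, which pins cuDNN 7.1.2.21: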


ARG repository
FROM ${repository}:9.1-devel-ubuntu16.04
LABEL maintainer "NVIDIA CORPORATION <cudatools@nvidia.com>"

ENV CUDNN_VERSION 7.1.2.21
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"

RUN apt-get update && apt-get install -y --no-install-recommends \
            libcudnn7=$CUDNN_VERSION-1+cuda9.1 \
            libcudnn7-dev=$CUDNN_VERSION-1+cuda9.1 && \
    rm -rf /var/lib/apt/lists/*


So I ran:

root@0d4660a33475:~/resnet# apt-get update && apt-get install -y --allow-downgrades --no-install-recommends libcudnn7=7.1.2.21-1+cuda9.1 libcudnn7-dev=7.1.2.21-1+cuda9.1 && rm -rf /var/lib/apt/lists/*
Hit:1 http://security.ubuntu.com/ubuntu xenial-security InRelease
Hit:2 http://archive.ubuntu.com/ubuntu xenial InRelease
Hit:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease
Ign:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  InRelease
Hit:5 http://archive.ubuntu.com/ubuntu xenial-backports InRelease
Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  InRelease
Hit:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release
Hit:9 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  Release
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  libcudnn7-dev
The following packages will be DOWNGRADED:
  libcudnn7
0 upgraded, 1 newly installed, 1 downgraded, 0 to remove and 0 not upgraded.
Need to get 256 MB of archives.
After this operation, 340 MB of additional disk space will be used.
Get:1 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  libcudnn7 7.1.2.21-1+cuda9.1 [133 MB]
Get:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64  libcudnn7-dev 7.1.2.21-1+cuda9.1 [123 MB]                              
Fetched 256 MB in 1min 59s (2143 kB/s)                                                                                                                               
debconf: delaying package configuration, since apt-utils is not installed
dpkg: warning: downgrading libcudnn7 from 7.1.4.18-1+cuda9.2 to 7.1.2.21-1+cuda9.1
(Reading database ... 14749 files and directories currently installed.)
Preparing to unpack .../libcudnn7_7.1.2.21-1+cuda9.1_amd64.deb ...
Unpacking libcudnn7 (7.1.2.21-1+cuda9.1) over (7.1.4.18-1+cuda9.2) ...
Selecting previously unselected package libcudnn7-dev.
Preparing to unpack .../libcudnn7-dev_7.1.2.21-1+cuda9.1_amd64.deb ...
Unpacking libcudnn7-dev (7.1.2.21-1+cuda9.1) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
Setting up libcudnn7 (7.1.2.21-1+cuda9.1) ...
Setting up libcudnn7-dev (7.1.2.21-1+cuda9.1) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode
Processing triggers for libc-bin (2.23-0ubuntu10) ...


done!
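
To confirm the downgrade took effect, the installed package versions can be compared with the one requested above. This is only a sketch: `installed_version` and `verify_version` are hypothetical helper names, and the commented-out `apt-mark hold` line is an optional extra that was not part of the original fix.

```shell
#!/bin/sh
EXPECTED="7.1.2.21-1+cuda9.1"   # version installed in this post

# Print the installed version of a package, or nothing if it is not installed.
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null || true
}

# Compare a found version string ($1) against the expected one ($2).
verify_version() {
    if [ "$1" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: found '${1:-none}', expected '$2'"
        return 1
    fi
}

# Only query the package database where dpkg is available.
if command -v dpkg-query >/dev/null 2>&1; then
    for pkg in libcudnn7 libcudnn7-dev; do
        verify_version "$(installed_version "$pkg")" "$EXPECTED" || true
    done
    # Optional: keep a later "apt-get upgrade" from pulling the cuda9.2 build back in.
    # apt-mark hold libcudnn7 libcudnn7-dev
fi
```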
