RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

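I have to say, having a GPU that is too powerful and too new for your software versions is its own kind of headache, haha.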

(base) [s503-1@s518-7 code_Spiking_CNN_Rathi_hybrid]$ conda activate pytorch17
(pytorch17) [s503-1@s518-7 code_Spiking_CNN_Rathi_hybrid]$ conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64/
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.5.11
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /data1/s503-1/anaconda3/envs/pytorch17

  added / updated specs:
    - cudatoolkit=11.0
    - pytorch==1.7.1
    - torchaudio==0.7.2
    - torchvision==0.8.2


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    torchvision-0.8.2          |       py38_cu110        17.9 MB  http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
    pytorch-1.7.1              |py3.8_cuda11.0.221_cudnn8.0.5_0       770.6 MB  http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
    cudatoolkit-11.0.221       |       h6bb024c_0       952.7 MB  defaults
    ------------------------------------------------------------
                                           Total:        1.70 GB

The following packages will be UPDATED:

    cudatoolkit: 10.2.89-hfd86e86_1                   defaults                                                   --> 11.0.221-h6bb024c_0                   defaults
    pytorch:     1.7.1-py3.8_cuda10.2.89_cudnn7.6.5_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch --> 1.7.1-py3.8_cuda11.0.221_cudnn8.0.5_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
    torchvision: 0.8.2-py38_cu102                     http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch --> 0.8.2-py38_cu110                      http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch

Proceed ([y]/n)? y


Downloading and Extracting Packages
torchvision-0.8.2    | 17.9 MB   | ############################################################################# | 100%
pytorch-1.7.1        | 770.6 MB  | ############################################################################# | 100%
cudatoolkit-11.0.221 | 952.7 MB  | ############################################################################# | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: - By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html
                                                                                                                      done
         kernel_size          : 3
         test_acc_every_batch : False
         train_acc_batches    : 200
         devices              : 0
 Loaded module.features.0.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.features.3.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.features.6.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.classifier.0.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.classifier.3.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 Loaded module.classifier.6.weight from ./trained_models/snn/snn_vgg5_mnist_100.pth
 DataParallel(
  (module): VGG_SNN_STDB(
    (input_layer): PoissonGenerator()
    (features): Sequential(
      (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): ReLU(inplace=True)
      (2): AvgPool2d(kernel_size=2, stride=2, padding=0)
      (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (4): ReLU(inplace=True)
      (5): Dropout(p=0.3, inplace=False)
      (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (7): ReLU(inplace=True)
      (8): AvgPool2d(kernel_size=2, stride=2, padding=0)
    )
    (classifier): Sequential(
      (0): Linear(in_features=6272, out_features=4096, bias=False)
      (1): ReLU(inplace=True)
      (2): Dropout(p=0.5, inplace=False)
      (3): Linear(in_features=4096, out_features=4096, bias=False)
      (4): ReLU(inplace=True)
      (5): Dropout(p=0.5, inplace=False)
      (6): Linear(in_features=4096, out_features=10, bias=False)
    )
  )
)
 Adam (
Parameter Group 0
    amsgrad: True
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0001
    weight_decay: 0.0005
)
snn.py:182: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  mask = torch.tensor(mask,dtype=torch.float)

When installing torch, always check which CUDA version your GPU needs.

For example, the program runs fine on an RTX 2080, but in the same environment on an A100 it fails with the following error:

NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75. If you want to use the NVIDIA A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

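Roughly, this means: the NVIDIA A100-PCIE-40GB has CUDA compute capability 8.0, which does not match the installed PyTorch build. The installed build supports compute capabilities 3.7, 5.0, 6.0, 7.0, and 7.5.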

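The compute capabilities a PyTorch build supports depend on the CUDA version it was built against. As the message above shows, the CUDA 10.2 build supports only compute capabilities 3.7, 5.0, 6.0, 7.0, and 7.5, not 8.0. CUDA 11 builds do support 8.0.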
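The check behind that message can be sketched in a few lines of Python. This is only an illustration, not PyTorch's own logic: the helper names `arch_to_capability`, `is_supported`, and `check_current_device` are made up for this post, and the matching rule (same major version, built minor version not higher than the device's) is a simplifying assumption that ignores PTX entries such as `compute_80`. The arch list is copied from the error message above.

```python
import re
from typing import Iterable, Optional, Tuple


def arch_to_capability(arch: str) -> Optional[Tuple[int, int]]:
    """Turn a tag such as 'sm_75' into (7, 5); return None for other tags."""
    match = re.fullmatch(r"sm_(\d+)(\d)[a-z]?", arch)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported(device_cap: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Simplified rule (an assumption): a binary built for sm_XY can run on a
    device with the same major version X and a minor version >= Y."""
    major, minor = device_cap
    for arch in arch_list:
        built = arch_to_capability(arch)
        if built is None:
            continue  # skip PTX entries like 'compute_80'
        if built[0] == major and built[1] <= minor:
            return True
    return False


def check_current_device(index: int = 0) -> bool:
    """Run the same check against the installed torch and a real GPU."""
    import torch  # imported lazily so the helpers above work without torch

    cap = torch.cuda.get_device_capability(index)
    return is_supported(cap, torch.cuda.get_arch_list())


# Arch list copied from the error message (the CUDA 10.2 build).
ARCHS_FROM_MESSAGE = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]

print(is_supported((8, 0), ARCHS_FROM_MESSAGE))  # A100 (sm_80): False
print(is_supported((7, 5), ARCHS_FROM_MESSAGE))  # RTX 2080 (sm_75): True
```

On a machine with a GPU and torch installed, calling `check_current_device()` should give the same answer for the actual card without waiting for a convolution to fail.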

The installed torch version is 1.7.0, so what is needed is a build for CUDA 11 or later that does not conflict with torch 1.7.0.

Go to the PyTorch website and pick a suitable CUDA version. The "Previous PyTorch Versions | PyTorch" page lists the older builds to choose from.

 

In the end I chose v1.7.1 with CUDA 11.0:


   
   
    # CUDA 11.0
    pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

Problem solved.
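To confirm that the reinstall actually picked up the CUDA 11.0 build, something like the following sketch may help. The function names `cuda_version_from_tag` and `report_install` are hypothetical helpers written for this post. The `+cuXYZ` suffix parsing assumes the pip wheel naming used in the command above; conda builds report a plain version such as `1.7.1`, in which case `torch.version.cuda` is the value to look at.

```python
import re
from typing import Optional


def cuda_version_from_tag(version: str) -> Optional[str]:
    """Map '1.7.1+cu110' to '11.0'; return None if there is no +cuXYZ suffix."""
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def report_install() -> None:
    """Print the versions torch was built with and try one small convolution."""
    import torch  # imported lazily so the parser above works without torch
    import torch.nn.functional as F

    print("torch:", torch.__version__)
    print("CUDA (from version tag):", cuda_version_from_tag(torch.__version__))
    print("CUDA (torch.version.cuda):", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())

    if torch.cuda.is_available():
        # Same shape as the first Conv2d(1, 64, 3x3, padding=1) layer on MNIST.
        x = torch.randn(1, 1, 28, 28, device="cuda")
        w = torch.randn(64, 1, 3, 3, device="cuda")
        y = F.conv2d(x, w, padding=1)
        print("conv2d output shape:", tuple(y.shape))
    else:
        print("CUDA is not available in this environment")


print(cuda_version_from_tag("1.7.1+cu110"))
print(cuda_version_from_tag("1.7.1+cu102"))
```

Calling `report_install()` on the A100 machine exercises the same kind of cuDNN convolution that raised the original error, so a clean run is a reasonable sign that the environment matches the card.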

Reference: https://zhuanlan.zhihu.com/p/427395039

 

This problem often comes with the following messages:
NVIDIA A100 GPU - RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
Problem:
A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

Recommended reading:
NVIDIA A100 GPU - RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

https://discuss.pytorch.org/t/nvidia-a100-gpu-runtimeerror-cudnn-error-cudnn-status-mapping-error/121648

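Also, if installing pytorch or tensorflow with conda fails, the environment cannot be solved, or the download is too slow, the cause may be your network speed, or the channel may simply not carry that version. In that case, switch to a different channel or a different version.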

More references

https://blog.csdn.net/hb_learing/article/details/114851335
https://blog.csdn.net/n_fly/article/details/120952287
https://blog.csdn.net/xiaobai11as/article/details/108357857
https://discuss.pytorch.org/t/nvidia-a100-gpu-runtimeerror-cudnn-error-cudnn-status-mapping-error/121648
https://blog.csdn.net/Willen_
https://blog.csdn.net/wxd1233/article/details/120509750
https://blog.csdn.net/weixin_43615569/article/details/108932451
