深度学习环境配置避坑-NVIDIA A100-PCIE-40GB配置pytorch
查看A100支持CUDA版本
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-PCI... Off | 00000000:18:00.0 Off | 0 |
| N/A 34C P0 34W / 250W | 16640MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-PCI... Off | 00000000:3B:00.0 Off | 0 |
| N/A 35C P0 36W / 250W | 2MiB / 40960MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
踩坑
这里需要为NVIDIA A100-PCIE-40GB配置pytorch运行环境,已配置环境为pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=10.2
,运行pytorch代码报错
NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
报错分析:NVIDIA A100-PCIE-40GB 带有的CUDA算力是8.0,它和现有的PyTorch版本不匹配,现有的PyTorch版本支持的CUDA算力是 3.7,5.0,6.0,7.0,7.5。
解决方法
将CUDA版本提高到11.0以上。
尝试 - 从pytorch官网查询对应pip命令:
# CUDA 11.1
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
再重新运行pytorch代码,成功。