Problem
I recently got access to several new H100 GPUs, but running my program printed the following warning:
NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
The original setup had CUDA 12.1 and torch 2.0.1.
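The warning says the installed PyTorch wheel contains no kernels built for sm_90, the H100's architecture. Below is a minimal sketch that checks this directly, assuming PyTorch is installed. The names required_sm, is_arch_supported and report are hypothetical helpers. torch.cuda.get_arch_list(), torch.cuda.get_device_capability() and torch.cuda.get_device_name() are existing PyTorch calls. The check is deliberately simple: it looks only for an exact sm_XY entry and ignores compute_XY (PTX) entries.

```python
def required_sm(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_arch_supported(capability, arch_list):
    """True if the build lists native code for this exact architecture.

    Only exact 'sm_XY' matches are checked; 'compute_XY' (PTX) entries,
    which may allow JIT compilation on newer GPUs, are ignored on purpose.
    """
    return required_sm(capability) in arch_list


def report():
    """Print, for each visible GPU, whether the installed build targets it."""
    import torch  # lazy import: the helpers above work without torch

    arch_list = torch.cuda.get_arch_list()
    print("torch", torch.__version__, "built with CUDA", torch.version.cuda)
    print("compiled arch list:", arch_list)
    for idx in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(idx)
        ok = is_arch_supported(cap, arch_list)
        name = torch.cuda.get_device_name(idx)
        print(f"cuda:{idx} {name} {required_sm(cap)} supported={ok}")
```

On the original setup, calling report() would be expected to show sm_90 with supported=False, because the arch list in the warning stops at sm_86.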
Solution
Uninstall the previously installed versions, then reinstall everything against CUDA 11.8:
pip install torch==2.0.0+cu118 torchaudio==2.0.0+cu118 torchvision==0.15.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
conda install cudnn
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
conda install -c "nvidia/label/cuda-11.8.0" cuda-nvcc
conda install -c "nvidia/label/cuda-11.8.0" cuda-runtime
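To confirm that pip actually picked up the cu118 wheel, you can read the local version tag at the end of torch.__version__ (for example "2.0.0+cu118"). Below is a small sketch; cuda_tag_to_version is a hypothetical helper. It assumes the tag has the form +cuXYZ, where the last digit is the CUDA minor version.

```python
import re


def cuda_tag_to_version(torch_version):
    """Map a version string such as '2.0.0+cu118' to the CUDA version '11.8'.

    Returns None if there is no '+cuXYZ' suffix, e.g. CPU builds tagged '+cpu'.
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version (cu118 -> 11.8, cu121 -> 12.1)
    return f"{digits[:-1]}.{digits[-1]}"
```

Usage: cuda_tag_to_version(torch.__version__) should agree with torch.version.cuda on a correctly installed cu118 build.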
Verification
import torch
print("PyTorch Version:", torch.__version__)
print("Is available:", torch.cuda.is_available())
print("Current Device:", torch.cuda.current_device())
print("Number of GPUs:", torch.cuda.device_count())
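torch.cuda.is_available() only reports that a driver and at least one device are visible. With a build that lacks sm_90 kernels, it can still return True. A stronger check is to launch a real kernel on every GPU. Below is a sketch, assuming a small matmul is a representative workload. smoke_test_all_gpus and failing_devices are hypothetical helpers, not PyTorch APIs.

```python
def smoke_test_all_gpus(size=256):
    """Run a small matmul on every visible GPU.

    Returns {device_index: None on success, or the error message}.
    """
    import torch  # lazy import so failing_devices() works without torch

    results = {}
    for idx in range(torch.cuda.device_count()):
        try:
            a = torch.randn(size, size, device=f"cuda:{idx}")
            # .item() copies the result to the host, which forces the kernels
            # to run and surfaces any launch error inside this try block
            (a @ a).sum().item()
            results[idx] = None
        except RuntimeError as exc:
            results[idx] = str(exc)
    return results


def failing_devices(results):
    """Sorted indices of the devices whose smoke test raised an error."""
    return sorted(idx for idx, err in results.items() if err is not None)
```

On a working install, print(failing_devices(smoke_test_all_gpus())) should print []. On the original setup, the recorded errors would be expected to say that no kernel image is available for execution on the device.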
Result
import torch
print("PyTorch Version:", torch.__version__)
# PyTorch Version: 2.0.0+cu118
print("Is available:", torch.cuda.is_available())
# Is available: True
print("Current Device:", torch.cuda.current_device())
# Current Device: 0
print("Number of GPUs:", torch.cuda.device_count())
# Number of GPUs: 8
Addendum