The full error message is:
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.
It turned out that CUDA_VISIBLE_DEVICES was being set in two places that disagreed with each other. The code contained:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'
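This conflicted with the CUDA_VISIBLE_DEVICES environment variable already set in the shell. Removing that environment variable solved the problem: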
unset CUDA_VISIBLE_DEVICES
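
To catch this kind of conflict early rather than at the first CUDA call, the script can check the environment before it overwrites the variable. The following is only a minimal sketch: the helper name set_visible_devices and its environ parameter are hypothetical and not part of PyTorch. It assumes the helper runs at the very top of the script, before torch is imported or any CUDA call is made.

```python
import os


def set_visible_devices(requested, environ=None):
    """Set CUDA_VISIBLE_DEVICES, refusing to silently override a different shell value.

    `set_visible_devices` is a hypothetical helper for illustration only.
    `environ` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if environ is None else environ
    existing = env.get("CUDA_VISIBLE_DEVICES")

    # A value exported in the shell that differs from the one the script wants
    # is exactly the conflict described above, so report it explicitly.
    if existing is not None and existing != requested:
        raise RuntimeError(
            "CUDA_VISIBLE_DEVICES is already set to %r in the environment, "
            "but the script requested %r. Unset it in the shell or make "
            "the two values agree." % (existing, requested)
        )

    env["CUDA_VISIBLE_DEVICES"] = requested
    return requested
```

With this in place, calling set_visible_devices('0,1,2,3') instead of assigning to os.environ directly fails with a readable message when the shell exports a different value, so the mismatch is reported by the script itself and not by the internal assert.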