Problem
While training a GNN, I suddenly got this error:
mask = mask.div(num_neigh).to(embed_matrix.device)
RuntimeError: CUDA error: unspecified launch failure
Reportedly this happens when the batch size is slightly too large, and reducing it is enough to fix it.
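CUDA kernels run asynchronously, so the Python line shown in the traceback is not necessarily the one that actually failed. A minimal sketch for narrowing it down, assuming the training script can be edited at the very top: set the CUDA_LAUNCH_BLOCKING environment variable before torch is imported, so each kernel launch is synchronous and the error surfaces closer to its real origin. This slows training down and is meant for debugging only.

```python
import os

# Force synchronous kernel launches so the traceback points nearer to the
# failing operation. Must be set before the first CUDA call; the safest place
# is before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch   # import torch only after the variable is set,
# ...            # then run the training loop as usual
```

The same effect can be had without editing the script by exporting the variable in the shell before launching Python.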
I was going to restart the program and try again, but on the next run I got:
torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
>>> import torch
>>> torch.cuda.is_available()
/home/computer/.local/lib/python3.6/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
False
>>> torch.cuda.device_count()
0
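I remembered that I had just installed tensorboard and suspected a conflict, so I uninstalled it, but that did not help. Everything I found by searching the error message said to reinstall the environment, but having the whole setup break after one casual run was hard to accept. I rebooted the computer, and to my surprise it worked again.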
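Since a reboot fixed it, the problem was probably in the driver state rather than in the Python environment, though that is a guess from this one incident. Before reinstalling anything, a quick check like the sketch below can help tell the two apart. The function name cuda_health_report is made up for illustration; it only uses nvidia-smi (if it is on PATH) and the same torch.cuda calls shown above.

```python
import os
import shutil
import subprocess


def cuda_health_report():
    """Collect a few facts about the GPU setup without crashing if parts are missing."""
    report = {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_smi_ok": None,         # None means nvidia-smi was not found on PATH
        "torch_cuda_available": None,  # None means torch could not be imported
        "torch_device_count": None,
    }

    # Does the driver itself respond? A non-zero exit code here suggests the
    # problem is below PyTorch.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            result = subprocess.run(
                [smi],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=30,
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            report["nvidia_smi_ok"] = False

    # What does PyTorch see? Run this in a fresh interpreter, because a process
    # that already hit a CUDA error keeps reporting failure.
    try:
        import torch
    except (ImportError, OSError):
        return report
    report["torch_cuda_available"] = torch.cuda.is_available()
    report["torch_device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_health_report().items():
        print(key, "=", value)
```

If nvidia-smi itself fails, reinstalling Python packages is unlikely to help, and a reboot or driver reload is the thing to try first. If nvidia-smi works but torch still reports no devices in a fresh process, the environment (for example CUDA_VISIBLE_DEVICES or the installed torch build) is the more likely suspect.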