Today I ran into a very strange error:
Traceback (most recent call last):
  File "main_batch.py", line 175, in <module>
    main(my_args)
  File "main_batch.py", line 102, in main
    seq2seq = seq2seq.cuda()
  File "/。。anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 216, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "。。naconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/。。naconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/。。/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 123, in _apply
    self.flatten_parameters()
  File "。。anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 85, in flatten_parameters
    handle = cudnn.get_handle()
  File "。。/anaconda3/lib/python3.6/site-packages/torch/backends/cudnn/__init__.py", line 296, in get_handle
    handle = CuDNNHandle()
  File "。。naconda3/lib/python3.6/site-packages/torch/backends/cudnn/__init__.py", line 110, in __init__
    check_error(lib.cudnnCreate(ctypes.byref(ptr)))
  File "/。。/anaconda3/lib/python3.6/site-packages/torch/backends/cudnn/__init__.py", line 283, in check_error
    raise CuDNNError(status)
torch.backends.cudnn.CuDNNError: 4: b'CUDNN_STATUS_INTERNAL_ERROR'
Exception ignored in: <bound method CuDNNHandle.__del__ of <torch.backends.cudnn.CuDNNHandle object at 0x7fd3cde68898>>
Traceback (most recent call last):
  File "/。。anaconda3/lib/python3.6/site-packages/torch/backends/cudnn/__init__.py", line 114, in __del__
    check_error(lib.cudnnDestroy(self))
ctypes.ArgumentError: argument 1: <class 'TypeError'>: Don't know how to convert parameter 1
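That is the full error output.
Searching Baidu turned up nothing.
The program had been running fine until now, so the sudden failure left me at a loss.
Looking at the failing lines and the message, I saw that the problem was inside CUDA itself rather than in my own code: the exception is raised in cudnnCreate while seq2seq.cuda() is moving the model to the GPU.
So I ran the command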
CUDA_VISIBLE_DEVICES=2 python main_batch.py
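and the error went away. CUDA_VISIBLE_DEVICES=2 exposes only GPU 2 to the process, so the model is placed on that card instead of the default one; presumably the GPU used before was busy or in a bad state.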
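
Rather than hard-coding GPU 2, the same workaround can be scripted. The sketch below is only an illustration and rests on an assumption the error output does not confirm: that the failure came from the default GPU being busy or short on memory. It asks nvidia-smi for the free memory of each card, picks the card with the most, and launches the script with CUDA_VISIBLE_DEVICES set accordingly. The helper names pick_freest_gpu and run_on_freest_gpu are hypothetical, and nvidia-smi is assumed to be on the PATH.

```python
import os
import subprocess
import sys


def pick_freest_gpu(csv_text):
    """Return the index of the GPU with the most free memory.

    csv_text is expected to be the output of
    nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits
    i.e. one "index, free_MiB" pair per line.
    """
    best_index, best_free = None, -1
    for line in csv_text.strip().splitlines():
        index, free = (field.strip() for field in line.split(","))
        if int(free) > best_free:
            best_index, best_free = int(index), int(free)
    if best_index is None:
        raise ValueError("no GPUs listed in nvidia-smi output")
    return best_index


def run_on_freest_gpu(script="main_batch.py"):
    """Run `script` with only the freest GPU visible (hypothetical helper)."""
    query = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    gpu = pick_freest_gpu(query.stdout)
    # Same effect as typing CUDA_VISIBLE_DEVICES=<gpu> python main_batch.py
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return subprocess.call([sys.executable, script], env=env)
```

Calling run_on_freest_gpu() from a small launcher script would then replace the manual command. Inside the launched process the chosen card appears as device 0, so a plain .cuda() call needs no changes.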