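Note: this is just one way to resolve the error as it appeared in my own environment; I hope it helps.
The error output was as follows: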
(pid=3654) RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
(pid=3654) /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
(pid=3654) /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [3,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
(pid=3654) /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
(pid=3654) /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [97,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
(pid=3654) /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [98,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
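Cause analysis
- The assertion messages show that an out-of-bounds array index is what triggers the cuDNN error, so the fix is to find where the index goes out of range and correct it. When running on the GPU, however, the out-of-bounds message does not point to a specific line of code. Running the same code on the CPU yields the exact location of the failure, as shown below: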
(pid=26352) Exception in thread Thread-1:
(pid=26352) Traceback (most recent call last):
(pid=26352)   File "/home/xxx/.conda/envs/xxx/lib/python3.8/threading.py", line 932, in _bootstrap_inner
(pid=26352)     self.run()
(pid=26352)   File "/home/xxx/.conda/envs/xxx/lib/python3.8/threading.py", line 870, in run
(pid=26352)     self._target(*self._args, **self._kwargs)
(pid=26352)   File "/home/xxx/Userlist/xxx/xxx/xxx/Learner.py", line 57, in train
(pid=26352)     q_action_value = q_value[range(q_value.shape[0]), action]
(pid=26352) IndexError: index 2 is out of bounds for dimension 1 with size 1
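The CPU traceback above can be reproduced in isolation with a small sketch. The tensor shapes and values below are assumptions, chosen only to match the message "index 2 is out of bounds for dimension 1 with size 1"; they are not taken from the original Learner.py, and `select_q` is a hypothetical wrapper around the indexing expression from line 57.

```python
import torch

# Hypothetical data: a single Q-value column, but one action index equal to 2.
q_value = torch.zeros(4, 1)           # (batch, num_actions) with num_actions == 1
action = torch.tensor([0, 0, 2, 0])   # index 2 is invalid for dimension 1

def select_q(q_value, action):
    # Same indexing pattern as the failing line in the traceback.
    return q_value[range(q_value.shape[0]), action]

try:
    # Moving both tensors to the CPU makes the failure surface as a Python
    # IndexError raised at the indexing line itself.
    select_q(q_value.cpu(), action.cpu())
    cpu_error = None
except IndexError as exc:
    cpu_error = str(exc)

print(cpu_error)
```

In this sketch the mismatch is between the number of columns in `q_value` and the range of values in `action`, which is the kind of shape disagreement the traceback points to.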
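Other notes
- The error output above already makes it clear that an index is out of bounds, but not which line of code is at fault; running on the CPU resolves that.
- For errors of the form RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR, many different fixes circulate online. It usually pays to read further down the output for additional messages that reveal the actual cause, and then apply the fix that matches it.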
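As one possible follow-up to the notes above, an explicit bounds check placed before the indexing step can report the offending indices directly. This is only a sketch: `check_action_bounds` is a hypothetical helper, not part of PyTorch or of the original code, and it assumes `q_value` has shape (batch, num_actions) and `action` holds integer indices.

```python
import torch

def check_action_bounds(q_value, action):
    """Raise a readable error if any action index falls outside q_value's second dimension."""
    num_actions = q_value.shape[1]
    # Mirror the range accepted by the kernel assertion: -size <= index < size.
    bad = (action < -num_actions) | (action >= num_actions)
    if bad.any():
        raise IndexError(
            f"action indices {action[bad].tolist()} are out of range "
            f"for {num_actions} action(s)"
        )

# Hypothetical usage: validate first, then index as in the traceback.
q_value = torch.zeros(3, 2)
action = torch.tensor([0, 1, 1])
check_action_bounds(q_value, action)
q_action_value = q_value[range(q_value.shape[0]), action]
```

The check adds a small cost per call, so it may be more suitable as a temporary debugging aid than as a permanent part of the training loop.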