RuntimeError: CUDA error: device-side assert triggered. The assertion output looks similar to the following:
::operator()(int)->auto: block: [0,0,0], thread: [4,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [5,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [6,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [7,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [8,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
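This error is caused by an index that is out of range, but from the message alone it is hard to tell where the problem first occurred. CUDA kernels are launched asynchronously and run in parallel across many threads, so the error is often reported at a later call than the one that actually failed, and the Python stack trace may not point at the real source of the bug.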
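The condition in the assertion, index >= -sizes[i] && index < sizes[i], can be checked on the host before the indexing operation ever reaches the GPU. The following is a minimal sketch in plain Python. The helper name find_out_of_bounds and the sample values are hypothetical and are not part of PyTorch; in a real training script the indices would first be copied to the CPU (for example as a list).

```python
def find_out_of_bounds(indices, size):
    """Return (position, value) pairs for indices that would trip the assertion.

    Mirrors the device-side check: a valid index satisfies
    -size <= index < size (negative indices count from the end).
    """
    bad = []
    for position, index in enumerate(indices):
        if not (-size <= index < size):
            bad.append((position, index))
    return bad


# Hypothetical example: indexing into a dimension of length 4.
offenders = find_out_of_bounds([0, 3, -4, 4, -5], size=4)
print(offenders)
```

An empty result means every index is within range for that dimension. A non-empty result shows which entries to inspect, for instance labels that exceed the number of classes.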
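Solution: set the environment variable CUDA_LAUNCH_BLOCKING=1 when running training. This forces kernel launches to run synchronously, so the error is raised at the call that caused it and the location of the bug can be pinpointed.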
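The variable is usually set on the command line, for example CUDA_LAUNCH_BLOCKING=1 python train.py, where train.py is a placeholder script name. As a rough sketch, it can also be set from inside the script, on the assumption that this happens before CUDA is initialized (in practice, before torch is imported). The helper name enable_blocking_launches is hypothetical.

```python
import os


def enable_blocking_launches(env=os.environ):
    """Request synchronous CUDA kernel launches for debugging.

    Assumption: this runs before CUDA is initialized, otherwise the
    setting may have no effect for the current process.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


enable_blocking_launches()

# Import torch (and build the model) only after the variable is set, e.g.:
# import torch
```

Synchronous launches slow training down noticeably, so this setting is best used only while debugging and removed afterwards.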