RuntimeError: CUDA error: device-side assert triggered

The following error appeared while running a classification task:

/opt/conda/conda-bld/pytorch_1646755861072/work/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [0,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1646755861072/work/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [0,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
... (the same assertion repeats for threads [2,0,0] through [31,0,0])


RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,
so the stacktrace below might be incorrect.
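
As the message says, CUDA reports kernel errors asynchronously, so the Python stack trace often points at the wrong line. A common first step (a general debugging sketch, not something from this post) is to force synchronous kernel launches, or to rerun the failing step on CPU, where the same problem raises an ordinary Python exception at the exact offending line:

```python
import os

# Must be set before any CUDA work is launched (ideally before importing torch),
# so that each kernel error is reported at the call that caused it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Alternative: move the model and batch to CPU and rerun the failing step;
# CPU ops fail eagerly with an accurate traceback instead of a delayed
# device-side assert.
```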

Honestly, this is annoying: it's the second time I've hit this error, and what's worse, I can't remember how I fixed it last time! So I have to work it out all over again. Oh no! Once I find the fix this time, I'm definitely writing it down properly!


  • I remembered! Last time, the cause was calling tensor.log() on a tensor that contained many negative values; passing the tensor through softmax first fixed it!!
  • This time, though, the cause turned out to be different. It still looked like a data problem: a GitHub issue suggested this error can occur when an index into an array exceeds the array's bounds, so my first step was to hunt for an out-of-bounds index.
  • In the end it was gradient explosion. I printed the tensor values just before the failing line and saw NaNs appearing further in; tracing back, I found I was feeding the attention-weighted features directly into the downstream task, which eventually made the predictions turn NaN. Adding a residual structure, w*F + F (w is the attention weight, F is the feature), fixed it.
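
The two fixes above can be sketched with NumPy (a minimal illustration; the names `x`, `w`, and `F` are hypothetical stand-ins for the tensors in the post, and the residual demo only shows that `w*F + F` keeps the original features in the forward path):

```python
import numpy as np

# Fix 1: log of a negative value yields nan; softmax maps values into (0, 1),
# making the subsequent log safe.
x = np.array([-2.0, 0.5, 3.0])
with np.errstate(invalid="ignore"):
    assert np.isnan(np.log(x)).any()    # nan from log of a negative value

e = np.exp(x - x.max())                 # numerically stable softmax
p = e / e.sum()
assert not np.isnan(np.log(p)).any()    # all probabilities lie in (0, 1)

# Fix 2: residual structure. Even if the attention weight w collapses toward 0,
# w*F + F still carries the original features forward instead of wiping them out.
F = np.array([1.0, 2.0, 3.0])
w = np.zeros_like(F)
assert np.allclose(w * F, 0.0)          # pure weighting loses the signal
assert np.allclose(w * F + F, F)        # residual preserves the features
```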
  • 3
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值