RuntimeError: CUDA error: device-side assert triggered

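Training crashed with this error out of the blue and I could not find the cause, so I reproduced it in a minimal form. Most posts online say the problem lies in the input and target passed to cross_entropy in torch.nn.functional, specifically that the target indices are invalid [1,2]. As the session below shows, targets [1,2,3] and [2,3,0] work fine, while [1,2,100] and [99,100,101] fail. I have not fully figured it out yet, so I am noting it down for now.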

[1]https://blog.csdn.net/littlehaes/article/details/102806323

[2]https://blog.csdn.net/qq_27292549/article/details/81084782

[3] Computing multi-class cross entropy: https://www.cnblogs.com/jclian91/p/9376117.html

[4] Illegal values in the target labels: https://www.cnblogs.com/geoffreyone/p/10653619.html

>>> import torch
>>> import torch.nn.functional as F

>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randint(5, (3,), dtype=torch.int64)
>>> input
tensor([[-1.5579,  0.5080,  0.1069,  0.7945,  0.4689],
        [-2.9727,  0.3491, -1.2172, -0.0223,  1.2733],
        [-0.2269, -1.1830, -0.8604,  1.2835,  1.2629]], requires_grad=True)
>>> target
tensor([2, 3, 0])
>>> loss = F.cross_entropy(input, target)
>>> loss
tensor(2.0206, grad_fn=<NllLossBackward>)

>>> input2 = torch.randn(4, 5, requires_grad=True)
>>> loss = F.cross_entropy(input, target)
>>> loss=F.cross_entropy(input2, target)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1869, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (4) to match target batch_size (3).

>>> import numpy as np
>>> target
tensor([2, 3, 0])
>>> t=np.array([99,100,101])
>>> tt=torch.from_numpy(t)
>>> tt
tensor([ 99, 100, 101])
>>> loss = F.cross_entropy(input, tt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1871, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at ../aten/src/THNN/generic/ClassNLLCriterion.c:92
>>> # input has 5 columns, so labels 0..4 are accepted; one out-of-range label is enough to fail
>>> t=np.array([1,2,3])
>>> ttt=torch.from_numpy(t)
>>> ttt
tensor([1, 2, 3])
>>> loss = F.cross_entropy(input, ttt)
>>> t=np.array([1,2,100])
>>> ttt=torch.from_numpy(t)
>>> loss = F.cross_entropy(input, ttt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1871, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at ../aten/src/THNN/generic/ClassNLLCriterion.c:92
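
The assertion text `cur_target >= 0 && cur_target < n_classes` suggests that every label must lie in [0, C), where C is the second dimension of the input (5 in the session above). A minimal sketch of a pre-check that turns the opaque assert into a readable error follows. The helper name `check_targets` is hypothetical, and the default `ignore_index=-100` is assumed to match the default of `F.cross_entropy`.

```python
import torch

def check_targets(logits, target, ignore_index=-100):
    """Raise a readable ValueError if any label falls outside [0, n_classes)."""
    n_classes = logits.size(1)
    # Labels equal to ignore_index are skipped by cross_entropy, so allow them.
    valid = ((target >= 0) & (target < n_classes)) | (target == ignore_index)
    if not bool(valid.all()):
        bad = target[~valid].unique().tolist()
        raise ValueError(f"target labels {bad} are outside [0, {n_classes})")

logits = torch.randn(3, 5)                        # 3 samples, 5 classes
check_targets(logits, torch.tensor([1, 2, 3]))    # in range, passes silently

try:
    check_targets(logits, torch.tensor([1, 2, 100]))
except ValueError as err:
    message = str(err)
    print(message)
```

Calling such a check right before the loss during debugging should point at the offending labels directly, instead of a device-side assert that surfaces later.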

I reloaded the interrupted checkpoint and resumed training, and the error did not come back. Baffling.
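
One possible explanation for the error disappearing (an assumption, nothing in the session above confirms it) is that only a few samples carry an illegal label, so whether the assert fires depends on which batches happen to be drawn. A rough sketch of scanning a data loader for such labels before training is shown here. The name `find_bad_batches` and the toy dataset are hypothetical and stand in for the real training data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def find_bad_batches(loader, n_classes, ignore_index=-100):
    """Return (batch_index, offending_labels) for each batch with out-of-range labels."""
    bad = []
    for i, (_, labels) in enumerate(loader):
        mask = ((labels < 0) | (labels >= n_classes)) & (labels != ignore_index)
        if mask.any():
            bad.append((i, labels[mask].tolist()))
    return bad

# Toy data: 6 samples, 5 classes, with one illegal label (5) planted on purpose.
features = torch.randn(6, 4)
labels = torch.tensor([0, 1, 4, 2, 5, 3])
loader = DataLoader(TensorDataset(features, labels), batch_size=2, shuffle=False)

bad_batches = find_bad_batches(loader, n_classes=5)
print(bad_batches)
```

If the scan comes back empty, the labels are probably not the cause and the shape of the model output (the number of classes it predicts) would be the next thing to check.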
