RuntimeError: CUDA error: device-side assert triggered

justtoomuchforyou

Category: PyTorch

Article link: https://blog.csdn.net/m0_37663482/article/details/103750661

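The model suddenly raised this error partway through training, and I couldn't find the cause, so I reproduced the error in a minimal form below. Most sources online say the problem lies in the input and target passed to torch.nn.functional.cross_entropy, for example an invalid target index [1,2]. As shown below, with a 5-class input the targets [1,2,3] and [2,3,0] work fine, while [1,2,100] and [99,100,101] fail. I haven't fully figured it out yet, so I'm noting it down for now.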

[1] https://blog.csdn.net/littlehaes/article/details/102806323

[2] https://blog.csdn.net/qq_27292549/article/details/81084782

[3] Computing multi-class cross-entropy: https://www.cnblogs.com/jclian91/p/9376117.html

[4] Illegal values in the target labels: https://www.cnblogs.com/geoffreyone/p/10653619.html
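The assertion that shows up in the CPU tracebacks, cur_target >= 0 && cur_target < n_classes, is the rule being broken: for class-index targets, every entry must lie in [0, C), where C is the number of columns of the input (5 here), unless it equals ignore_index. On GPU the same check runs inside a CUDA kernel, and because kernels run asynchronously the failure surfaces only as the generic "device-side assert triggered", often at a later, unrelated line. Running the failing batch on CPU, or setting the environment variable CUDA_LAUNCH_BLOCKING=1, gives a more precise location. As a minimal sketch, the hypothetical helper check_ce_inputs below (not a PyTorch API) validates a batch before the loss is computed; it assumes the class-index form of target only, not probability targets.

```python
import torch


def check_ce_inputs(logits: torch.Tensor, target: torch.Tensor,
                    ignore_index: int = -100) -> None:
    """Raise a readable ValueError if (logits, target) would trip the
    cross_entropy target assertion.

    Hypothetical debugging helper, not part of PyTorch. Covers only the
    class-index form: logits of shape (N, C), target of shape (N,).
    """
    if logits.dim() != 2:
        raise ValueError(f"expected logits of shape (N, C), got {tuple(logits.shape)}")
    if target.dim() != 1:
        raise ValueError(f"expected target of shape (N,), got {tuple(target.shape)}")
    if target.dtype != torch.int64:
        # class-index targets are expected to be int64 (torch.long)
        raise ValueError(f"expected int64 target, got {target.dtype}")
    n, num_classes = logits.shape
    if target.size(0) != n:
        raise ValueError(f"batch size mismatch: logits has {n} rows, target has {target.size(0)}")
    # Entries equal to ignore_index are skipped by cross_entropy, so leave them out.
    kept = target[target != ignore_index]
    bad = kept[(kept < 0) | (kept >= num_classes)]
    if bad.numel() > 0:
        raise ValueError(f"target values {bad.tolist()} outside [0, {num_classes})")


logits = torch.randn(3, 5)
check_ce_inputs(logits, torch.tensor([2, 3, 0]))  # valid: returns None
try:
    check_ce_inputs(logits, torch.tensor([1, 2, 100]))
except ValueError as err:
    print(err)  # target values [100] outside [0, 5)
```

Calling it right before F.cross_entropy turns the opaque CUDA assert into a Python exception that names the offending values. The boolean-mask indexing forces a GPU-to-CPU synchronization, so it is best enabled only while debugging.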

>>> import torch
>>> import torch.nn.functional as F

>>> input = torch.randn(3, 5, requires_grad=True)  # 3 samples, 5 classes
>>> target = torch.randint(5, (3,), dtype=torch.int64)  # class indices in [0, 5)
>>> input
tensor([[-1.5579,  0.5080,  0.1069,  0.7945,  0.4689],
        [-2.9727,  0.3491, -1.2172, -0.0223,  1.2733],
        [-0.2269, -1.1830, -0.8604,  1.2835,  1.2629]], requires_grad=True)
>>> target
tensor([2, 3, 0])
>>> loss = F.cross_entropy(input, target)
>>> loss
tensor(2.0206, grad_fn=<NllLossBackward>)
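The tracebacks show that cross_entropy is computed as nll_loss(log_softmax(input, 1), target, ...). As a sketch (same shapes as the session above, with a fixed seed so it is reproducible), the same value can be obtained by hand: take the log-softmax over the class dimension, then for each row i pick column target[i]. This makes the failure concrete: a target of 100 would index column 100 of a 5-column tensor, which does not exist.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)        # N=3 samples, C=5 classes
target = torch.tensor([2, 3, 0])  # one class index per row, each in [0, 5)

# Log-probabilities over the class dimension (dim=1)
log_probs = F.log_softmax(logits, dim=1)
# For each row i, pick log_probs[i, target[i]]; this is where target is used as an index
picked = log_probs[torch.arange(logits.size(0)), target]
manual = -picked.mean()           # default reduction='mean' averages over the batch

builtin = F.cross_entropy(logits, target)
print(torch.allclose(manual, builtin))  # True
```

With plain tensor indexing on CPU, an out-of-range target raises an IndexError immediately; inside the loss kernel the equivalent bounds check is the assertion in the tracebacks.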

>>> input2 = torch.randn(4, 5, requires_grad=True)  # 4 samples, but target has only 3
>>> loss = F.cross_entropy(input2, target)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1869, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (4) to match target batch_size (3).

>>> import numpy as np
>>> target
tensor([2, 3, 0])
>>> t=np.array([99,100,101])  # every index is >= 5, the number of classes
>>> tt=torch.from_numpy(t)
>>> tt
tensor([ 99, 100, 101])
>>> loss = F.cross_entropy(input, tt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1871, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at ../aten/src/THNN/generic/ClassNLLCriterion.c:92
>>> t=np.array([1,2,3])  # all indices in [0, 5): no error
>>> ttt=torch.from_numpy(t)
>>> ttt
tensor([1, 2, 3])
>>> loss = F.cross_entropy(input, ttt)
>>> t=np.array([1,2,100])  # 100 is out of range: the assertion fails
>>> ttt=torch.from_numpy(t)
>>> loss = F.cross_entropy(input, ttt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1871, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at ../aten/src/THNN/generic/ClassNLLCriterion.c:92
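The one value outside [0, C) that cross_entropy accepts on purpose is ignore_index, which defaults to -100: targets equal to it are skipped and excluded from the mean. If some labels are meant as padding or "no label", mapping them to -100 (or passing a matching ignore_index) avoids the assertion. A minimal sketch showing that an ignored position simply drops out of the average:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)  # 3 samples, 5 classes

# -100 is the default ignore_index: the third sample contributes nothing
target = torch.tensor([1, 2, -100])
loss = F.cross_entropy(logits, target)

# Same as averaging only over the two non-ignored samples
expected = F.cross_entropy(logits[:2], target[:2])
print(torch.allclose(loss, expected))  # True
```

Rewriting genuinely wrong labels (like the 100 above) to -100 would only silence the error rather than fix the data, so this suits labels that are intentionally absent.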

I then resumed training from the checkpoint where it had crashed, and the error did not occur again. A mystery.
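The post does not establish why resuming made the error go away. One possibility, which is an assumption and not something confirmed above, is that only a few samples carry out-of-range labels, so whether a run crashes depends on which batches it reaches. Scanning all labels once before training would rule this in or out. A sketch, assuming the dataset's labels can be collected into a single 1-D tensor (how to do that depends on the dataset); find_bad_labels is a hypothetical helper:

```python
import torch


def find_bad_labels(labels: torch.Tensor, num_classes: int,
                    ignore_index: int = -100) -> list:
    """Return (position, label) pairs for entries of a 1-D label tensor that
    are neither in [0, num_classes) nor equal to ignore_index.

    Hypothetical helper for a one-off scan before training.
    """
    out_of_range = (labels < 0) | (labels >= num_classes)
    mask = out_of_range & (labels != ignore_index)
    positions = mask.nonzero(as_tuple=True)[0]
    return list(zip(positions.tolist(), labels[positions].tolist()))


# Hypothetical labels gathered from a whole dataset
labels = torch.tensor([0, 4, 2, 5, 1, -1, -100])
print(find_bad_labels(labels, num_classes=5))  # [(3, 5), (5, -1)]
```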