Error: Assertion `t >= 0 && t < n_classes` failed

无趣的雨

Last modified 2024-09-03 11:23:41

Tags: python

First published 2024-09-03 11:22:16

Article link: https://blog.csdn.net/weixin_43694476/article/details/141855620

The following error was raised during model training:
nll_loss_forward_reduce_cuda_kernel_2d: Assertion `t >= 0 && t < n_classes` failed.

Traceback (most recent call last):
  File "/paper_code/src/src/run.py", line 42, in <module>
    train(config, model, train_iter, dev_iter, test_iter)
  File "/paper_code/src/src/train_eval.py", line 60, in train
    loss.backward()
  File "/home/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
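
The error message itself suggests `CUDA_LAUNCH_BLOCKING=1`, because CUDA errors are reported asynchronously and the Python stack trace may point at the wrong call. A minimal sketch of one way to enable it, assuming the variable is set at the very top of the entry script, before `torch` is imported and before CUDA is initialized:

```python
import os

# Make CUDA kernel launches synchronous so the Python stack trace points at
# the call that actually failed. Assumption: this runs before torch is
# imported, so the setting is in place when CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and the training code only after this line
```

The same variable can be set from the shell instead, for example `CUDA_LAUNCH_BLOCKING=1 python run.py`. Synchronous launches slow training down, so this is best kept for debugging runs.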

Tracing the error back to its source shows that the problem occurs when the loss is computed:

loss = F.cross_entropy(outputs, targets)  
loss.backward()

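Printing the model outputs and the labels shows that one label is out of range (a 5 appears). This is a three-class problem, so the only valid labels are 0, 1 and 2. Correcting the offending label to a value in 0-2 fixes the error.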

print(outputs)
outputs = [[-0.8861,  1.6711, -0.6029], [-0.7692,  1.6556, -0.6670], [-0.8579,  1.6728, -0.6586],
        [-0.7631,  1.6240, -0.5046], [-0.7564,  1.7385, -0.5424], [-0.7516,  1.7620, -0.6124],
        [-0.8301,  1.5584, -0.6187], [-0.6366,  1.7889, -0.7712], [-0.8066,  1.6487, -0.5457],
        [-0.8030,  1.6144, -0.6283], [-0.7522,  1.6927, -0.6047], [-0.7829,  1.7517, -0.6119],
        [-0.8257,  1.7964, -0.7136], [-0.7701,  1.6883, -0.6130], [-0.7549,  1.7455, -0.6324],
        [-0.8305,  1.7744, -0.7079], [-0.8007,  1.6771, -0.6202], [-0.7593,  1.6338, -0.6308],
        [-0.7834,  1.8090, -0.6572], [-0.8399,  1.7427, -0.6483], [-0.7375,  1.7810, -0.7425],
        [-0.8892,  1.6604, -0.6257], [-0.8031,  1.6751, -0.7284], [-0.7251,  1.6090, -0.4635],
        [-0.7457,  1.8065, -0.6416],  [-0.8192,  1.7231, -0.5664], [-0.8781,  1.8318, -0.8249],
        [-0.7273,  1.6037, -0.5814], [-0.8125,  1.6107, -0.4593], [-0.7245,  1.7086, -0.5779],
        [-0.9000,  1.6345, -0.6055], [-0.8281,  1.7440, -0.6325]]
print(targets)
targets = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2,
        1, 1, 2, 1, 1, 1, 2, 5]
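
Instead of reading the printed labels by eye, the batch can be scanned for values outside the valid range. The sketch below is a minimal pure-Python check. `find_invalid_targets` is a hypothetical helper, not part of PyTorch, and it assumes the labels are available as a plain list (for a tensor, `targets.tolist()` produces one).

```python
def find_invalid_targets(targets, n_classes):
    """Return (index, value) pairs for labels outside the range [0, n_classes)."""
    return [(i, t) for i, t in enumerate(targets) if not 0 <= t < n_classes]


# The labels printed above, for a three-class problem.
targets = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2,
           1, 1, 2, 1, 1, 1, 2, 5]

print(find_invalid_targets(targets, n_classes=3))  # [(31, 5)]
```

Here the offending label sits at index 31, which coincides with the `thread: [31,0,0]` in the assertion message, although that correspondence should not be relied on in general.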

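Summary: this error means that, when the loss was computed, a label was less than 0 or greater than or equal to the number of classes.
Note: labels should generally start from 0; otherwise a target < 0 or target >= n_classes error can occur. Also make sure the number of classes the model is trained with matches the classes that actually appear in the sample labels.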
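
When the raw labels do not start from 0 (for example, a dataset labelled 1 to 3), one option is to remap them to contiguous indices before training. This is a minimal sketch under that assumption. `build_label_mapping` and the sample `raw_labels` are hypothetical and only illustrate the idea.

```python
def build_label_mapping(raw_labels):
    """Map each distinct raw label to an index in 0..n_classes-1 (sorted order)."""
    classes = sorted(set(raw_labels))
    return {label: idx for idx, label in enumerate(classes)}


raw_labels = [1, 2, 3, 1, 3]  # hypothetical 1-based labels
label_to_index = build_label_mapping(raw_labels)
encoded = [label_to_index[label] for label in raw_labels]
n_classes = len(label_to_index)  # should match the model's output size

print(label_to_index)  # {1: 0, 2: 1, 3: 2}
print(encoded)         # [0, 1, 2, 0, 2]
```

The mapping should be built once from the full training set and reused for the validation and test sets, so that the same raw label always maps to the same index.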
Reference: nll_loss_forward_reduce_cuda_kernel_2d: Assertion t ＞= 0 && t ＜ n__classes failed.