【Error: Assertion `t >= 0 && t < n_classes` failed】

The following error was raised during model training:
nll_loss_forward_reduce_cuda_kernel_2d: Assertion `t >= 0 && t < n_classes` failed.

Traceback (most recent call last):
  File "/paper_code/src/src/run.py", line 42, in <module>
    train(config, model, train_iter, dev_iter, test_iter)
  File "/paper_code/src/src/train_eval.py", line 60, in train
    loss.backward()
  File "/home/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
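
The log itself suggests passing CUDA_LAUNCH_BLOCKING=1. Below is a minimal sketch of one way to do that from Python, assuming the variable is set at the very top of the entry script (here assumed to be run.py) before CUDA is initialized. With synchronous kernel launches, the Python traceback would be expected to point at the call that actually failed rather than at a later call such as loss.backward().

```python
import os

# Must be set before CUDA is initialized; the top of the entry script,
# ahead of `import torch`, is a safe place (assumption: run.py is that script).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ... the rest of the script (imports, model setup, training) follows here.
```

Synchronous launches slow training down, so this setting is best used only while debugging.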

Tracing the error back shows that it occurs when the loss is computed:

loss = F.cross_entropy(outputs, targets)  
loss.backward()

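Printing the model outputs and the labels shows that one label falls outside the valid range (a 5 appears). This is a three-class task, so the valid labels are 0-2 and 5 is out of range. Correcting the offending label to a value in 0-2 resolves the error.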

print(outputs)
outputs = [[-0.8861,  1.6711, -0.6029], [-0.7692,  1.6556, -0.6670], [-0.8579,  1.6728, -0.6586],
        [-0.7631,  1.6240, -0.5046], [-0.7564,  1.7385, -0.5424], [-0.7516,  1.7620, -0.6124],
        [-0.8301,  1.5584, -0.6187], [-0.6366,  1.7889, -0.7712], [-0.8066,  1.6487, -0.5457],
        [-0.8030,  1.6144, -0.6283], [-0.7522,  1.6927, -0.6047], [-0.7829,  1.7517, -0.6119],
        [-0.8257,  1.7964, -0.7136], [-0.7701,  1.6883, -0.6130], [-0.7549,  1.7455, -0.6324],
        [-0.8305,  1.7744, -0.7079], [-0.8007,  1.6771, -0.6202], [-0.7593,  1.6338, -0.6308],
        [-0.7834,  1.8090, -0.6572], [-0.8399,  1.7427, -0.6483], [-0.7375,  1.7810, -0.7425],
        [-0.8892,  1.6604, -0.6257], [-0.8031,  1.6751, -0.7284], [-0.7251,  1.6090, -0.4635],
        [-0.7457,  1.8065, -0.6416],  [-0.8192,  1.7231, -0.5664], [-0.8781,  1.8318, -0.8249],
        [-0.7273,  1.6037, -0.5814], [-0.8125,  1.6107, -0.4593], [-0.7245,  1.7086, -0.5779],
        [-0.9000,  1.6345, -0.6055], [-0.8281,  1.7440, -0.6325]]
print(targets)
targets = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2,
        1, 1, 2, 1, 1, 1, 2, 5]
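
Rather than scanning the printed labels by eye, the same check can be automated. The following is a small sketch in plain Python; `find_invalid_targets` is a hypothetical helper name, not part of PyTorch, and `n_classes = 3` is taken from the three-class setup described above. For a tensor, `targets.tolist()` gives a list that can be passed in.

```python
def find_invalid_targets(targets, n_classes):
    """Return (position, label) pairs for labels outside [0, n_classes)."""
    return [(i, t) for i, t in enumerate(targets) if not 0 <= t < n_classes]


# The batch of labels printed above.
targets = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2,
           1, 1, 2, 1, 1, 1, 2, 5]
n_classes = 3  # three-class task; equals the width of each row in `outputs`

bad = find_invalid_targets(targets, n_classes)
print(bad)  # [(31, 5)]
```

The offending label sits at position 31 of the batch, which is consistent with the `thread: [31,0,0]` reported in the assertion message. Running such a check just before `F.cross_entropy` turns the opaque device-side assert into a readable message.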

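Summary: this error means that, when the loss was computed, some label was less than 0 or greater than or equal to the number of classes.
Note: labels should generally start from 0; otherwise a target < 0 or target >= n_classes error can occur. Also make sure that the number of classes the model is trained with matches the classes present in the sample labels.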
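
When the raw labels are legitimate but simply not zero-based or not contiguous (for example 1, 2, 3), one option is to remap them before training. This is a sketch under that assumption, not the fix used above, where a single wrong label was corrected by hand; `build_label_map` and the sample labels are hypothetical.

```python
def build_label_map(raw_labels):
    """Map each distinct raw label to a contiguous id in 0..K-1 (sorted order)."""
    return {label: idx for idx, label in enumerate(sorted(set(raw_labels)))}


raw_labels = [1, 3, 2, 3, 1]  # hypothetical 1-based labels
label_map = build_label_map(raw_labels)
encoded = [label_map[y] for y in raw_labels]
n_classes = len(label_map)  # use this as the model's output dimension

print(label_map)  # {1: 0, 2: 1, 3: 2}
print(encoded)    # [0, 2, 1, 2, 0]
```

The map should be built once from the training labels and reused for the dev and test sets, so that every split shares the same encoding.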
Reference: nll_loss_forward_reduce_cuda_kernel_2d: Assertion t >= 0 && t < n__classes failed.
