PyTorch training error: RuntimeError: Function AddBackward0 returned an invalid gradient at index 1

When training with PyTorch, the loss.backward() step may raise the following error:

RuntimeError: Function AddBackward0 returned an invalid gradient at index 1 - expected type TensorOptions(dtype=float, device=cuda:0, layout=Strided, requires_grad=false) but got TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false) (validate_outputs at /opt/conda/conda-bld/pytorch_1587428270644/work/torch/csrc/autograd/engine.cpp:484)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7fe308ee7b5e in /home/adc/anaconda3/envs/torch1536/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0x2ae2834 (0x7fe3367ba834 in /home/adc/anaconda3/envs/torch1536/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #2: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x548 (0x7fe3367bc368 in /home/adc/anaconda3/envs/torch1536/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7fe3367be2f2 in /home/adc/anaconda3/envs/torch1536/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #4: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fe3367b6969 in /home/adc/anaconda3/envs/torch1536/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7fe339afcc38 in /home/adc/anaconda3/envs/torch1536/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0xc9067 (0x7fe47f5a5067 in /home/adc/anaconda3/envs/torch1536/bin/…/lib/libstdc++.so.6)
frame #7: + 0x76db (0x7fe4831c86db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #8: clone + 0x3f (0x7fe48254471f in /lib/x86_64-linux-gnu/libc.so.6)

Start with the first line of the error message: a CUDA tensor was expected, but a CPU tensor was returned instead.
In other words, we want to train on the GPU, but some of the tensors involved in training are still on the CPU, and this device mismatch during backpropagation triggers the error.
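As a quick diagnostic, you can print the .device attribute of the model's parameters and of each data tensor to see where everything lives. A minimal sketch (the model and variable names here are hypothetical, not from this post; the snippet falls back to CPU when no GPU is available):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)    # model weights moved to the target device
inputs = torch.randn(8, 4)            # still on CPU: a common source of the error
labels = torch.randint(0, 2, (8,))

print(next(model.parameters()).device)  # device of the model weights
print(inputs.device)                    # cpu: must be moved with .to(device)

# Move the data to the same device as the model before the forward pass.
inputs = inputs.to(device)
labels = labels.to(device)
print(inputs.device)
```

Checking next(model.parameters()).device is a convenient way to read off the model's device, since nn.Module itself has no .device attribute.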

First, check whether the model, inputs, and labels have all been moved to the GPU via .to(device) or .cuda(). Then check how the loss is initialized; this one is easy to overlook. Verify that the loss starts out as a CUDA tensor: the device of the accumulated total loss is determined by the freshly created initial tensor. If you forget to place that tensor on the GPU with .to(device), then even though each component loss is computed from CUDA tensors, the accumulated total loss still ends up as a CPU tensor. Calling loss.backward() then raises the error above. For example:

loss = torch.Tensor([0.0]).float()

This needs to be changed to

loss = torch.Tensor([0.0]).float().cuda()
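Putting it together, here is a minimal sketch of device-safe loss accumulation. The model, loss function, and data are hypothetical placeholders; torch.zeros((), device=device) plays the same role as the .cuda() initialization above, but also runs on CPU-only machines:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 1).to(device)
criterion = nn.MSELoss()

x = torch.randn(8, 4, device=device)
y = torch.randn(8, 1, device=device)

# Create the loss accumulator directly on the target device, so the
# accumulated total stays on the same device as the per-part losses.
loss = torch.zeros((), device=device)
for _ in range(2):                      # e.g. summing losses from several heads
    loss = loss + criterion(model(x), y)

loss.backward()                         # no device-mismatch error
print(loss.device)
```

Creating tensors with the device= keyword (or with .to(device)) rather than hard-coding .cuda() keeps the code portable between GPU and CPU runs.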