Thanks to the article below, the problem is solved:
https://blog.csdn.net/andyL_05/article/details/107952479
The error looked like this:
RuntimeError: Function AddBackward0 returned an invalid gradient at index 1 - expected type TensorOptions(dtype=float, device=cuda:0, layout=Strided, requires_grad=false) but got TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false) (validate_outputs at /pytorch/torch/csrc/autograd/engine.cpp:484)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f4228517536 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2d84224 (0x7f418672b224 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x548 (0x7f418672cd58 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f418672ece2 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #4: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f4186727359 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f422f0ad3d8 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
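The message says that during backpropagation a gradient was expected on the GPU (cuda:0), but a CPU tensor was received instead. Yet I was sure I had moved everything with .to(cuda0).
The cause: inside the model, the forward function defined a new variable, and a tensor created that way lives on the CPU by default.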
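To see the cause in isolation, here is a minimal sketch that compares the device of an explicitly moved tensor with one created from scratch. The names `inp` and `new_var` are made up for illustration and do not come from my model; the sketch also assumes PyTorch's default device has not been changed.

```python
import torch

# Use the GPU if there is one, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# A tensor that was explicitly moved, like the model inputs
inp = torch.randn(2, 3).to(device)

# A tensor created with no device argument, like the new variable in forward()
new_var = torch.tensor([0.])

# On a GPU machine these two differ: inp is on cuda:0, new_var is on cpu
print("inp:", inp.device, "| new_var:", new_var.device)
```

Calling .to() on the model and the inputs does nothing for tensors that are created later inside forward, so those have to be placed on the right device themselves.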
The fix is simply to create that variable on the GPU:
x = torch.tensor([0.]).cuda()
And with that, everything works!
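A variant that avoids hard-coding the GPU is to create the tensor on whatever device the input already uses. This is only a sketch under my own assumptions: the module `AddOffset` and its layer sizes are hypothetical, not the model from the original problem.

```python
import torch
import torch.nn as nn

class AddOffset(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, inp):
        # Create the new tensor on the same device as the input,
        # instead of relying on the CPU default or hard-coding .cuda()
        x = torch.tensor([0.], device=inp.device)
        return self.linear(inp) + x

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = AddOffset().to(device)
inp = torch.randn(2, 3, device=device)

out = model(inp)
out.sum().backward()  # backward now sees tensors on a single device
print(out.device, model.linear.weight.grad.device)
```

Written this way, the same forward function runs unchanged on a CPU-only machine or on a different GPU. For a constant that never changes, registering it once with self.register_buffer in __init__ should also work, since buffers are moved together with the model by .to(device).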