I should have started writing these down long ago, but better late than never; I'll keep adding entries as I go.
1. Data on the CPU and the GPU at the same time
Error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Code snippet:
```python
import torch
import torch.nn as nn

class Linear(nn.Module):
    def __init__(self, in_features=1024, out_features=100):
        super().__init__()
        # nn.Parameter -> stored in self._parameters, moved by Module.cuda()
        self.weights = nn.Parameter(torch.Tensor(in_features, out_features))
        # Plain tensors -> ordinary attributes, NOT moved by Module.cuda()
        self.bias = torch.Tensor(out_features)
        self.bias_ = torch.Tensor(out_features)
        # Registered buffer -> stored in self._buffers, moved by Module.cuda()
        self.register_buffer('bias_0', self.bias_)

    def forward(self, x: torch.Tensor):
        # self.bias_ is still on the CPU here, hence the device-mismatch error
        return torch.matmul(x, self.weights) + self.bias_

if __name__ == "__main__":
    img = torch.randn([2, 1024]).cuda()
    net = Linear().cuda()
    preds = net(img)  # raises the RuntimeError above
```
Cause: during `Linear.cuda()`, `self.bias_` and `self.bias` are never transferred to the GPU. Although they take part in the computation, `.cuda()` only moves the entries stored in `self._parameters` and `self._buffers`, and `self.bias` / `self.bias_` are plain `Tensor` attributes. So the only data actually moved in this network are `self.weights` (a parameter) and `self.bias_0` (a registered buffer). The initialization of `nn.Module` looks like this:
```python
def __init__(self):
    """
    Initializes internal Module state, shared by both nn.Module and ScriptModule.
    """
    torch._C._log_api_usage_once("python.nn_module")

    self.training = True
    self._parameters = OrderedDict()
    self._buffers = OrderedDict()
    self._non_persistent_buffers_set = set()
    self._backward_hooks = OrderedDict()
    self._is_full_backward_hook = None
    self._forward_hooks = OrderedDict()
    self._forward_pre_hooks = OrderedDict()
    self._state_dict_hooks = OrderedDict()
    self._load_state_dict_pre_hooks = OrderedDict()
    self._modules = OrderedDict()
```
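Given that, one way to fix the snippet above is to make sure every tensor used in `forward` lives in `_parameters` or `_buffers`. A minimal sketch, keeping the shapes from the example; whether the bias should be a trainable `nn.Parameter` or a non-trainable buffer depends on your use case:

```python
import torch
import torch.nn as nn

class Linear(nn.Module):
    def __init__(self, in_features=1024, out_features=100):
        super().__init__()
        # Parameter -> stored in self._parameters, moved by Module.cuda()
        self.weights = nn.Parameter(torch.randn(in_features, out_features))
        # Buffer -> stored in self._buffers, also moved by Module.cuda()
        self.register_buffer('bias', torch.zeros(out_features))

    def forward(self, x: torch.Tensor):
        # Both operands now live on the same device as the module
        return torch.matmul(x, self.weights) + self.bias

if __name__ == "__main__":
    img = torch.randn([2, 1024]).cuda()
    net = Linear().cuda()
    preds = net(img)  # works, shape [2, 100]
```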
NOTE: `torch.Tensor.cuda()` and `Module.cuda()` are not the same: the former is out-of-place and returns a new tensor on the GPU, while the latter moves the module's parameters and buffers in place. In `nn.Module`, `self.register_buffer()` stores the value in `nn.Module._buffers`, which is an `OrderedDict`; variables in `nn.Module._buffers` can also be accessed directly as `self.XXX`, so the registered `bias_0` is available as `self.bias_0` and gets moved to the GPU together with the rest of the module.
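A quick sketch of that difference (assumes a CUDA device is available):

```python
import torch
import torch.nn as nn

t = torch.zeros(3)
t_gpu = t.cuda()               # Tensor.cuda() returns a NEW tensor
print(t.device, t_gpu.device)  # cpu cuda:0 -- t itself stays on the CPU

m = nn.Linear(3, 3)
m.cuda()                       # Module.cuda() moves m's own parameters/buffers
print(m.weight.device)         # cuda:0
```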