1. x.grad is None after backward()
First check whether the tensor is a leaf node and whether its requires_grad attribute is True. It is also possible that the model contains operations through which gradients cannot flow, such as torch.max (its returned indices are non-differentiable), leaving the parameters above them without gradients.
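A minimal sketch of the leaf-node check, using illustrative tensors (any op result, e.g. x * 2 or x.cuda(), is a non-leaf tensor whose .grad is not retained):
# Minimal sketch: leaf tensors get .grad, non-leaf tensors do not
import torch
x = torch.randn(3, requires_grad=True)   # leaf with requires_grad=True
y = x * 2                                # non-leaf (result of an op)
y.sum().backward()
print(x.is_leaf, x.grad)                 # True tensor([2., 2., 2.])
print(y.is_leaf, y.grad)                 # False None (call y.retain_grad() to keep it)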
2. Missing key(s) in state_dict:…
This is a loading mismatch between single-GPU and multi-GPU models: nn.DataParallel prefixes every parameter name with module., so the state_dict keys no longer match. Methods for loading models across single/multi-GPU setups are summarized below:
# Loading a multi-GPU model on a single GPU: strip the 'module.' prefix
model.cuda()
checkpoint = torch.load('model.pth')
model.load_state_dict({k.replace('module.', ''): v for k, v in checkpoint["backdoor"].items()})
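Going the other way, a single-GPU checkpoint can be loaded into a DataParallel model either by loading before wrapping or by adding the module. prefix. A sketch, assuming an unprefixed state_dict stored under the same 'backdoor' key as above:
# Loading a single-GPU checkpoint for multi-GPU use: load first, then wrap
model.load_state_dict(torch.load('model.pth')["backdoor"])
model = torch.nn.DataParallel(model).cuda()
# Alternatively, if the model is already wrapped, add the prefix instead:
# model.load_state_dict({'module.' + k: v for k, v in checkpoint["backdoor"].items()})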
3. TypeError: conv1d() received an invalid combination of arguments
This is usually caused by inputs in an inconsistent format; in particular, check whether the data is a numpy.ndarray rather than a torch.Tensor.
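For example (a sketch with made-up channel sizes), conv1d accepts a torch.Tensor of shape (N, C_in, L) but rejects the equivalent numpy array:
# conv1d needs a torch.Tensor; convert numpy arrays first
import numpy as np
import torch
import torch.nn as nn
conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3)
arr = np.random.randn(2, 4, 16).astype(np.float32)
# conv(arr)                         # TypeError: invalid combination of arguments
out = conv(torch.from_numpy(arr))   # OK, output shape (2, 8, 14)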
4. return torch.batch_norm(...) RuntimeError: running_mean should contain 61 elements not 64
A batch-norm layer normalizes over the whole batch; if the input is missing the batch dimension, the channel axis is read off the wrong dimension and this error is raised, so check the shape of the input. A single sample can gain the extra dimension with data.unsqueeze(0).
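A sketch with made-up sizes showing the mismatch and the fix:
# BatchNorm1d(64) expects (N, 64) or (N, 64, L); dim 1 is read as channels
import torch
import torch.nn as nn
bn = nn.BatchNorm1d(64)
data = torch.randn(64, 100)     # one sample (C, L) -- no batch dimension
# bn(data)                      # RuntimeError: running_mean should contain 100 elements not 64
out = bn(data.unsqueeze(0))     # shape (1, 64, 100) -> OK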
5. ValueError: optimizer got an empty parameter list
# Original code
self.uap = nn.Parameter(torch.zeros(size=(shape), requires_grad=True)).cuda()
Debugging after the error showed that the parameter count dropped to zero once .cuda() was appended: .cuda() returns a new non-leaf tensor rather than an nn.Parameter, so nothing gets registered with the module. Modified code:
self.uap = nn.Parameter(torch.zeros(size=(shape), requires_grad=True))
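The parameter still ends up on the GPU by moving the whole module after construction, or it can be created directly on the device while staying a leaf. A sketch (MyModel is a placeholder for the module holding self.uap):
# Move the module (and its registered parameters) to the GPU afterwards
model = MyModel().cuda()
optimizer = torch.optim.Adam(model.parameters())   # parameter list is no longer empty
# ...or create the parameter on the GPU directly:
# self.uap = nn.Parameter(torch.zeros(shape, device='cuda'))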
6. return torch._C._nn.linear(input, weight, bias) RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
# Original code
embedding_dim = 256
self.Cross_attention = CrossAttention(in_dim=embedding_dim, out_dim=embedding_dim, in_q_dim=embedding_dim, hid_q_dim=embedding_dim)
But my input had shape (batch_size, 1920, 10), so the fully connected layer dimensions did not match. Modified code:
embedding_dim = 10
self.Cross_attention = CrossAttention(in_dim=embedding_dim, out_dim=embedding_dim, in_q_dim=embedding_dim, hid_q_dim=embedding_dim)
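cuBLAS errors like this rarely report the offending shapes. A debugging sketch (model and x stand for the failing module and input): rerunning on the CPU reproduces the mismatch with a readable message such as "mat1 and mat2 shapes cannot be multiplied".
# Rerun on the CPU to get a readable shape-mismatch error
out = model.cpu()(x.cpu())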
7. AssertionError: Gather function not implemented for CPU tensors
First check that all the data is on the GPU; if it is, some operation or function inside the model may not support GPU processing and is producing CPU tensors.
Debugging showed that the error is raised when the model returns a tensor created with torch.zeros, which lives on the CPU by default:
# Original code
def getBatchVideoSimilarity(self, images):
    num = images.shape[2] - self.FOV * 2
    length = self.FOV * 2 + 1
    # torch.zeros defaults to the CPU, so DataParallel's gather step fails
    similarytyDistributionListBatch = torch.zeros((images.shape[0], num, length))
    for i in range(images.shape[0]):
        similarytyDistributionListBatch[i] = self.getVideoSimilaryty(images[i], FOV=self.FOV)
    return similarytyDistributionListBatch
# Error
assert all(i.device.type != 'cpu' for i in inputs), (
AssertionError: Gather function not implemented for CPU tensors
# Creating the tensor on the module's device removes the error
def getBatchVideoSimilarity(self, images):
    num = images.shape[2] - self.FOV * 2
    length = self.FOV * 2 + 1
    similarytyDistributionListBatch = torch.zeros((images.shape[0], num, length), device=self.device)
    for i in range(images.shape[0]):
        similarytyDistributionListBatch[i] = self.getVideoSimilaryty(images[i], FOV=self.FOV)
    return similarytyDistributionListBatch
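Note that self.device assumes the module tracks its own device attribute; deriving the device from the input avoids that assumption:
# Alternative: inherit the device from the input tensor
similarytyDistributionListBatch = torch.zeros((images.shape[0], num, length), device=images.device)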
8. RuntimeError: Caught RuntimeError in replica 0 on device 0
# Original code
predict_set = simi_model(video_set, audio)
predict_set = predict_set.unsqueeze(2)
dist_reg_model.train()
mask = torch.zeros((predict_set.shape[0], predict_set.shape[1]), device=predict_set.device)
target = predict_set[:, 1:]
target = nn.functional.softmax(target.float(), dim=1)
input = predict_set[:, :-1]
input_mask_ = mask[:, :-1]
logit = dist_reg_model(input.float(), input_mask_)
# Error
RuntimeError: Caught RuntimeError in replica 0 on device 0
Original Traceback (most recent call last):
File "/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "xxx.py", line 333, in forward
x += self.pos_embedding
RuntimeError: output with shape [5, 45, 1, 256] doesn't match the broadcast shape [5, 45, 45, 256]
The likely cause is the input shape: unsqueeze(2) adds an extra dimension, so inside the model x += self.pos_embedding tries to broadcast [5, 45, 1, 256] up to [5, 45, 45, 256], which an in-place add cannot do; the failure then surfaces in DataParallel's replica 0. Fixing the input shape resolves it:
predict_set = simi_model(video_set, audio)
# predict_set = predict_set.unsqueeze(2)
dist_reg_model.train()
mask = torch.zeros((predict_set.shape[0], predict_set.shape[1]), device=predict_set.device)
target = predict_set[:, 1:]
target = nn.functional.softmax(target.float(), dim=1)
input = predict_set[:, :-1]
input_mask_ = mask[:, :-1]
logit = dist_reg_model(input.float(), input_mask_)
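More generally, "Caught RuntimeError in replica 0" only wraps the real error; the Original Traceback underneath is what matters. A debugging sketch: temporarily unwrap DataParallel so the stack trace points straight at the failing line.
# Debugging sketch: run the bare module on one GPU for a direct traceback
bare_model = dist_reg_model.module if isinstance(dist_reg_model, nn.DataParallel) else dist_reg_model
logit = bare_model(input.float(), input_mask_)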
9. RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
My code had run fine before, which rules out a cuDNN environment problem, and GPU memory was not exhausted either.
Option 1: the batch size may be too large; shrinking it let the code run.
Option 2: add torch.backends.cudnn.enabled = False to the code so cuDNN is not used for backend acceleration.
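The message itself also points at a non-contiguous input, so making the tensor contiguous before the failing layer is worth trying (x here stands for that layer's input):
# Also worth trying: permute/transpose/slicing produce non-contiguous tensors
x = x.contiguous()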