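I ran into this problem while running a classification model on a 2080 Ti. Checking the model itself turned up no errors. In the end it turned out that the tensor computations in the validation-set evaluation stage were using up a great deal of GPU memory. The evaluation code adds up the loss tensors of every validation batch, and because those tensors still require gradients, autograd keeps the computation graph of every batch, including the intermediate activations saved for the backward pass, alive until the evaluation finishes.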
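The effect can be reproduced on the CPU with a tiny stand-in model. The model, batch shapes and loss below are hypothetical placeholders, not the original ResNet-18 setup; the point is only that a running total built from loss tensors stays attached to the autograd graph unless the loop runs under torch.no_grad().

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model and fake validation batches, for illustration only
torch.manual_seed(0)
model = nn.Linear(8, 3)
loss_func = nn.CrossEntropyLoss()
batches = [(torch.randn(4, 8), torch.randint(0, 3, (4,))) for _ in range(5)]

# Without no_grad: the running total is itself a node in the autograd graph,
# so the tensors saved for backward in every batch stay referenced by it.
total = torch.zeros(())
for x, y in batches:
    total = total + loss_func(model(x), y)
print(total.requires_grad, total.grad_fn is not None)  # True True

# With no_grad: no graph is recorded, so each batch's intermediates can be freed.
total_ng = torch.zeros(())
with torch.no_grad():
    for x, y in batches:
        total_ng = total_ng + loss_func(model(x), y)
print(total_ng.requires_grad, total_ng.grad_fn is not None)  # False False
```

On a real GPU run the first loop keeps one graph per validation batch alive, which is where the memory goes.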
Method 1: convert the model outputs to NumPy with tensor.detach().cpu().numpy() (called on the existing tensor) and compute the loss and accuracy on the CPU.
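A minimal sketch of Method 1, assuming a multi-class classifier trained with cross-entropy. The helper name evaluate_on_cpu and the toy model and data are hypothetical. The forward pass still runs on the GPU, but the logits are detached and copied to host memory, and the cross-entropy and accuracy are computed in NumPy, so nothing that references the graph is kept across batches.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_on_cpu(model, loader, device):
    """Hypothetical helper: forward pass on `device`, then cross-entropy loss
    and accuracy computed in NumPy on the CPU."""
    total_loss, total_correct, n = 0.0, 0, 0
    for x, y in loader:
        logits = model(x.to(device))
        # detach() drops the autograd graph; cpu().numpy() copies to host memory
        logits_np = logits.detach().cpu().numpy().astype(np.float64)
        y_np = y.numpy()  # labels from the loader are still CPU tensors
        # numerically stable log-softmax: subtract the row max before exp
        shifted = logits_np - logits_np.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        total_loss += -log_probs[np.arange(len(y_np)), y_np].sum()
        total_correct += int((logits_np.argmax(axis=1) == y_np).sum())
        n += len(y_np)
    return total_loss / n, total_correct / n


# Usage with a toy model and dataset (hypothetical)
torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(8, 3).to(device)
data = TensorDataset(torch.randn(10, 8), torch.randint(0, 3, (10,)))
loader = DataLoader(data, batch_size=4)  # last batch holds only 2 samples
loss_val, acc_val = evaluate_on_cpu(model, loader, device)
```

Each batch's graph is still built during the forward pass and freed when logits is reassigned; wrapping the loop in torch.no_grad() as well avoids building it at all.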
Method 2: run the evaluation stage directly inside a with torch.no_grad(): block, so autograd records no graph for it.
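for step, (img, label) in enumerate(dataloader):
    ......
    if (step + 1) % opt.print_interval_steps == 0:
        resnet18_model.eval()  # BatchNorm/Dropout in inference mode for evaluation
        with torch.no_grad():
            # Evaluate the model on the validation set
            print("evaluate the performance on validate data")
            # Plain Python numbers: nothing accumulated here stays on the GPU
            total_loss_val = 0.0
            total_correct_val = 0
            for img_val, label_val in tqdm(val_dataloader):
                img_val = img_val.to(device)
                label_val = label_val.to(device)
                y_pred_val = resnet18_model(img_val)
                # Assumes loss_func averages over the batch (reduction='mean'),
                # so scale by the batch size to get a per-sample sum
                total_loss_val += loss_func(y_pred_val, label_val).item() * img_val.size(0)
                # argmax over the class dimension, not the flattened tensor
                y_pred_class = torch.argmax(y_pred_val, dim=1)
                total_correct_val += (y_pred_class == label_val).sum().item()
            loss_val = total_loss_val / len(val_db)
            acc_val = total_correct_val / len(val_db)
        resnet18_model.train()  # switch back before training continues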
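The same evaluation can be packaged as a reusable function; torch.no_grad() also works as a decorator. This is a sketch, not the author's code: the name evaluate and the toy model and data are hypothetical, and loss_func is assumed to use reduction='mean' (the nn.CrossEntropyLoss default). The usage example deliberately ends with a short last batch, which the original torch.zeros(opt.batch_size) accumulators could not handle.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()  # nothing inside this function records an autograd graph
def evaluate(model, loader, loss_func, device):
    """Hypothetical helper: mean loss and accuracy over a whole validation set."""
    was_training = model.training
    model.eval()
    total_loss, total_correct, n = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        logits = model(x)
        # batch-mean loss scaled back to a sum, so a short last batch is weighted correctly
        total_loss += loss_func(logits, y).item() * y.size(0)
        total_correct += (logits.argmax(dim=1) == y).sum().item()
        n += y.size(0)
    model.train(was_training)  # restore the caller's mode
    return total_loss / n, total_correct / n


# Usage with a toy model and dataset (hypothetical)
torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3)).to(device)
data = TensorDataset(torch.randn(10, 8), torch.randint(0, 3, (10,)))
loader = DataLoader(data, batch_size=4)  # batches of 4, 4, 2
loss_val, acc_val = evaluate(model, loader, nn.CrossEntropyLoss(), device)
```

Restoring the previous mode with model.train(was_training) lets the function be called from inside a training loop without the caller having to remember to switch back.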