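TensorFlow or Keras, used together with TensorBoard, makes it very convenient to visualize the structural details of a network and how its parameters change (see 《Tensorflow | 莫烦 》learning notes). This post shows how to use TensorBoard with PyTorch.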
The example below uses TensorBoard in PyTorch to record the loss, accuracy, and learning rate.
The overall workflow is as follows (see "pytorch中使用tensorboard查看损失").
The code is adapted from "pytorch训练自己图像分类数据集".
import torch
import torch.nn as nn
from tensorboardX import SummaryWriter

# net, train_loader, lr and weight_decay are assumed to be defined earlier
epochs = 71
milestones = [20, 40, 50, 60, 70]
optimizer = torch.optim.Adam(net.parameters(), lr=lr, weight_decay=weight_decay)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                    milestones=milestones, last_epoch=-1)
loss_func = nn.CrossEntropyLoss()
writer = SummaryWriter(comment="ResNet")
for epoch in range(epochs):  # loop over epochs
    net.train()  # put the network in training mode
    iteration = 0
    average_loss_epoch = 0  # accumulated loss for this epoch
    train_acc_epoch = 0  # accumulated accuracy for this epoch
    for batch_images, batch_labels in train_loader:  # loop over batches
        batch_images, batch_labels = batch_images.cuda(), batch_labels.cuda()
        out = net(batch_images)
        loss = loss_func(out, batch_labels)
        # .item() gives a Python float, so the autograd graph is not kept alive
        average_loss_batch = loss.item()
        prediction = torch.max(out, 1)[1]
        train_correct = (prediction == batch_labels).sum()
        # train_correct is a LongTensor; convert it to a Python number and divide
        # by the actual batch size (the last batch may be smaller)
        train_acc_batch = train_correct.item() / batch_labels.size(0)
        optimizer.zero_grad()  # clear gradients, otherwise they accumulate across backward passes
        loss.backward()  # backpropagate the loss
        optimizer.step()  # update the parameters
        iteration += 1
        # train acc and loss of each iteration
        print("Epoch: %d/%d || iteration: %d || average_loss_batch: %.3f || train_acc_batch: %.5f"
              % (epoch, epochs, iteration, average_loss_batch, train_acc_batch))
        average_loss_epoch += average_loss_batch  # sum the batch results
        train_acc_epoch += train_acc_batch  # sum the batch results
    average_loss_epoch = average_loss_epoch / iteration  # compute the average results
    train_acc_epoch = train_acc_epoch / iteration
    # write to show loss/acc/lr on tensorboard
    writer.add_scalar('Train/Loss', average_loss_epoch, epoch)
    writer.add_scalar('Train/Acc', train_acc_epoch, epoch)
    writer.add_scalar('Train/Learning Rate', optimizer.param_groups[0]['lr'], epoch)
    # step the scheduler after logging, so the logged lr is the one used in this epoch
    lr_scheduler.step()
writer.close()
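One detail worth checking in a loop like this is how the epoch accuracy is averaged when the last batch is smaller than the others. The sketch below is plain Python with made-up numbers; `epoch_accuracy` and the `(num_correct, num_samples)` pairs are hypothetical and only illustrate the arithmetic. It compares three averages: dividing every batch by a fixed nominal batch size, averaging the per-batch accuracies, and dividing total correct by total samples.

```python
def epoch_accuracy(batches, nominal_batch_size):
    """Compare three ways of averaging accuracy over one epoch.

    batches: list of (num_correct, num_samples) pairs, one per batch.
    """
    n = len(batches)
    # Divide every batch by the nominal batch size (under-counts a short last batch)
    by_nominal = sum(c / nominal_batch_size for c, _ in batches) / n
    # Mean of per-batch accuracies (a short batch weighs as much as a full one)
    by_batch = sum(c / s for c, s in batches) / n
    # Total correct over total samples (every sample weighs the same)
    by_sample = sum(c for c, _ in batches) / sum(s for _, s in batches)
    return by_nominal, by_batch, by_sample


# Hypothetical epoch: two full batches of 10 and a final batch of only 2 samples
batches = [(8, 10), (9, 10), (2, 2)]
by_nominal, by_batch, by_sample = epoch_accuracy(batches, nominal_batch_size=10)
print(f"nominal: {by_nominal:.4f}  per-batch: {by_batch:.4f}  per-sample: {by_sample:.4f}")
```

The three values differ only when the batches have unequal sizes; with a large dataset the gap is usually small, but it can be visible on small ones.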
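The code above uses MultiStepLR: at each epoch listed in milestones, the learning rate (not the loss) is multiplied by gamma, which defaults to 0.1, so it drops by one order of magnitude.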
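To make the schedule concrete, here is a minimal pure-Python sketch of the step rule. It is not the PyTorch implementation; `multistep_lr` is a hypothetical helper, and the base learning rate of 0.01 is an assumption, since the value of `lr` is not given in this post.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, epoch, gamma=0.1):
    """Learning rate in effect at `epoch` for a MultiStepLR-style schedule.

    The rate is multiplied by gamma once for every milestone <= epoch.
    """
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


milestones = [20, 40, 50, 60, 70]
base_lr = 0.01  # assumed value, for illustration only
for epoch in (0, 19, 20, 40, 70):
    print(epoch, multistep_lr(base_lr, milestones, epoch))
```

With these milestones the rate is cut five times over 71 epochs, so the last epoch runs at `base_lr * 0.1 ** 5`.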
The writer calls are the part that records these values for TensorBoard.
During training, a runs folder is created in the working directory. Start tensorboard from a terminal and point it at the runs folder:
tensorboard --logdir runs
Once tensorboard has started, open the URL it prints, http://localhost:6006, in a browser to view the loss, acc, and learning rate recorded above.
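If the page comes up empty, it can help to check that event files were actually written under runs. The following is a minimal sketch, assuming the event files use the usual `events.out.tfevents.` file name prefix; `list_runs` is a hypothetical helper, not part of tensorboardX or TensorBoard.

```python
from pathlib import Path


def list_runs(logdir="runs"):
    """Return {run_directory: number_of_event_files} for each run under logdir."""
    root = Path(logdir)
    if not root.is_dir():
        return {}
    runs = {}
    # Assumption: event file names start with "events.out.tfevents."
    for event_file in root.rglob("events.out.tfevents.*"):
        run = event_file.parent.relative_to(root).as_posix()
        runs[run] = runs.get(run, 0) + 1
    return runs


if __name__ == "__main__":
    for name, count in sorted(list_runs().items()):
        print(f"{name}: {count} event file(s)")
```

An empty result suggests that training was run from a different working directory, or that the writer was never created.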