Record a trace of model training with torch.profiler, and visualize and analyze it with TensorBoard

Steps
  1. Prepare the data and model
  2. Use profiler to record execution events
  3. Run the profiler
  4. Use TensorBoard to view results and analyze model performance
  5. Improve performance with the help of profiler
  6. Analyze performance with other advanced features
  7. Additional Practices: Profiling PyTorch on AMD GPUs
1. Prepare the data and model

Import the required libraries:

import torch
import torch.nn
import torch.optim
import torch.profiler
import torch.utils.data
import torchvision.datasets
import torchvision.models
import torchvision.transforms as T

Prepare the dataset:

transform = T.Compose(
    [T.Resize(224),
     T.ToTensor(),
     T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

Create the model, loss function, and optimizer, and move them to the GPU:

device = torch.device("cuda:0")
model = torchvision.models.resnet18(weights='IMAGENET1K_V1').cuda(device)
criterion = torch.nn.CrossEntropyLoss().cuda(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()

Define the training step for each batch of data:

def train(data):
    inputs, labels = data[0].to(device=device), data[1].to(device=device)
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
2. Use profiler to record execution events

Some useful parameters are as follows:

schedule: with parameters such as wait=1, warmup=1, active=3, repeat=1, the profiler skips the first step/iteration, warms up during the second, and records the following three iterations. Since repeat=1, this cycle runs only once in total. Each cycle is called a "span" in the TensorBoard plugin (see the sketch after the parameter list).

During the wait steps the profiler is disabled; during the warmup steps the profiler starts tracing but discards the results. This is done to reduce distortion: profiling overhead is high at the very start of tracing and would otherwise skew the measurements.

on_trace_ready: a callback invoked at the end of each cycle; here torch.profiler.tensorboard_trace_handler is used to generate the result files for TensorBoard. After profiling, the result files are saved in ./log/resnet18.

record_shapes: whether to record the shapes of the operators' input tensors.

profile_memory: track tensor memory allocation and release.

with_stack: record source information (file and line number) for the operators. If TensorBoard is launched inside VS Code (see the TensorBoard integration docs below), clicking a stack frame jumps to the specific source line.

https://code.visualstudio.com/docs/datascience/pytorch-support#_tensorboard-integration
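
To make the schedule above concrete, the object returned by torch.profiler.schedule can be called directly with a step number. A minimal sketch (the actions noted in the comments are what this particular schedule is expected to return):

sched = torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1)
for step in range(7):
    print(step, sched(step))
# 0    -> ProfilerAction.NONE             (wait: profiler disabled)
# 1    -> ProfilerAction.WARMUP           (warmup: tracing, results discarded)
# 2, 3 -> ProfilerAction.RECORD           (active: recording)
# 4    -> ProfilerAction.RECORD_AND_SAVE  (last active step, trace handed to on_trace_ready)
# 5, 6 -> ProfilerAction.NONE             (repeat=1, so no further cycles)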

Start and stop the profiler with a context manager:

with torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        prof.step()  # Need to call this at each step to notify profiler of steps' boundary.
        if step >= 1 + 1 + 3:
            break
        train(batch_data)

The profiler can also be started and stopped without a context manager:

prof = torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        with_stack=True)
prof.start()
for step, batch_data in enumerate(train_loader):
    prof.step()
    if step >= 1 + 1 + 3:
        break
    train(batch_data)
prof.stop()
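
If TensorBoard is not needed, on_trace_ready can also be any custom callback that receives the profiler at the end of each cycle. A minimal sketch (the handler name trace_handler and the output path are illustrative) that exports a plain Chrome trace per cycle instead of TensorBoard files:

def trace_handler(p):
    # p.step_num is the step at which the current cycle ended;
    # write one Chrome trace file (viewable in chrome://tracing) per cycle.
    p.export_chrome_trace('./log/resnet18_trace_' + str(p.step_num) + '.json')

with torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=trace_handler
) as prof:
    for step, batch_data in enumerate(train_loader):
        prof.step()
        if step >= 1 + 1 + 3:
            break
        train(batch_data)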
3. Run the profiler

Run the code above. The tensorboard_trace_handler writes the profiling results to the ./log/resnet18 directory.
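
As a quick sanity check without TensorBoard, the aggregated operator statistics can also be printed to the console. A minimal sketch (it reuses the train_loader and train() defined above and profiles a few batches without a schedule):

with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU,
                    torch.profiler.ProfilerActivity.CUDA],
        record_shapes=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        if step >= 3:
            break
        train(batch_data)

print(prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=10))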
4. Use TensorBoard to view results and analyze model performance

Install the PyTorch Profiler TensorBoard plugin:

pip install torch_tb_profiler

Launch TensorBoard:

tensorboard --logdir=./log

Open the TensorBoard profile URL in a browser:

http://localhost:6006/#pytorch_profiler