Recording a model training trace with torch.profiler and visualizing and analyzing it with TensorBoard

牛码当驴

Category: Tools. Tags: python, pytorch

Article link: https://blog.csdn.net/weixin_46091520/article/details/138216342

Steps

Prepare the data and model
Use profiler to record execution events
Run the profiler
Use TensorBoard to view results and analyze model performance
Improve performance with the help of profiler
Analyze performance with other advanced features
Additional Practices: Profiling PyTorch on AMD GPUs

1. Prepare the data and model

Import the required libraries:

import torch
import torch.nn
import torch.optim
import torch.profiler
import torch.utils.data
import torchvision.datasets
import torchvision.models
import torchvision.transforms as T

Prepare the dataset:

transform = T.Compose(
    [T.Resize(224),
     T.ToTensor(),
     T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

Define the model:

device = torch.device("cuda:0")
model = torchvision.models.resnet18(weights='IMAGENET1K_V1').cuda(device)
criterion = torch.nn.CrossEntropyLoss().cuda(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()
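
The model definition above assumes that a CUDA GPU is available as cuda:0. On a machine without one, a common pattern is to fall back to the CPU. The following is a minimal sketch of that pattern, not part of the original setup; the small `torch.nn.Linear` layer is only a stand-in for the real model.

```python
import torch

# Use the first GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in for the real model; .to(device) works for both CPU and GPU devices.
layer = torch.nn.Linear(4, 2).to(device)
print(next(layer.parameters()).device)
```

Profiling still works on the CPU, but the GPU-related views in TensorBoard will have nothing to show.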

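Define the training step:

def train(data):
    # Move the batch to the same device as the model.
    inputs, labels = data[0].to(device=device), data[1].to(device=device)
    # Forward pass and loss.
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    # Backward pass and parameter update.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()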

2. Use profiler to record execution events

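Some useful parameters are as follows:

schedule: a callable that takes the step number and returns the profiler action for that step. For example, with torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1), the profiler skips the first step/iteration, warms up on the second, and records the following three. In total, the cycle repeats once. Each cycle is called a “span” in the TensorBoard plugin.

During the wait phase the profiler is inactive. During the warmup phase the profiler starts tracing but discards the results. Warmup reduces the profiling overhead: the overhead at the start of profiling is high and can easily skew the results.

on_trace_ready: a callable invoked at the end of each cycle. In this example, torch.profiler.tensorboard_trace_handler generates the result files for TensorBoard. After profiling, the result files are saved in the ./log/resnet18 directory.

record_shapes: whether to record the shapes of the operator inputs.

profile_memory: track tensor memory allocation and deallocation.

with_stack: record source information (file and line number) for the ops. If TensorBoard is launched in VS Code, clicking a stack frame navigates to the specific code line: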

https://code.visualstudio.com/docs/datascience/pytorch-support#_tensorboard-integration
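
The wait/warmup/active phases described for the schedule parameter can be modeled in a few lines. The following is a simplified pure-Python sketch, not the library's implementation: `phase_for_step` is a hypothetical helper, and the phase names are plain strings chosen for illustration. In PyTorch itself, torch.profiler.schedule returns a callable that maps a step number to a torch.profiler.ProfilerAction.

```python
def phase_for_step(step, wait=1, warmup=1, active=3, repeat=1):
    """Return the profiler phase for a 0-based step (illustrative model only)."""
    cycle_len = wait + warmup + active
    # With repeat > 0, the profiler goes idle once all cycles have finished.
    if repeat > 0 and step >= cycle_len * repeat:
        return "idle"
    pos = step % cycle_len
    if pos < wait:
        return "wait"
    if pos < wait + warmup:
        return "warmup"
    return "active"


phases = [phase_for_step(s) for s in range(6)]
print(phases)
```

With wait=1, warmup=1, active=3 and repeat=1, one cycle spans 1 + 1 + 3 = 5 steps, which is where the `step >= 1 + 1 + 3` loop bound in the training loops comes from.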

Start and stop the profiler as a context manager:

with torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        prof.step()  # Need to call this at each step to notify profiler of steps' boundary.
        if step >= 1 + 1 + 3:
            break
        train(batch_data)
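
The on_trace_ready argument accepts any callable that takes the profiler object, so a custom handler can stand in for tensorboard_trace_handler when only a quick text summary is wanted. The sketch below is an illustration under stated assumptions: `print_summary` and `summaries` are hypothetical names, and a tiny CPU-only `torch.nn.Linear` layer replaces the ResNet training step so the example runs without a GPU or dataset.

```python
import torch
import torch.nn
import torch.profiler

summaries = []  # collected text tables, one per finished cycle


def print_summary(prof):
    """Hypothetical handler: called by the profiler at the end of each cycle."""
    table = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)
    summaries.append(table)
    print(table)


# Tiny CPU-only stand-in for the training step.
layer = torch.nn.Linear(64, 64)
inputs = torch.randn(8, 64)

with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU],
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=print_summary,
        record_shapes=True
) as prof:
    for step in range(1 + 1 + 3):
        layer(inputs).sum().backward()
        prof.step()  # mark the end of this step for the profiler
```

Because repeat=1, the handler runs once, after the three recorded steps.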

Alternatively, start and stop the profiler without a context manager:

prof = torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        with_stack=True)
prof.start()
for step, batch_data in enumerate(train_loader):
    prof.step()
    if step >= 1 + 1 + 3:
        break
    train(batch_data)
prof.stop()
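
When the profiler is started and stopped manually, an exception inside the loop would skip prof.stop(). One way to guard against that is a try/finally block. This is a minimal sketch rather than the article's method: it uses a small CPU-only layer as a stand-in for the model and omits the schedule, so every step is recorded.

```python
import torch
import torch.nn
import torch.profiler

layer = torch.nn.Linear(64, 64)  # CPU stand-in for the real model
inputs = torch.randn(8, 64)

prof = torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU])
prof.start()
try:
    for step in range(3):
        layer(inputs).sum().backward()
        prof.step()
finally:
    # Runs even if the loop raises, so the profiler is always stopped.
    prof.stop()

# After stop(), the recorded events can be aggregated by operator name.
op_names = [event.key for event in prof.key_averages()]
print(op_names[:5])
```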

3. Run the profiler

Run the code above. The profiling results are saved in the ./log/resnet18 directory.
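
To confirm that a profiling run produced output, the log directory can be checked for trace files. The sketch below uses only the standard library; `find_trace_files` is a hypothetical helper, and it relies on tensorboard_trace_handler naming its files with a `.pt.trace.json` suffix (plus `.gz` when gzip compression is enabled).

```python
from pathlib import Path


def find_trace_files(log_dir):
    """Return trace files under log_dir, oldest first (hypothetical helper)."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    # Match both plain and gzip-compressed trace files.
    files = list(root.glob("*.pt.trace.json")) + list(root.glob("*.pt.trace.json.gz"))
    return sorted(files, key=lambda p: p.stat().st_mtime)


for path in find_trace_files("./log/resnet18"):
    print(path.name, path.stat().st_size, "bytes")
```

An empty result usually means the loop ended before a full wait/warmup/active cycle completed, so on_trace_ready was never called.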

4. Use TensorBoard to view results and analyze model performance

Install the PyTorch Profiler TensorBoard plugin:

pip install torch_tb_profiler

Launch TensorBoard:

tensorboard --logdir=./log

Open the TensorBoard profile URL in a browser:

http://localhost:6006/#pytorch_profiler
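
The result files that TensorBoard reads are JSON in the Chrome trace event format, so they can also be inspected directly when TensorBoard is not at hand. The following is a rough sketch, not a replacement for the plugin: `total_duration_by_name` is a hypothetical helper, and the sample file is synthetic, with made-up event names and durations, so that the example runs without a real trace.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def total_duration_by_name(trace_path):
    """Sum the duration (microseconds) of complete events, grouped by event name."""
    with open(trace_path, encoding="utf-8") as f:
        trace = json.load(f)
    totals = Counter()
    for event in trace.get("traceEvents", []):
        # "X" marks a complete event, which carries a "dur" field.
        if event.get("ph") == "X":
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    return totals


# Synthetic stand-in for a real trace file; the events and durations are made up.
sample = {"traceEvents": [
    {"name": "aten::conv2d", "ph": "X", "ts": 0, "dur": 120},
    {"name": "aten::relu", "ph": "X", "ts": 130, "dur": 15},
    {"name": "aten::conv2d", "ph": "X", "ts": 150, "dur": 100},
    {"name": "process_name", "ph": "M"},
]}
sample_path = Path(tempfile.mkdtemp()) / "sample.pt.trace.json"
sample_path.write_text(json.dumps(sample), encoding="utf-8")

totals = total_duration_by_name(sample_path)
print(totals.most_common(2))
```

For a real run, pass the path of a file from ./log/resnet18 instead of the synthetic sample.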
