Steps
- Prepare the data and model
- Use profiler to record execution events
- Run the profiler
- Use TensorBoard to view results and analyze model performance
- Improve performance with the help of profiler
- Analyze performance with other advanced features
- Additional Practices: Profiling PyTorch on AMD GPUs
1. Prepare the data and model
Import the required libraries:
import torch
import torch.nn
import torch.optim
import torch.profiler
import torch.utils.data
import torchvision.datasets
import torchvision.models
import torchvision.transforms as T
Prepare the dataset:
transform = T.Compose(
    [T.Resize(224),
     T.ToTensor(),
     T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
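As a quick check on the T.Normalize call above: the transform applies (x - mean) / std to each channel, so with a mean and std of 0.5, the [0, 1] range produced by T.ToTensor is mapped to [-1, 1]. The following is a minimal pure-Python sketch of that arithmetic on single values; the helper name `normalize` is hypothetical and is not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel formula applied by Normalize: (x - mean) / std.

    Hypothetical helper for illustration only; it works on a single
    float, not on a tensor.
    """
    return (x - mean) / std


# Map the endpoints and midpoint of the [0, 1] range.
for x in (0.0, 0.5, 1.0):
    print(x, "->", normalize(x))
```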
Define the model, loss function, and optimizer:
device = torch.device("cuda:0")
model = torchvision.models.resnet18(weights='IMAGENET1K_V1').cuda(device)
criterion = torch.nn.CrossEntropyLoss().cuda(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()
Define the training step for each batch of input data:
def train(data):
    inputs, labels = data[0].to(device=device), data[1].to(device=device)
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
2. Use profiler to record execution events
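Some useful parameters are as follows:
- schedule: for example, with wait=1, warmup=1, active=3, repeat=1, the profiler skips the first step/iteration, warms up on the second, and records the following three. In total, the cycle repeats once. Each cycle is called a "span" in the TensorBoard plugin. During the wait phase the profiler is inactive. During the warmup phase the profiler starts tracing but discards the results. This reduces the profiling overhead: the overhead is high when profiling starts and would otherwise skew the results.
- on_trace_ready: called at the end of each cycle. In this example, torch.profiler.tensorboard_trace_handler generates the result files for TensorBoard. After profiling, the result files are saved in the ./log/resnet18 directory.
- record_shapes: whether to record the shapes of the operator inputs.
- profile_memory: whether to track tensor memory allocation and deallocation.
- with_stack: whether to record source information (file and line number) for the operators. If TensorBoard is launched inside VS Code, clicking a stack frame navigates to the specific line of code. See https://code.visualstudio.com/docs/datascience/pytorch-support#_tensorboard-integration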
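To make the schedule phases concrete, the sketch below mimics how wait, warmup, active, and repeat divide the profiler's step counter into phases. It is a pure-Python illustration written for this note, not the torch.profiler implementation: the function name `profiler_phase` and the phase labels (including "idle" for steps after the last cycle) are assumptions chosen for readability.

```python
def profiler_phase(step, wait=1, warmup=1, active=3, repeat=1):
    """Return an illustrative phase label for a zero-based profiler step.

    Hypothetical helper: it only mirrors the wait/warmup/active cycle
    described above and does not call torch.profiler.
    """
    cycle_len = wait + warmup + active
    # With repeat > 0, nothing is recorded after `repeat` full cycles.
    # repeat == 0 is treated here as "keep cycling until profiling stops".
    if repeat > 0 and step >= cycle_len * repeat:
        return "idle"
    pos = step % cycle_len
    if pos < wait:
        return "wait"
    if pos < wait + warmup:
        return "warmup"
    return "active"


# Phases for the first seven profiler steps with the settings used here.
phases = [profiler_phase(s) for s in range(7)]
print(phases)
```

With wait=1, warmup=1, active=3, repeat=1, one cycle covers 1 + 1 + 3 = 5 steps, which is why the training loops in this section stop after `1 + 1 + 3` steps.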
Start and stop the profiler with a context manager:
with torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        prof.step()  # Need to call this at each step to notify profiler of steps' boundary.
        if step >= 1 + 1 + 3:
            break
        train(batch_data)
Alternatively, start and stop the profiler without a context manager:
prof = torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        with_stack=True)
prof.start()
for step, batch_data in enumerate(train_loader):
    prof.step()
    if step >= 1 + 1 + 3:
        break
    train(batch_data)
prof.stop()
3. Run the profiler
Run the code above. The profiling results are saved under the ./log/resnet18 directory.
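After the profiling run finishes, one way to confirm that results were written is to list the trace files in the log directory. The sketch below uses only the standard library. It assumes the trace handler writes files whose names end in ".pt.trace.json"; if the files in your log directory are named differently, adjust the pattern. The helper name `find_trace_files` is hypothetical.

```python
from pathlib import Path


def find_trace_files(log_dir="./log/resnet18"):
    """Return trace files found directly under log_dir, sorted by name.

    Assumption: trace file names end in ".pt.trace.json".
    """
    root = Path(log_dir)
    if not root.is_dir():
        return []
    return sorted(root.glob("*.pt.trace.json"))


traces = find_trace_files()
if traces:
    for path in traces:
        print(path.name, path.stat().st_size, "bytes")
else:
    print("No trace files found; check that the profiling loop completed.")
```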
4. Use TensorBoard to view results and analyze model performance
Install the PyTorch Profiler TensorBoard plugin:
pip install torch_tb_profiler
Launch TensorBoard:
tensorboard --logdir=./log
Open the TensorBoard profile URL in a browser:
http://localhost:6006/#pytorch_profiler