Visualizing Models, Data, and Training with TensorBoard

Author: 白酒永远的神
Tags: pytorch
First published 2024-04-07 08:48:38; last modified 2024-06-01 22:10:49

In the 60 Minute Blitz, we show you how to load in data, feed it through a model we define as a subclass of nn.Module, train this model on training data, and test it on test data. To see what’s happening, we print out some statistics as the model is training to get a sense for whether training is progressing. However, we can do much better than that: PyTorch integrates with TensorBoard, a tool designed for visualizing the results of neural network training runs. This tutorial illustrates some of its functionality, using the Fashion-MNIST dataset which can be read into PyTorch using torchvision.datasets.

In this tutorial, we’ll learn how to:

1. Read in data with appropriate transforms (nearly identical to the prior tutorial).
2. Set up TensorBoard.
3. Write to TensorBoard.
4. Inspect a model architecture using TensorBoard.
5. Use TensorBoard to create interactive versions of the visualizations we created in the last tutorial, with less code.

Specifically, on point #5, we’ll see:

A couple of ways to inspect our training data
How to track our model’s performance as it trains
How to assess our model’s performance once it is trained.

We’ll begin with similar boilerplate code as in the CIFAR-10 tutorial:

# imports
import matplotlib.pyplot as plt
import numpy as np

import torch
import torchvision
import torchvision.transforms as transforms

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# transforms
transform = transforms.Compose(
    [transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))])

# datasets
trainset = torchvision.datasets.FashionMNIST('./data',
    download=True,
    train=True,
    transform=transform)
testset = torchvision.datasets.FashionMNIST('./data',
    download=True,
    train=False,
    transform=transform)

# dataloaders
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                        shuffle=True, num_workers=2)


testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                        shuffle=False, num_workers=2)

# constant for classes
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
        'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')

# helper function to show an image
# (used in the `plot_classes_preds` function below)
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
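
The `img / 2 + 0.5` step in the helper undoes the `Normalize((0.5,), (0.5,))` transform defined earlier. The following is a minimal sketch of that arithmetic on plain Python floats; the `normalize` and `unnormalize` names are illustrative only and are not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    # Per-pixel arithmetic of transforms.Normalize((0.5,), (0.5,)):
    # maps the [0, 1] range produced by ToTensor onto [-1, 1].
    return (x - mean) / std


def unnormalize(y):
    # The inverse applied in matplotlib_imshow: maps [-1, 1] back onto [0, 1].
    return y / 2 + 0.5


pixels = [0.0, 0.25, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(v) for v in normalized]
print(normalized, restored)
```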

We’ll define a model architecture similar to the one in that tutorial, with only minor modifications to account for the images now having one channel instead of three and being 28x28 instead of 32x32:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
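
The `16 * 4 * 4` input size of `fc1` follows from the 28x28 input. As a rough check, the spatial size can be traced through the two convolution and pooling stages by hand; `conv_out` and `pool_out` below are hypothetical helpers written for this sketch, assuming no padding and unit stride for the convolutions, which matches the layer definitions above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output side length of a square convolution.
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    # Output side length of a square max-pooling window.
    return (size - kernel) // stride + 1


size = 28                    # Fashion-MNIST images are 28x28
size = conv_out(size, 5)     # after conv1 (5x5 kernel)
size = pool_out(size, 2, 2)  # after first 2x2 pool
size = conv_out(size, 5)     # after conv2 (5x5 kernel)
size = pool_out(size, 2, 2)  # after second 2x2 pool

flat_features = 16 * size * size  # 16 output channels from conv2
print(size, flat_features)
```

With 32x32 CIFAR-10 inputs the same trace ends at 5 instead of 4, which is why the earlier tutorial used `16 * 5 * 5`.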

We’ll define the same optimizer and criterion from before:

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

1. TensorBoard setup

Now we’ll set up TensorBoard, importing tensorboard from torch.utils and defining a SummaryWriter, our key object for writing information to TensorBoard.

from torch.utils.tensorboard import SummaryWriter

# by default, `log_dir` is a new timestamped folder under "runs" - we'll be more specific here
writer = SummaryWriter('runs/fashion_mnist_experiment_1')

Note that this line alone creates a runs/fashion_mnist_experiment_1 folder.
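
One way to confirm this behavior is to construct a writer against a throwaway directory and check the filesystem before and after. This is only a sketch: the temporary directory and the `demo_writer` name are assumptions made here so that the real `runs` folder is left untouched.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Throwaway location, so this check does not write into the real "runs" folder.
base_dir = tempfile.mkdtemp()
log_dir = os.path.join(base_dir, "fashion_mnist_experiment_1")

existed_before = os.path.isdir(log_dir)
demo_writer = SummaryWriter(log_dir)  # constructing the writer creates the folder
exists_after = os.path.isdir(log_dir)
demo_writer.close()

print(existed_before, exists_after)
```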

2. Writing to TensorBoard

Now let’s write an image to our TensorBoard - specifically, a grid - using make_grid.

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# create grid of images
img_grid = torchvision.utils.make_grid(images)

# show images
matplotlib_imshow(img_grid, one_channel=True)

# write to tensorboard
writer.add_image('four_fashion_mnist_images', img_grid)

Now running

tensorboard --logdir=runs

from the command line and then navigating to http://localhost:6006 should show the following.

../_static/img/tensorboard_first_view.png

Now you know how to use TensorBoard! This example, however, could be done in a Jupyter Notebook - where TensorBoard really excels is in creating interactive visualizations. We’ll cover one of those next, and several more by the end of the tutorial.

3. Inspect the model using TensorBoard

One of TensorBoard’s strengths is its ability to visualize complex model structures. Let’s visualize the model we built.

writer.add_graph(net, images)
writer.close()

Now upon refreshing TensorBoard you should see a “Graphs” tab that looks like this:

../_static/img/tensorboard_model_viz.png

Go ahead and double-click on “Net” to expand it and see a detailed view of the individual operations that make up the model.

TensorBoard has a very handy feature for visualizing high dimensional data such as image data in a lower dimensional space; we’ll cover this next.

4. Adding a “Projector” to TensorBoard

We can visualize a lower-dimensional representation of higher-dimensional data via the add_embedding method:

# helper function
def select_n_random(data, labels, n=100):
    '''
    Selects n random datapoints and their corresponding labels from a dataset
    '''
    assert len(data) == len(labels)

    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]

# select random images and their target indices
images, labels = select_n_random(trainset.data, trainset.targets)

# get the class labels for each image
class_labels = [classes[lab] for lab in labels]

# log embeddings
features = images.view(-1, 28 * 28)
writer.add_embedding(features,
                    metadata=class_labels,
                    label_img=images.unsqueeze(1))
writer.close()
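
A small observation on `select_n_random`: `data[perm][:n]` first builds a shuffled copy of the whole dataset and then keeps the first `n` rows. Slicing the permutation before indexing should select the same rows while copying only `n` of them. The sketch below checks this on toy tensors; `select_n_random_sliced` is a hypothetical variant written for this illustration, not part of the tutorial.

```python
import torch


def select_n_random_sliced(data, labels, n=100):
    # Same idea as select_n_random, but only n rows are ever copied.
    assert len(data) == len(labels)
    idx = torch.randperm(len(data))[:n]
    return data[idx], labels[idx]


# Toy stand-ins for trainset.data and trainset.targets.
toy_data = torch.arange(1000).view(-1, 1)
toy_labels = torch.arange(1000)

# For a fixed permutation, both indexing orders pick identical rows.
perm = torch.randperm(len(toy_data))
same_rows = torch.equal(toy_data[perm][:5], toy_data[perm[:5]])

sample, sample_labels = select_n_random_sliced(toy_data, toy_labels, n=5)
print(same_rows, sample.shape, sample_labels.shape)
```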

Now in the “Projector” tab of TensorBoard, you can see these 100 images - each of which is 784 dimensional - projected down into three dimensional space. Furthermore, this is interactive: you can click and drag to rotate the three dimensional projection. Finally, a couple of tips to make the visualization easier to see: select “color: label” on the top left, as well as enabling “night mode”, which will make the images easier to see since their background is white:

../_static/img/tensorboard_projector.png

Now we’ve thoroughly inspected our data, let’s show how TensorBoard can make tracking model training and evaluation clearer, starting with training.

5. Tracking model training with TensorBoard

In the previous example, we simply printed the model’s running loss every 2000 iterations. Now, we’ll instead log the running loss to TensorBoard, along with a view into the predictions the model is making via the plot_classes_preds function.

# helper functions

def images_to_probs(net, images):
    '''
    Generates predictions and corresponding probabilities from a trained
    network and a list of images
    '''
    output = net(images)
    # the predicted class is the index of the largest raw output score (logit)
    _, preds_tensor = torch.max(output, 1)
    preds = np.squeeze(preds_tensor.numpy())
    return preds, [F.softmax(el, dim=0)[i].item() for i, el in zip(preds, output)]


def plot_classes_preds(net, images, labels):
    '''
    Generates matplotlib Figure using a trained network, along with images
    and labels from a batch, that shows the network's top prediction along
    with its probability, alongside the actual label, coloring this
    information based on whether the prediction was correct or not.
    Uses the "images_to_probs" function.
    '''
    preds, probs = images_to_probs(net, images)
    # plot the images in the batch, along with predicted and true labels
    fig = plt.figure(figsize=(12, 48))
    for idx in np.arange(4):
        ax = fig.add_subplot(1, 4, idx+1, xticks=[], yticks=[])
        matplotlib_imshow(images[idx], one_channel=True)
        ax.set_title("{0}, {1:.1f}%\n(label: {2})".format(
            classes[preds[idx]],
            probs[idx] * 100.0,
            classes[labels[idx]]),
                    color=("green" if preds[idx]==labels[idx].item() else "red"))
    return fig
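
The list comprehension in `images_to_probs` applies softmax to each row of the network output and then reads off the entry at the predicted class. As a sketch of what that computes, the example below runs the same steps on hand-written logits (made-up numbers, not model output) and compares them with a single batched softmax over `dim=1`, which should give the same probabilities.

```python
import torch
import torch.nn.functional as F

# Made-up raw scores for a batch of 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

# Row-by-row version, mirroring images_to_probs.
_, preds = torch.max(logits, 1)
per_row = [F.softmax(el, dim=0)[i].item() for i, el in zip(preds, logits)]

# Batched version: softmax across the class dimension, then take the row maximum.
batched = F.softmax(logits, dim=1).max(dim=1).values

print(preds.tolist(), per_row, batched.tolist())
```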

Finally, let’s train the model using the same model training code from the prior tutorial, but writing results to TensorBoard every 1000 batches instead of printing to console; this is done using the add_scalar function.

In addition, as we train, we’ll generate an image showing the model’s predictions vs. the actual results on the four images included in that batch.

running_loss = 0.0
for epoch in range(1):  # loop over the dataset multiple times

    for i, data in enumerate(trainloader, 0):

        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 1000 == 999:    # every 1000 mini-batches...

            # ...log the running loss
            writer.add_scalar('training loss',
                            running_loss / 1000,
                            epoch * len(trainloader) + i)

            # ...log a Matplotlib Figure showing the model's predictions on a
            # random mini-batch
            writer.add_figure('predictions vs. actuals',
                            plot_classes_preds(net, inputs, labels),
                            global_step=epoch * len(trainloader) + i)
            running_loss = 0.0
print('Finished Training')
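
To see which x-axis positions end up in TensorBoard, the step arithmetic from the loop can be reproduced without any training. This sketch assumes the standard 60,000-image Fashion-MNIST training split and the batch size of 4 used above.

```python
num_images = 60000   # assumed size of the Fashion-MNIST training split
batch_size = 4
num_epochs = 1

batches_per_epoch = num_images // batch_size  # what len(trainloader) evaluates to

# Same condition and step formula as in the training loop.
logged_steps = [
    epoch * batches_per_epoch + i
    for epoch in range(num_epochs)
    for i in range(batches_per_epoch)
    if i % 1000 == 999
]

print(batches_per_epoch, len(logged_steps), logged_steps[0], logged_steps[-1])
```

Each logged value is the loss averaged over the preceding 1000 mini-batches, so one epoch yields 15 points on the curve.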

You can now look at the scalars tab to see the running loss plotted over the 15,000 iterations of training:

../_static/img/tensorboard_scalar_runs.png

In addition, we can look at the predictions the model made on arbitrary batches throughout learning. See the “Images” tab and scroll down under the “predictions vs. actuals” visualization to see this; this shows us that, for example, after just 3000 training iterations, the model was already able to distinguish between visually distinct classes such as shirts, sneakers, and coats, though it isn’t as confident as it becomes later on in training:

../_static/img/tensorboard_images.png

In the prior tutorial, we looked at per-class accuracy once the model had been trained; here, we’ll use TensorBoard to plot precision-recall curves (good explanation here) for each class.

6. Assessing trained models with TensorBoard

# 1. gets the probability predictions in a test_size x num_classes Tensor
# 2. gets the preds in a test_size Tensor
# takes ~10 seconds to run
class_probs = []
class_label = []
with torch.no_grad():
    for data in testloader:
        images, labels = data
        output = net(images)
        class_probs_batch = [F.softmax(el, dim=0) for el in output]

        class_probs.append(class_probs_batch)
        class_label.append(labels)

test_probs = torch.cat([torch.stack(batch) for batch in class_probs])
test_label = torch.cat(class_label)

# helper function
def add_pr_curve_tensorboard(class_index, test_probs, test_label, global_step=0):
    '''
    Takes in a "class_index" from 0 to 9 and plots the corresponding
    precision-recall curve
    '''
    tensorboard_truth = test_label == class_index
    tensorboard_probs = test_probs[:, class_index]

    writer.add_pr_curve(classes[class_index],
                        tensorboard_truth,
                        tensorboard_probs,
                        global_step=global_step)
    writer.close()

# plot all the pr curves
for i in range(len(classes)):
    add_pr_curve_tensorboard(i, test_probs, test_label)
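
The helper turns the 10-class problem into one binary problem per class: `test_label == class_index` marks the positives, and the class's probability column is the score. A precision-recall curve then reports precision and recall as the decision threshold on that score varies. The sketch below works through two thresholds on made-up labels and probabilities; `precision_recall` is a hypothetical helper for illustration, and returning precision 1.0 when nothing is predicted positive is just a convention chosen here.

```python
def precision_recall(truth, probs, threshold):
    # Predict "positive" whenever the class probability reaches the threshold.
    predicted = [p >= threshold for p in probs]
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and probabilities for class 3 (one-vs-rest).
toy_labels = [3, 1, 3, 3, 0, 2]
class_index = 3
truth = [label == class_index for label in toy_labels]
probs = [0.9, 0.6, 0.7, 0.3, 0.2, 0.1]

p_half, r_half = precision_recall(truth, probs, 0.5)
p_high, r_high = precision_recall(truth, probs, 0.8)
print(p_half, r_half, p_high, r_high)
```

Raising the threshold from 0.5 to 0.8 removes the false positive, so precision rises, but two true positives are now missed, so recall falls. That trade-off is what each curve in the tab traces out.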

You will now see a “PR Curves” tab that contains the precision-recall curves for each class. Go ahead and poke around; you’ll see that on some classes the model has nearly 100% “area under the curve”, whereas on others this area is lower:

../_static/img/tensorboard_pr_curves.png

And that’s an intro to TensorBoard and PyTorch’s integration with it. Of course, you could do everything TensorBoard does in your Jupyter Notebook, but with TensorBoard, you get visuals that are interactive by default.
