【pytorch笔记】（六）可视化

Kevin Davis

已于 2022-06-26 16:02:28 修改

阅读量1k

点赞数 2

分类专栏： pytorch 文章标签： pytorch 深度学习机器学习

于 2022-06-24 15:33:14 首次发布

本文链接：https://blog.csdn.net/weixin_44590417/article/details/125446492

版权

pytorch 专栏收录该内容

8 篇文章 2 订阅

订阅专栏

本文主要讲述pytorch可视化的一些内容，以下为本文目录：

1. TensorBoard 使用指南
2. torchinfo
3. profiler

1. TensorBoard 使用指南

随着深度学习的发展，网络模型越来越复杂，难以确定模型每一层的输入输出结构，如果能清晰了解模型结构和数据变化，将有助于我们更加高效地开发模型。TensorBoard 就是为此而生，他可以将代码运行过程中你感兴趣的数据保存在一个文件夹里，然后再读取文件夹中的数据，用浏览器显示出来。除了可视化模型结构外，TensorBoard 还可以记录并可视化训练过程的 loss 变化、可视化图像、连续变量、参数分布等，既是一个记录员，又是一位画家。

1.1 安装

在命令行中输入以下命令即可安装：

pip install tensorboardX

当然也可以使用 pytorch 自带的 tensorboard，这样就可以不用额外安装tensorboard了。

1.2 启动

tensorboard 的启动流程可以分为以下几步：

创建记录器writer

# 方式一：下载安装使用tensorboardX

  from tensorboardX import SummaryWriter

# 方式二：使用pytorch自带的tensorboard

  from torch.utils.tensorboard import SummaryWriter  

# 创建记录器
  log_dir = '/path/to/logs'  # 日志文件输出路径，可以自定义
  writer = SummaryWriter(log_dir)

写入并保存日志文件

  writer.add_graph(model, input_to_model=torch.rand(1, 3, 512, 512))
  writer.close()

在对应环境的命令行中输入以下命令，启动tensorboard

  tensorboard --log_dir=/path/to/logs --port=xxxx   #"/path/to/logs"对应前面的 log_dir
  # port为外部访问tensorboard的端口号，如果不是在服务器远程使用的话则不需要配置port

打开可视化界面

复制粘贴步骤3得到的网址，在浏览器中打开：http://localhost:6006/

举个栗子，假设我们要查看 resnet34模型的结构。

第一步，下载好resnet模型：

import torchvision.models as models
import os
os.environ['TORCH_HOME']='E:\pytorch\Data'  #修改模型保存路径，下载好后重新加载模型就会在这个目录下加载了

# pretrained = True表示下载使用预训练得到的权重，False表示不适用预训练得到的权重
resnet34 = models.resnet34(pretrained=True) 
print(resnet34)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
	...

  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)

第二步，创建记录器，写入保存好日志文件：

import torch
import torch.nn as nn

# 方式一：下载安装使用tensorboardX
from tensorboardX import SummaryWriter

#方式二：使用pytorch自带的tensorboard
# from torch.utils.tensorboard import SummaryWriter  

writer = SummaryWriter('./runs')
writer.add_graph(resnet34, input_to_model=torch.rand(1, 3, 512, 512))
writer.close()

第三步：在当前环境对应的anaconda prompt 中输入以下命令，启动tensorboard：

tensorboard --logdir=runs

第四步，复制上一步命令行执行后得到的网址，粘贴到浏览器中即可查看tensorboard：

双击可展开模块，更多具体介绍可以查看这篇博客。

1.3 图像可视化

利用tensorboard可以方便地展示图片。

单张图片：add_image
多张图片：add_images
多合一：先用torchvision.utils.make_grid将多张图片拼成一张，然后用writer.add_image显示

以torchvision的 CIFAR10 数据集为例：

import torchvision
from tensorboardX import SummaryWriter
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform1 = transforms.Compose([transforms.ToTensor()])

train_data = datasets.CIFAR10(".", train=True, download=True, transform=transform1)
test_data = datasets.CIFAR10(".", train=False, download=True, transform=transform1)

batch_size = 256
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)

# 查看单张图片
writer = SummaryWriter('./picture')
writer.add_image('images[0]', images[0])
writer.close()

Files already downloaded and verified
Files already downloaded and verified
torch.Size([256, 3, 32, 32]) torch.Size([256])

请添加图片描述

将多张图片拼接成一张图片，中间用黑色网格分割：

writer = SummaryWriter('./picture')
img_grid = torchvision.utils.make_grid(images)
writer.add_image('imgaes_grid', img_grid)
writer.close()

请添加图片描述

将多张图片直接写入：

writer = SummaryWriter('./picture')
writer.add_images('images', images, global_step=0)
writer.close()

请添加图片描述

接下来我们修改一下 resnet34 的输出，然后跑一跑 CIFAR10 分类任务：

import numpy as np
import time

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Device: ', device)

model = resnet34
model.fc = nn.Linear(512, 10)  #修改模型

model = model.to(device)
writer = SummaryWriter('./runs')
writer.add_graph(model, input_to_model=images.to(device))
writer.close()

loss_fn = nn.CrossEntropyLoss()
lr = 1e-3
max_epochs = 5
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer=optimizer, step_size=5, gamma=0.8)

def train(max_epochs, save_dir):
    writer = SummaryWriter(save_dir)
    for epoch in range(max_epochs):
        model.train()
        train_loss, val_loss = 0, 0
        pred_label, true_label = [], []
        start_time = time.time()
        for images, labels in train_loader:
            images = images.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            out = model(images)
            pred = torch.argmax(out, 1)
            loss = loss_fn(out, labels)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * images.size(0)
            pred_label.append(pred.cpu().data.numpy())
            true_label.append(labels.cpu().data.numpy())
        train_loss = train_loss / len(train_loader.dataset)
        true_label, pred_label = np.concatenate(true_label), np.concatenate(pred_label)
        train_acc = np.sum(true_label == pred_label) / len(pred_label)
        scheduler.step()
        lr0 = scheduler.get_last_lr()
        end_time = time.time()
        cost_time = end_time - start_time
        writer.add_scalar('train_loss', train_loss, epoch)
        writer.add_scalar('learning rate', lr0[-1], epoch)
        writer.add_scalar('Train_acc', train_acc, epoch)

        with torch.no_grad():
            model.eval()
            pred_labels, true_labels = [], []
            for images, labels in train_loader:
                images = images.to(device)
                labels = labels.to(device)
                optimizer.zero_grad()
                out = model(images)
                pred = torch.argmax(out, 1)
                loss = loss_fn(out, labels)
                val_loss += loss.item() * images.size(0)
                pred_labels.append(pred.cpu().data.numpy())
                true_labels.append(labels.cpu().data.numpy())
            val_loss = val_loss / len(train_loader.dataset)
            true_labels, pred_labels = np.concatenate(true_labels), np.concatenate(pred_labels)
            val_acc = np.sum(true_labels == pred_labels) / len(pred_labels)
            writer.add_scalar('Val_loss', val_loss, epoch)
            writer.add_scalar('Val_acc', val_acc, epoch)
            writer.close()

        print('Epoch: {}, Train_loss: {:.6f}, Train_acc:{:.2f}%, Val_loss: {:.6f}, Val_acc: {:.2f}%, time: {:.3f}s'.format(epoch, 
             train_loss, train_acc*100, val_loss, val_acc*100, cost_time))

Device:  cuda:0

%%time
import datetime

t0 = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M')
save_dir = './runs/' + t0 + '/'
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
print('save_dir: ', save_dir)

train(max_epochs=max_epochs, save_dir=save_dir)

save_dir:  .runs/2022-06-24-12-20/
Epoch: 0, Train_loss: 0.137615, Train_acc:95.85, Val_loss: 0.066162, Val_acc: 97.97%, time: 50.250s
Epoch: 1, Train_loss: 0.080220, Train_acc:97.26, Val_loss: 0.066573, Val_acc: 97.76%, time: 53.099s
Epoch: 2, Train_loss: 0.069566, Train_acc:97.70, Val_loss: 0.065918, Val_acc: 97.72%, time: 49.805s
Epoch: 3, Train_loss: 0.069826, Train_acc:97.61, Val_loss: 0.074443, Val_acc: 97.47%, time: 54.373s
Epoch: 4, Train_loss: 0.065152, Train_acc:97.75, Val_loss: 0.065349, Val_acc: 97.85%, time: 55.126s
CPU times: total: 5min 52s
Wall time: 6min 23s

可以看到resnet34 的恐怖性能，无论是训练集还是验证集上都获得了97%左右的精度。

在这里插入图片描述

tensorboard 可以可视化不同时候跑出来的结果。其中绿色线是 resnet34 跑出来的结果，才跑了5个epoch，准确率却能一直稳定在97%以上，非常恐怖。

2. torchinfo

torchinfo 能够非常方便地输出模型的详细信息，比如每一层输出的shape，以及参数量。

安装方法：

# 方法一
pip install torch info

# 方法二
conda install -c conda-forge torchinfo

torchinfo 使用起来非常简单，只需调用torchinfo.summary()即可，输入参数为summary(model, input_size=[batch_size, channel, height, width])。

from torchinfo import summary
summary(model, (1, 3, 32, 32))  #1:batch_size, 3: 图片的通道数，32：图片高宽

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
ResNet                                   [1, 10]                   --
├─Conv2d: 1-1                            [1, 64, 16, 16]           9,408
├─BatchNorm2d: 1-2                       [1, 64, 16, 16]           128
├─ReLU: 1-3                              [1, 64, 16, 16]           --
├─MaxPool2d: 1-4                         [1, 64, 8, 8]             --
├─Sequential: 1-5                        [1, 64, 8, 8]             --
│    └─BasicBlock: 2-1                   [1, 64, 8, 8]             --
│    │    └─Conv2d: 3-1                  [1, 64, 8, 8]             36,864
│    │    └─BatchNorm2d: 3-2             [1, 64, 8, 8]             128
│    │    └─ReLU: 3-3                    [1, 64, 8, 8]             --
│    │    └─Conv2d: 3-4                  [1, 64, 8, 8]             36,864
│    │    └─BatchNorm2d: 3-5             [1, 64, 8, 8]             128
│    │    └─ReLU: 3-6                    [1, 64, 8, 8]             --
│    └─BasicBlock: 2-2                   [1, 64, 8, 8]             --
│    │    └─Conv2d: 3-7                  [1, 64, 8, 8]             36,864
│    │    └─BatchNorm2d: 3-8             [1, 64, 8, 8]             128
│    │    └─ReLU: 3-9                    [1, 64, 8, 8]             --
│    │    └─Conv2d: 3-10                 [1, 64, 8, 8]             36,864
│    │    └─BatchNorm2d: 3-11            [1, 64, 8, 8]             128
│    │    └─ReLU: 3-12                   [1, 64, 8, 8]             --
│    └─BasicBlock: 2-3                   [1, 64, 8, 8]             --
│    │    └─Conv2d: 3-13                 [1, 64, 8, 8]             36,864
│    │    └─BatchNorm2d: 3-14            [1, 64, 8, 8]             128
│    │    └─ReLU: 3-15                   [1, 64, 8, 8]             --
│    │    └─Conv2d: 3-16                 [1, 64, 8, 8]             36,864
│    │    └─BatchNorm2d: 3-17            [1, 64, 8, 8]             128
│    │    └─ReLU: 3-18                   [1, 64, 8, 8]             --
├─Sequential: 1-6                        [1, 128, 4, 4]            --
│    └─BasicBlock: 2-4                   [1, 128, 4, 4]            --
│    │    └─Conv2d: 3-19                 [1, 128, 4, 4]            73,728
│    │    └─BatchNorm2d: 3-20            [1, 128, 4, 4]            256
│    │    └─ReLU: 3-21                   [1, 128, 4, 4]            --
│    │    └─Conv2d: 3-22                 [1, 128, 4, 4]            147,456
│    │    └─BatchNorm2d: 3-23            [1, 128, 4, 4]            256
│    │    └─Sequential: 3-24             [1, 128, 4, 4]            8,448
│    │    └─ReLU: 3-25                   [1, 128, 4, 4]            --
│    └─BasicBlock: 2-5                   [1, 128, 4, 4]            --
│    │    └─Conv2d: 3-26                 [1, 128, 4, 4]            147,456
│    │    └─BatchNorm2d: 3-27            [1, 128, 4, 4]            256
│    │    └─ReLU: 3-28                   [1, 128, 4, 4]            --
│    │    └─Conv2d: 3-29                 [1, 128, 4, 4]            147,456
│    │    └─BatchNorm2d: 3-30            [1, 128, 4, 4]            256
│    │    └─ReLU: 3-31                   [1, 128, 4, 4]            --
│    └─BasicBlock: 2-6                   [1, 128, 4, 4]            --
│    │    └─Conv2d: 3-32                 [1, 128, 4, 4]            147,456
│    │    └─BatchNorm2d: 3-33            [1, 128, 4, 4]            256
│    │    └─ReLU: 3-34                   [1, 128, 4, 4]            --
│    │    └─Conv2d: 3-35                 [1, 128, 4, 4]            147,456
│    │    └─BatchNorm2d: 3-36            [1, 128, 4, 4]            256
│    │    └─ReLU: 3-37                   [1, 128, 4, 4]            --
│    └─BasicBlock: 2-7                   [1, 128, 4, 4]            --
│    │    └─Conv2d: 3-38                 [1, 128, 4, 4]            147,456
│    │    └─BatchNorm2d: 3-39            [1, 128, 4, 4]            256
│    │    └─ReLU: 3-40                   [1, 128, 4, 4]            --
│    │    └─Conv2d: 3-41                 [1, 128, 4, 4]            147,456
│    │    └─BatchNorm2d: 3-42            [1, 128, 4, 4]            256
│    │    └─ReLU: 3-43                   [1, 128, 4, 4]            --
├─Sequential: 1-7                        [1, 256, 2, 2]            --
│    └─BasicBlock: 2-8                   [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-44                 [1, 256, 2, 2]            294,912
│    │    └─BatchNorm2d: 3-45            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-46                   [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-47                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-48            [1, 256, 2, 2]            512
│    │    └─Sequential: 3-49             [1, 256, 2, 2]            33,280
│    │    └─ReLU: 3-50                   [1, 256, 2, 2]            --
│    └─BasicBlock: 2-9                   [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-51                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-52            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-53                   [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-54                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-55            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-56                   [1, 256, 2, 2]            --
│    └─BasicBlock: 2-10                  [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-57                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-58            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-59                   [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-60                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-61            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-62                   [1, 256, 2, 2]            --
│    └─BasicBlock: 2-11                  [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-63                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-64            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-65                   [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-66                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-67            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-68                   [1, 256, 2, 2]            --
│    └─BasicBlock: 2-12                  [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-69                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-70            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-71                   [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-72                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-73            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-74                   [1, 256, 2, 2]            --
│    └─BasicBlock: 2-13                  [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-75                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-76            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-77                   [1, 256, 2, 2]            --
│    │    └─Conv2d: 3-78                 [1, 256, 2, 2]            589,824
│    │    └─BatchNorm2d: 3-79            [1, 256, 2, 2]            512
│    │    └─ReLU: 3-80                   [1, 256, 2, 2]            --
├─Sequential: 1-8                        [1, 512, 1, 1]            --
│    └─BasicBlock: 2-14                  [1, 512, 1, 1]            --
│    │    └─Conv2d: 3-81                 [1, 512, 1, 1]            1,179,648
│    │    └─BatchNorm2d: 3-82            [1, 512, 1, 1]            1,024
│    │    └─ReLU: 3-83                   [1, 512, 1, 1]            --
│    │    └─Conv2d: 3-84                 [1, 512, 1, 1]            2,359,296
│    │    └─BatchNorm2d: 3-85            [1, 512, 1, 1]            1,024
│    │    └─Sequential: 3-86             [1, 512, 1, 1]            132,096
│    │    └─ReLU: 3-87                   [1, 512, 1, 1]            --
│    └─BasicBlock: 2-15                  [1, 512, 1, 1]            --
│    │    └─Conv2d: 3-88                 [1, 512, 1, 1]            2,359,296
│    │    └─BatchNorm2d: 3-89            [1, 512, 1, 1]            1,024
│    │    └─ReLU: 3-90                   [1, 512, 1, 1]            --
│    │    └─Conv2d: 3-91                 [1, 512, 1, 1]            2,359,296
│    │    └─BatchNorm2d: 3-92            [1, 512, 1, 1]            1,024
│    │    └─ReLU: 3-93                   [1, 512, 1, 1]            --
│    └─BasicBlock: 2-16                  [1, 512, 1, 1]            --
│    │    └─Conv2d: 3-94                 [1, 512, 1, 1]            2,359,296
│    │    └─BatchNorm2d: 3-95            [1, 512, 1, 1]            1,024
│    │    └─ReLU: 3-96                   [1, 512, 1, 1]            --
│    │    └─Conv2d: 3-97                 [1, 512, 1, 1]            2,359,296
│    │    └─BatchNorm2d: 3-98            [1, 512, 1, 1]            1,024
│    │    └─ReLU: 3-99                   [1, 512, 1, 1]            --
├─AdaptiveAvgPool2d: 1-9                 [1, 512, 1, 1]            --
├─Linear: 1-10                           [1, 10]                   5,130
==========================================================================================
Total params: 21,289,802
Trainable params: 21,289,802
Non-trainable params: 0
Total mult-adds (M): 74.78
==========================================================================================
Input size (MB): 0.01
Forward/backward pass size (MB): 1.22
Params size (MB): 85.16
Estimated Total Size (MB): 86.39
==========================================================================================

3. profiler

tensorboard 能够输出很多信息，但注意需要在训练过程中写好记录日志。

torchinfo能够输出模型每一层的详细信息及参数量。

除了这两个之外，我们还可以用 torch 自带的autograd.profiler来记录模型每一层所消耗的计算资源信息，如计算时长等。使用起来非常简单，用前文的数据和模型举个栗子：

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    out = model(images.to(device))

print(prof)

---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                         aten::to         0.19%      29.000us         8.36%       1.302ms       1.302ms       3.000us         0.00%       2.626ms       2.626ms             1  
                   aten::_to_copy         0.31%      48.000us         8.17%       1.273ms       1.273ms       3.000us         0.00%       2.623ms       2.623ms             1  
              aten::empty_strided         0.13%      21.000us         0.13%      21.000us      21.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::copy_         7.73%       1.204ms         7.73%       1.204ms       1.204ms       2.619ms         4.01%       2.619ms       2.619ms             1  
                     aten::conv2d         0.10%      15.000us         1.75%     273.000us     273.000us       3.000us         0.00%       1.617ms       1.617ms             1  
                aten::convolution         0.22%      34.000us         1.66%     258.000us     258.000us       2.000us         0.00%       1.614ms       1.614ms             1  
               aten::_convolution         0.15%      24.000us         1.44%     224.000us     224.000us       2.000us         0.00%       1.612ms       1.612ms             1  
          aten::cudnn_convolution         1.24%     193.000us         1.28%     200.000us     200.000us       1.609ms         2.46%       1.610ms       1.610ms             1  
                      aten::empty         0.04%       7.000us         0.04%       7.000us       7.000us       1.000us         0.00%       1.000us       1.000us             1  
                 aten::batch_norm         0.07%      11.000us         0.81%     126.000us     126.000us       3.000us         0.00%     387.000us     387.000us             1  
     aten::_batch_norm_impl_index         0.08%      13.000us         0.74%     115.000us     115.000us       3.000us         0.00%     384.000us     384.000us             1  
           aten::cudnn_batch_norm         0.42%      66.000us         0.65%     102.000us     102.000us     374.000us         0.57%     381.000us     381.000us             1  
                 aten::empty_like         0.06%       9.000us         0.12%      19.000us      19.000us       2.000us         0.00%       3.000us       3.000us             1  
                      aten::empty         0.06%      10.000us         0.06%      10.000us      10.000us       1.000us         0.00%       1.000us       1.000us             1  
                       aten::view         0.04%       7.000us         0.04%       7.000us       7.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.03%       4.000us         0.03%       4.000us       4.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.02%       3.000us         0.02%       3.000us       3.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.02%       3.000us         0.02%       3.000us       3.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::relu_         1.68%     262.000us         2.09%     325.000us     325.000us       2.000us         0.00%     368.000us     368.000us             1  
                 aten::clamp_min_         0.22%      34.000us         0.40%      63.000us      63.000us       2.000us         0.00%     366.000us     366.000us             1  
                  aten::clamp_min         0.19%      29.000us         0.19%      29.000us      29.000us     364.000us         0.56%     364.000us     364.000us             1  
                 aten::max_pool2d         0.07%      11.000us         0.37%      58.000us      58.000us       2.000us         0.00%     434.000us     434.000us             1  
    aten::max_pool2d_with_indices         0.30%      47.000us         0.30%      47.000us      47.000us     432.000us         0.66%     432.000us     432.000us             1  
                     aten::conv2d         0.06%      10.000us         1.49%     232.000us     232.000us       2.000us         0.00%     668.000us     668.000us             1  
                aten::convolution         0.12%      19.000us         1.42%     222.000us     222.000us       2.000us         0.00%     666.000us     666.000us             1  
               aten::_convolution         0.12%      19.000us         1.30%     203.000us     203.000us       2.000us         0.00%     664.000us     664.000us             1  
          aten::cudnn_convolution         1.12%     175.000us         1.18%     184.000us     184.000us     661.000us         1.01%     662.000us     662.000us             1  
                      aten::empty         0.06%       9.000us         0.06%       9.000us       9.000us       1.000us         0.00%       1.000us       1.000us             1  
                 aten::batch_norm         0.06%      10.000us         0.80%     124.000us     124.000us       2.000us         0.00%     126.000us     126.000us             1  
     aten::_batch_norm_impl_index         0.08%      12.000us         0.73%     114.000us     114.000us       2.000us         0.00%     124.000us     124.000us             1  
           aten::cudnn_batch_norm         0.44%      68.000us         0.65%     102.000us     102.000us     115.000us         0.18%     122.000us     122.000us             1  
                 aten::empty_like         0.06%       9.000us         0.12%      18.000us      18.000us       2.000us         0.00%       3.000us       3.000us             1  
                      aten::empty         0.06%       9.000us         0.06%       9.000us       9.000us       1.000us         0.00%       1.000us       1.000us             1  
                       aten::view         0.04%       6.000us         0.04%       6.000us       6.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.03%       4.000us         0.03%       4.000us       4.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.02%       3.000us         0.02%       3.000us       3.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.02%       3.000us         0.02%       3.000us       3.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::relu_         0.11%      17.000us         0.26%      41.000us      41.000us       2.000us         0.00%      96.000us      96.000us             1  
                 aten::clamp_min_         0.04%       7.000us         0.15%      24.000us      24.000us       2.000us         0.00%      94.000us      94.000us             1  
                  aten::clamp_min         0.11%      17.000us         0.11%      17.000us      17.000us      92.000us         0.14%      92.000us      92.000us             1  
                     aten::conv2d         0.08%      12.000us         1.15%     179.000us     179.000us       4.000us         0.01%     690.000us     690.000us             1  
                aten::convolution         0.11%      17.000us         1.07%     167.000us     167.000us       2.000us         0.00%     686.000us     686.000us             1  
               aten::_convolution         0.10%      16.000us         0.96%     150.000us     150.000us       4.000us         0.01%     684.000us     684.000us             1  
          aten::cudnn_convolution         0.81%     126.000us         0.86%     134.000us     134.000us     679.000us         1.04%     680.000us     680.000us             1  
                      aten::empty         0.05%       8.000us         0.05%       8.000us       8.000us       1.000us         0.00%       1.000us       1.000us             1  
                 aten::batch_norm         0.06%       9.000us         1.17%     183.000us     183.000us       2.000us         0.00%     137.000us     137.000us             1  
     aten::_batch_norm_impl_index         0.09%      14.000us         1.12%     174.000us     174.000us       2.000us         0.00%     135.000us     135.000us             1  
           aten::cudnn_batch_norm         0.80%     125.000us         1.03%     160.000us     160.000us     126.000us         0.19%     133.000us     133.000us             1  
                 aten::empty_like         0.06%       9.000us         0.11%      17.000us      17.000us       2.000us         0.00%       3.000us       3.000us             1  
                      aten::empty         0.05%       8.000us         0.05%       8.000us       8.000us       1.000us         0.00%       1.000us       1.000us             1  
                       aten::view         0.04%       6.000us         0.04%       6.000us       6.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.03%       4.000us         0.03%       4.000us       4.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.03%       4.000us         0.03%       4.000us       4.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.03%       4.000us         0.03%       4.000us       4.000us       1.000us         0.00%       1.000us       1.000us             1  
                       aten::add_         0.20%      31.000us         0.20%      31.000us      31.000us     134.000us         0.21%     134.000us     134.000us             1  
                      aten::relu_         0.10%      16.000us         0.24%      38.000us      38.000us       2.000us         0.00%      96.000us      96.000us             1  
                 aten::clamp_min_         0.05%       8.000us         0.14%      22.000us      22.000us       2.000us         0.00%      94.000us      94.000us             1  
                  aten::clamp_min         0.09%      14.000us         0.09%      14.000us      14.000us      92.000us         0.14%      92.000us      92.000us             1  
                     aten::conv2d         0.06%       9.000us         1.21%     188.000us     188.000us       2.000us         0.00%     670.000us     670.000us             1  
                aten::convolution         0.12%      18.000us         1.15%     179.000us     179.000us       2.000us         0.00%     668.000us     668.000us             1  
               aten::_convolution         0.10%      16.000us         1.03%     161.000us     161.000us       2.000us         0.00%     666.000us     666.000us             1  
          aten::cudnn_convolution         0.87%     136.000us         0.93%     145.000us     145.000us     663.000us         1.02%     664.000us     664.000us             1  
                      aten::empty         0.06%       9.000us         0.06%       9.000us       9.000us       1.000us         0.00%       1.000us       1.000us             1  
                 aten::batch_norm         0.06%       9.000us         0.72%     112.000us     112.000us       2.000us         0.00%     131.000us     131.000us             1  
     aten::_batch_norm_impl_index         0.07%      11.000us         0.66%     103.000us     103.000us       4.000us         0.01%     129.000us     129.000us             1  
           aten::cudnn_batch_norm         0.38%      59.000us         0.59%      92.000us      92.000us     116.000us         0.18%     125.000us     125.000us             1  
                 aten::empty_like         0.06%      10.000us         0.12%      18.000us      18.000us       3.000us         0.00%       4.000us       4.000us             1  
                      aten::empty         0.05%       8.000us         0.05%       8.000us       8.000us       1.000us         0.00%       1.000us       1.000us             1  
                       aten::view         0.03%       5.000us         0.03%       5.000us       5.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.03%       4.000us         0.03%       4.000us       4.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.02%       3.000us         0.02%       3.000us       3.000us       2.000us         0.00%       2.000us       2.000us             1  
                      aten::empty         0.02%       3.000us         0.02%       3.000us       3.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::relu_         0.10%      16.000us         0.25%      39.000us      39.000us       2.000us         0.00%      96.000us      96.000us             1  
                 aten::clamp_min_         0.04%       7.000us         0.15%      23.000us      23.000us       2.000us         0.00%      94.000us      94.000us             1  
                  aten::clamp_min         0.10%      16.000us         0.10%      16.000us      16.000us      92.000us         0.14%      92.000us      92.000us             1  
                     aten::conv2d         0.08%      13.000us         1.05%     164.000us     164.000us       2.000us         0.00%     659.000us     659.000us             1  
                aten::convolution         0.10%      16.000us         0.97%     151.000us     151.000us       2.000us         0.00%     657.000us     657.000us             1  
               aten::_convolution         0.10%      15.000us         0.87%     135.000us     135.000us       2.000us         0.00%     655.000us     655.000us             1  
          aten::cudnn_convolution         0.73%     113.000us         0.77%     120.000us     120.000us     652.000us         1.00%     653.000us     653.000us             1  
                      aten::empty         0.04%       7.000us         0.04%       7.000us       7.000us       1.000us         0.00%       1.000us       1.000us             1  
                 aten::batch_norm         0.06%      10.000us         0.72%     112.000us     112.000us       3.000us         0.00%     129.000us     129.000us             1  
     aten::_batch_norm_impl_index         0.07%      11.000us         0.65%     102.000us     102.000us       2.000us         0.00%     126.000us     126.000us             1  
           aten::cudnn_batch_norm         0.38%      59.000us         0.58%      91.000us      91.000us     117.000us         0.18%     124.000us     124.000us             1  
                 aten::empty_like         0.05%       8.000us         0.10%      16.000us      16.000us       2.000us         0.00%       3.000us       3.000us             1  
                      aten::empty         0.05%       8.000us         0.05%       8.000us       8.000us       1.000us         0.00%       1.000us       1.000us             1  
                       aten::view         0.03%       5.000us         0.03%       5.000us       5.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.03%       4.000us         0.03%       4.000us       4.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.03%       4.000us         0.03%       4.000us       4.000us       1.000us         0.00%       1.000us       1.000us             1  
                      aten::empty         0.02%       3.000us         0.02%       3.000us       3.000us       1.000us         0.00%       1.000us       1.000us             1  
                       aten::add_         0.17%      26.000us         0.17%      26.000us      26.000us     133.000us         0.20%     133.000us     133.000us             1  
                      aten::relu_         0.09%      14.000us         0.21%      33.000us      33.000us       3.000us         0.00%      96.000us      96.000us             1  
                 aten::clamp_min_         0.04%       7.000us         0.12%      19.000us      19.000us       2.000us         0.00%      93.000us      93.000us             1  
                  aten::clamp_min         0.08%      12.000us         0.08%      12.000us      12.000us      91.000us         0.14%      91.000us      91.000us             1  
                     aten::conv2d         0.06%       9.000us         1.29%     201.000us     201.000us       2.000us         0.00%     669.000us     669.000us             1  
                aten::convolution         0.10%      16.000us         1.23%     192.000us     192.000us       2.000us         0.00%     667.000us     667.000us             1  
               aten::_convolution         0.10%      16.000us         1.13%     176.000us     176.000us       2.000us         0.00%     665.000us     665.000us             1  
          aten::cudnn_convolution         0.98%     153.000us         1.03%     160.000us     160.000us     662.000us         1.01%     663.000us     663.000us             1  
                      aten::empty         0.04%       7.000us         0.04%       7.000us       7.000us       1.000us         0.00%       1.000us       1.000us             1  
                 aten::batch_norm         0.06%      10.000us         0.72%     112.000us     112.000us       2.000us         0.00%     131.000us     131.000us             1  
     aten::_batch_norm_impl_index         0.08%      12.000us         0.65%     102.000us     102.000us       3.000us         0.00%     129.000us     129.000us             1  
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 15.582ms
Self CUDA time total: 65.290ms