PyTorch Quickstart

白酒永远的神

Last modified: 2024-06-01 22:10:04

Tags: pytorch, artificial intelligence

First published: 2024-04-04 16:39:35

Original link: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html

Learn the Basics

Authors: Suraj Subramanian, Seth Juarez, Cassie Breviu, Dmitry Soshnikov, Ari Bornstein

Most machine learning workflows involve working with data, creating models, optimizing model parameters, and saving the trained models. This tutorial introduces you to a complete ML workflow implemented in PyTorch, with links to learn more about each of these concepts.

We’ll use the FashionMNIST dataset to train a neural network that predicts if an input image belongs to one of the following classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, or Ankle boot.

This tutorial assumes a basic familiarity with Python and Deep Learning concepts.

Running the Tutorial Code

You can run this tutorial in a couple of ways:

In the cloud: This is the easiest way to get started! Each section has a “Run in Microsoft Learn” and “Run in Google Colab” link at the top, which opens an integrated notebook in Microsoft Learn or Google Colab, respectively, with the code in a fully-hosted environment.
Locally: This option requires you to set up PyTorch and TorchVision on your local machine first (installation instructions). Download the notebook or copy the code into your favorite IDE.

How to Use this Guide

If you’re familiar with other deep learning frameworks, check out the 0. Quickstart first to quickly familiarize yourself with PyTorch’s API.

If you’re new to deep learning frameworks, head right into the first section of our step-by-step guide: 1. Tensors.

Quickstart

This section runs through the API for common tasks in machine learning. Refer to the links in each section to dive deeper.

Working with data

PyTorch has two primitives to work with data: torch.utils.data.DataLoader and torch.utils.data.Dataset. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset.

import torch
from torch import nn
from torch.utils.data import DataLoader  # iterable with automatic batching, sampling, shuffling, and multiprocess loading
from torchvision import datasets  # ready-made datasets that store samples and their labels
from torchvision.transforms import ToTensor

PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. For this tutorial, we will be using a TorchVision dataset.

The torchvision.datasets module contains Dataset objects for many real-world vision data like CIFAR, COCO (full list here). In this tutorial, we use the FashionMNIST dataset. Every TorchVision Dataset includes two arguments: transform and target_transform to modify the samples and labels respectively.

# Download the training data from open datasets
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),  # transform applied to each sample
)

# Download the test data from open datasets
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

batch_size = 64

# Create the data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:  # each iteration yields a batch of up to 64 features and labels
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
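
The batching behavior described above can be checked without downloading anything. The following is a minimal sketch that assumes a synthetic stand-in dataset of 150 blank images (the tensors and the `drop_last` comparison are illustrative and not part of the original tutorial):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for FashionMNIST: 150 fake 1x28x28 images with labels 0-9.
features = torch.zeros(150, 1, 28, 28)
labels = torch.arange(150) % 10
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=64)
batch_sizes = [len(y) for _, y in loader]
print(batch_sizes)   # two full batches, then the leftover samples
print(len(loader))   # number of batches per epoch

# drop_last=True discards the final partial batch.
loader_drop = DataLoader(dataset, batch_size=64, drop_last=True)
drop_sizes = [len(y) for _, y in loader_drop]
print(drop_sizes)
```

When the dataset size is not a multiple of the batch size, the last batch is smaller unless `drop_last=True` is set.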

Read more about loading data in PyTorch.

Creating Models

To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to the GPU or MPS if available.

# Get the cpu, gpu, or mps device for training
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

# Define the model
class NeuralNetwork(nn.Module):
    def __init__(self):  # define the layers of the network
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):  # specify how data passes through the network
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using cuda device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
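
One quick way to sanity-check a model like this is to push a dummy batch through it and count its parameters. The sketch below rebuilds the same layer stack with `nn.Sequential` so that it runs on its own; the names `net` and `dummy_batch` are illustrative assumptions, not part of the tutorial:

```python
import torch
from torch import nn

# Same layer stack as the tutorial's NeuralNetwork, rebuilt so the sketch is self-contained.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

dummy_batch = torch.rand(4, 1, 28, 28)  # 4 fake grayscale images
logits = net(dummy_batch)
print(logits.shape)                      # one row of 10 class scores per image

# Weights plus biases of the three Linear layers.
num_params = sum(p.numel() for p in net.parameters())
print(num_params)
```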

Read more about building neural networks in PyTorch.

Optimizing the Model Parameters

To train a model, we need a loss function and an optimizer.

loss_fn = nn.CrossEntropyLoss()  # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # optimizer
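
To make the loss function less of a black box, here is a small sketch of what `nn.CrossEntropyLoss` computes when it is given raw logits and integer class labels. The logits and targets are made-up values chosen for illustration:

```python
import torch
import torch.nn.functional as F
from torch import nn

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # raw scores for 2 samples, 3 classes
targets = torch.tensor([0, 2])             # integer class indices, not one-hot

loss = nn.CrossEntropyLoss()(logits, targets)

# Manual equivalent: log-softmax, pick the log-probability of the true class, average.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())
```

This is why the model above returns logits directly and has no softmax layer at the end.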

In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters.

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader): 
        X, y = X.to(device), y.to(device)

        # Compute the prediction error for this batch
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagate the error and update the model parameters
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")  # report the loss
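
The call to optimizer.zero_grad() in the loop matters because gradients accumulate across backward passes. The following sketch shows that accumulation on a single hand-made tensor `w` (an illustrative assumption; no optimizer is involved here):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(3 * w))/dw = 3 for each element.
(3 * w).sum().backward()
first = w.grad.clone()

# A second backward pass without a reset ADDS to the stored gradient.
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Clearing the gradient first; optimizer.zero_grad() performs this reset
# for every parameter the optimizer manages.
w.grad = None
(3 * w).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```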

We also check the model’s performance against the test dataset to ensure it is learning.

# Evaluate on the test dataset to check that the model is learning
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    # Report accuracy and average loss
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
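
The accuracy line in test() is compact, so here is the same computation on a tiny hand-made batch. The logits and labels are hypothetical values picked so the result is easy to verify by eye:

```python
import torch

# Hypothetical logits for 4 samples over 3 classes.
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.0, 0.1, 0.9],
                     [0.8, 0.7, 0.6]])
y = torch.tensor([1, 0, 1, 0])

predicted_classes = pred.argmax(1)  # index of the largest logit in each row
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)
print(predicted_classes, accuracy)
```

Three of the four predictions match their labels, so the accuracy is 0.75.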

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.303494  [   64/60000]
loss: 2.294637  [ 6464/60000]
loss: 2.277102  [12864/60000]
loss: 2.269977  [19264/60000]
loss: 2.254235  [25664/60000]
loss: 2.237146  [32064/60000]
loss: 2.231055  [38464/60000]
loss: 2.205037  [44864/60000]
loss: 2.203240  [51264/60000]
loss: 2.170889  [57664/60000]
Test Error:
 Accuracy: 53.9%, Avg loss: 2.168588

Epoch 2
-------------------------------
loss: 2.177787  [   64/60000]
loss: 2.168083  [ 6464/60000]
loss: 2.114910  [12864/60000]
loss: 2.130412  [19264/60000]
loss: 2.087473  [25664/60000]
loss: 2.039670  [32064/60000]
loss: 2.054274  [38464/60000]
loss: 1.985457  [44864/60000]
loss: 1.996023  [51264/60000]
loss: 1.917241  [57664/60000]
Test Error:
 Accuracy: 60.2%, Avg loss: 1.920374

Epoch 3
-------------------------------
loss: 1.951705  [   64/60000]
loss: 1.919516  [ 6464/60000]
loss: 1.808730  [12864/60000]
loss: 1.846550  [19264/60000]
loss: 1.740618  [25664/60000]
loss: 1.698733  [32064/60000]
loss: 1.708889  [38464/60000]
loss: 1.614436  [44864/60000]
loss: 1.646475  [51264/60000]
loss: 1.524308  [57664/60000]
Test Error:
 Accuracy: 61.4%, Avg loss: 1.547092

Epoch 4
-------------------------------
loss: 1.612695  [   64/60000]
loss: 1.570870  [ 6464/60000]
loss: 1.424730  [12864/60000]
loss: 1.489542  [19264/60000]
loss: 1.367256  [25664/60000]
loss: 1.373464  [32064/60000]
loss: 1.376744  [38464/60000]
loss: 1.304962  [44864/60000]
loss: 1.347154  [51264/60000]
loss: 1.230661  [57664/60000]
Test Error:
 Accuracy: 62.7%, Avg loss: 1.260891

Epoch 5
-------------------------------
loss: 1.337803  [   64/60000]
loss: 1.313278  [ 6464/60000]
loss: 1.151837  [12864/60000]
loss: 1.252142  [19264/60000]
loss: 1.123048  [25664/60000]
loss: 1.159531  [32064/60000]
loss: 1.175011  [38464/60000]
loss: 1.115554  [44864/60000]
loss: 1.160974  [51264/60000]
loss: 1.062730  [57664/60000]
Test Error:
 Accuracy: 64.6%, Avg loss: 1.087374

Done!

Saving Models

A common way to save a model is to serialize the internal state dictionary (containing the model parameters).

torch.save(model.state_dict(), "model.pth")  # serialize the internal state dictionary
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth

Loading Models

The process for loading a model includes re-creating the model structure and loading the state dictionary into it.

model = NeuralNetwork().to(device)  # re-create the model structure
model.load_state_dict(torch.load("model.pth"))  # load the state dictionary into it

<All keys matched successfully>
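
The save and load steps can be tried as a round trip without touching the disk. This is a minimal sketch that assumes a tiny `nn.Linear` layer in place of the tutorial's network and an in-memory `io.BytesIO` buffer in place of model.pth:

```python
import io
import torch
from torch import nn

source = nn.Linear(3, 2)
buffer = io.BytesIO()        # in-memory stand-in for "model.pth"
torch.save(source.state_dict(), buffer)
buffer.seek(0)

restored = nn.Linear(3, 2)   # re-create the same architecture first
result = restored.load_state_dict(torch.load(buffer))

# Every restored tensor should equal the one that was saved.
same = all(
    torch.equal(a, b)
    for a, b in zip(source.state_dict().values(), restored.state_dict().values())
)
print(result, same)
```

The state dictionary only holds tensors keyed by parameter name, which is why the model class has to be instantiated again before loading.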

This model can now be used to make predictions.

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)  # run the model to get a prediction
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
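
The model outputs raw logits, and argmax is enough to pick a class. If a probability is wanted as well, softmax can be applied to the logits. A short sketch with made-up scores for a single image:

```python
import torch

logits = torch.tensor([[1.0, 3.0, 0.5]])  # hypothetical scores for one image
probs = torch.softmax(logits, dim=1)       # non-negative, each row sums to 1
top_prob, top_class = probs.max(dim=1)
print(probs, top_class.item(), top_prob.item())
```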

Read more about Saving & Loading your model.

Tensors

Tensors are a specialized data structure that are very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and NumPy arrays can often share the same underlying memory, eliminating the need to copy data (see Bridge with NumPy). Tensors are also optimized for automatic differentiation (we’ll see more about that later in the Autograd section). If you’re familiar with ndarrays, you’ll be right at home with the Tensor API. If not, follow along!

import torch
import numpy as np

# Tensors:
# 1. can run on GPUs or other hardware accelerators
# 2. are optimized for automatic differentiation

Initializing a Tensor

Tensors can be initialized in various ways. Take a look at the following examples:

Directly from data

Tensors can be created directly from data. The data type is automatically inferred.

data = [[1, 2],[3, 4]]  # nested list used as the tensor's data
x_data = torch.tensor(data)
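
The inference rule can be observed directly. The sketch below assumes PyTorch's default floating-point type has not been changed from float32:

```python
import torch

int_tensor = torch.tensor([[1, 2], [3, 4]])      # all Python ints
float_tensor = torch.tensor([[1.0, 2], [3, 4]])  # one float promotes the whole tensor
bool_tensor = torch.tensor([True, False])
forced = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)  # explicit override

print(int_tensor.dtype, float_tensor.dtype, bool_tensor.dtype, forced.dtype)
```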

From a NumPy array

Tensors can be created from NumPy arrays (and vice versa - see Bridge with NumPy).

np_array = np.array(data)  # NumPy array to build the tensor from
x_np = torch.from_numpy(np_array)

From another tensor:

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

x_ones = torch.ones_like(x_data)  # retains the shape and datatype of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float)  # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor:
 tensor([[1, 1],
        [1, 1]])

Random Tensor:
 tensor([[0.8823, 0.9150],
        [0.3829, 0.9593]])

With random or constant values:

shape is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.

shape = (2,3,)  # tensor dimensions
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor:
 tensor([[0.3904, 0.6009, 0.2566],
        [0.7936, 0.9408, 0.1332]])

Ones Tensor:
 tensor([[1., 1., 1.],
        [1., 1., 1.]])

Zeros Tensor:
 tensor([[0., 0., 0.],
        [0., 0., 0.]])

Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.

tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")  # shape
print(f"Datatype of tensor: {tensor.dtype}")  # datatype
print(f"Device tensor is stored on: {tensor.device}")  # device

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu

Operations on Tensors

Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling and more are comprehensively described here.

Each of these operations can be run on the GPU (at typically higher speeds than on a CPU). If you’re using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using .to method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!

# Move the tensor to the GPU if one is available
if torch.cuda.is_available():
    tensor = tensor.to("cuda")

Try out some of the operations from the list. If you’re familiar with the NumPy API, you’ll find the Tensor API a breeze to use.

Standard numpy-like indexing and slicing:

tensor = torch.ones(4, 4)
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}") 
print(f"Last column: {tensor[..., -1]}")
tensor[:,1] = 0
print(tensor)

First row: tensor([1., 1., 1., 1.])
First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

Joining tensors: You can use torch.cat to concatenate a sequence of tensors along a given dimension. See also torch.stack, another tensor joining operator that is subtly different from torch.cat.

t1 = torch.cat([tensor, tensor, tensor], dim=1)  # concatenate horizontally (along columns)
print(t1)

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
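
The difference between torch.cat and torch.stack is easiest to see in the resulting shapes. A small sketch with two illustrative 4x4 tensors:

```python
import torch

a = torch.ones(4, 4)
b = torch.zeros(4, 4)

cat_rows = torch.cat([a, b], dim=0)   # joins along an existing dimension
cat_cols = torch.cat([a, b], dim=1)
stacked = torch.stack([a, b], dim=0)  # inserts a new leading dimension

print(cat_rows.shape, cat_cols.shape, stacked.shape)
```

torch.cat keeps the number of dimensions and grows one of them, while torch.stack adds a dimension whose size equals the number of tensors.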

Arithmetic operations

# Matrix multiplication between two tensors; y1, y2, y3 hold the same value
# tensor.T returns the transpose of a tensor
y1 = tensor @ tensor.T  # @ operator
y2 = tensor.matmul(tensor.T)  # Tensor.matmul method

y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)  # torch.matmul function, writing into y3


# Element-wise product; z1, z2, z3 hold the same value
z1 = tensor * tensor  # * operator
z2 = tensor.mul(tensor)  # Tensor.mul method

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)  # torch.mul function, writing into z3

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

Single-element tensors: If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it to a Python numerical value using item():

agg = tensor.sum()  # aggregate all values into a single-element tensor
agg_item = agg.item()  # convert it to a Python number
print(agg_item, type(agg_item))

12.0 <class 'float'>

In-place operations: Operations that store the result into the operand are called in-place. They are denoted by a _ suffix. For example, x.copy_(y) and x.t_() will change x.

print(f"{tensor} \n")
tensor.add_(5)  # in-place operation
print(tensor)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])

NOTE

In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss of history. Hence, their use is discouraged.
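
One concrete way this surfaces: autograd refuses an in-place update on a leaf tensor that requires gradients. The sketch below uses an illustrative tensor `x` and catches the error so the script keeps running:

```python
import torch

x = torch.ones(3, requires_grad=True)

try:
    x.add_(1)   # in-place change to a leaf tensor that tracks gradients
    error_message = None
except RuntimeError as err:
    error_message = str(err)

y = x + 1       # out-of-place version: x is left untouched
print(error_message)
print(x, y)
```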

Bridge with NumPy

Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other.

Tensor to NumPy array

t = torch.ones(5)
print(f"t: {t}")
n = t.numpy() # tensor to numpy
print(f"n: {n}")

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]

A change in the tensor is reflected in the NumPy array.

t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]

NumPy array to Tensor

n = np.ones(5)
t = torch.from_numpy(n) # numpy to tensor

Changes in the NumPy array are reflected in the tensor.

np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]
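
Memory sharing applies to torch.from_numpy, whereas torch.tensor copies its input. A brief sketch contrasting the two on an illustrative array:

```python
import numpy as np
import torch

n = np.ones(3)
shared = torch.from_numpy(n)  # shares memory with n
copied = torch.tensor(n)      # makes an independent copy

n += 1                        # in-place update of the NumPy array

print(shared)                 # follows the change
print(copied)                 # keeps the old values
print(shared.dtype)           # NumPy's default float64 is preserved
```

The float64 dtype also explains the dtype=torch.float64 that appears in the printed tensor above.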

Datasets & DataLoaders

Code for processing data samples can get messy and hard to maintain; we ideally want our dataset code to be decoupled from our model training code for better readability and modularity. PyTorch provides two data primitives: torch.utils.data.DataLoader and torch.utils.data.Dataset that allow you to use pre-loaded datasets as well as your own data. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.

PyTorch domain libraries provide a number of pre-loaded datasets (such as FashionMNIST) that subclass torch.utils.data.Dataset and implement functions specific to the particular data. They can be used to prototype and benchmark your model. You can find them here: Image Datasets, Text Datasets, and Audio Datasets

Loading a Dataset

Here is an example of how to load the Fashion-MNIST dataset from TorchVision. Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.

We load the FashionMNIST Dataset with the following parameters:

root is the path where the train/test data is stored,
train specifies training or test dataset,
download=True downloads the data from the internet if it's not available at root,
transform and target_transform specify the feature and label transformations.

import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt


training_data = datasets.FashionMNIST(
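    root="data",  # path where the train/test data is stored
    train=True,  # select the training split (False selects the test split)
    download=True,  # download the data from the internet if it is not available at root
    transform=ToTensor()  # feature transformation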
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

Iterating and Visualizing the Dataset

We can index Datasets manually like a list: training_data[index]. We use matplotlib to visualize some samples in our training data.

labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))  # use matplotlib to visualize some training samples
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

Figure: 3×3 grid of random training samples, titled Ankle Boot, Shirt, Bag, Ankle Boot, Trouser, Sandal, Coat, Sandal, Pullover.

Creating a Custom Dataset for your files

A custom Dataset class must implement three functions: __init__, __len__, and __getitem__. Take a look at this implementation; the FashionMNIST images are stored in a directory img_dir, and their labels are stored separately in a CSV file annotations_file.

In the next sections, we’ll break down what’s happening in each of these functions.

import os
import pandas as pd
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)  # table of image file names and labels
        self.img_dir = img_dir  # directory containing the images
        self.transform = transform  # feature transformation
        self.target_transform = target_transform  # label transformation

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

`__init__`

The __init__ function is run once when instantiating the Dataset object. We initialize the directory containing the images, the annotations file, and both transforms (covered in more detail in the next section).

The labels.csv file looks like:

tshirt1.jpg, 0
tshirt2.jpg, 0
......
ankleboot999.jpg, 9

def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
    self.img_labels = pd.read_csv(annotations_file)
    self.img_dir = img_dir
    self.transform = transform
    self.target_transform = target_transform

`__len__`

The __len__ function returns the number of samples in our dataset.

Example:

def __len__(self):
    return len(self.img_labels)  # number of samples in the dataset

`__getitem__`

The __getitem__ function loads and returns a sample from the dataset at the given index idx. Based on the index, it identifies the image’s location on disk, converts that to a tensor using read_image, retrieves the corresponding label from the csv data in self.img_labels, calls the transform functions on them (if applicable), and returns the tensor image and corresponding label in a tuple.

def __getitem__(self, idx):
    img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])  # image location on disk
    image = read_image(img_path)  # read the image file into a tensor
    label = self.img_labels.iloc[idx, 1]  # look up the corresponding label
    if self.transform:
        image = self.transform(image)  # apply the feature transform
    if self.target_transform:
        label = self.target_transform(label)  # apply the label transform
    return image, label
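
The same three-method pattern can be tried without image files or a CSV. The following is a minimal sketch using a hypothetical in-memory dataset, `SquaresDataset`, which is not part of the tutorial; sample i is the pair (i, i squared):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical in-memory dataset: sample i is (i, i**2)."""

    def __init__(self, n, transform=None):
        self.values = torch.arange(n, dtype=torch.float32)
        self.transform = transform

    def __len__(self):
        return len(self.values)

    def __getitem__(self, idx):
        x = self.values[idx]
        y = x ** 2                  # label computed before the feature transform
        if self.transform:
            x = self.transform(x)
        return x, y

ds = SquaresDataset(10, transform=lambda x: x / 10)
x3, y3 = ds[3]
xb, yb = next(iter(DataLoader(ds, batch_size=4)))
print(len(ds), x3.item(), y3.item(), xb.shape, yb.tolist())
```

Because the class implements `__len__` and `__getitem__`, it can be passed to a DataLoader exactly like the built-in datasets.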

Preparing your data for training with DataLoaders

The Dataset retrieves our dataset’s features and labels one sample at a time. While training a model, we typically want to pass samples in “minibatches”, reshuffle the data at every epoch to reduce model overfitting, and use Python’s multiprocessing to speed up data retrieval.

DataLoader is an iterable that abstracts this complexity for us in an easy API.

from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
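To make the batching and per-epoch reshuffling concrete, here is a rough plain-Python sketch of the idea. It is only an illustration: the function name iterate_minibatches is hypothetical, and the real DataLoader additionally collates samples into tensors and can use worker processes.

```python
import random


def iterate_minibatches(samples, batch_size, shuffle, rng):
    """Yield lists of at most `batch_size` samples, in a fresh order per call if `shuffle`."""
    order = list(range(len(samples)))
    if shuffle:
        rng.shuffle(order)                 # new order every time the data is iterated
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


samples = list(range(10))                  # stand-in for 10 dataset items
rng = random.Random(0)                     # seeded only to make the sketch repeatable
for epoch in range(2):
    batches = list(iterate_minibatches(samples, batch_size=4, shuffle=True, rng=rng))
    print(epoch, batches)
```

With 10 samples and a batch size of 4, each pass yields batches of 4, 4 and 2 samples, and every sample appears exactly once per pass.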

Iterate through the DataLoader

We have loaded that dataset into the DataLoader and can iterate through it as needed. Each iteration below returns a batch of train_features and train_labels (containing batch_size=64 features and labels respectively). Because we specified shuffle=True, the data is reshuffled after we iterate over all batches (for finer-grained control over the data loading order, take a look at Samplers).

# Display image and label.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")


Feature batch shape: torch.Size([64, 1, 28, 28])
Labels batch shape: torch.Size([64])
Label: 5

Transforms

Data does not always come in the final processed form required for training machine learning algorithms. We use transforms to manipulate the data and make it suitable for training.

All TorchVision datasets have two parameters, transform to modify the features and target_transform to modify the labels, that accept callables containing the transformation logic. The torchvision.transforms module offers several commonly used transforms out of the box.

The FashionMNIST features are in PIL Image format, and the labels are integers. For training, we need the features as normalized tensors, and the labels as one-hot encoded tensors. To make these transformations, we use ToTensor and Lambda.

import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    # Convert a PIL image or NumPy ndarray into a FloatTensor and scale pixel intensities to [0., 1.]
    transform=ToTensor(),
    # Turn the integer label into a one-hot encoded tensor
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz

  0%|          | 0/26421880 [00:00<?, ?it/s]
  0%|          | 65536/26421880 [00:00<01:12, 363749.08it/s]
  1%|          | 229376/26421880 [00:00<00:38, 686922.68it/s]
  4%|3         | 950272/26421880 [00:00<00:11, 2205402.40it/s]
 14%|#4        | 3768320/26421880 [00:00<00:02, 8263766.42it/s]
 26%|##6       | 6946816/26421880 [00:00<00:01, 13245967.27it/s]
 47%|####7     | 12451840/26421880 [00:00<00:00, 19778170.65it/s]
 69%|######8   | 18153472/26421880 [00:01<00:00, 26194289.57it/s]
 80%|########  | 21233664/26421880 [00:01<00:00, 25191295.94it/s]
 99%|#########9| 26214400/26421880 [00:01<00:00, 28627989.10it/s]
100%|##########| 26421880/26421880 [00:01<00:00, 18909729.61it/s]
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz

  0%|          | 0/29515 [00:00<?, ?it/s]
100%|##########| 29515/29515 [00:00<00:00, 326951.31it/s]
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz

  0%|          | 0/4422102 [00:00<?, ?it/s]
  1%|1         | 65536/4422102 [00:00<00:11, 364569.71it/s]
  5%|5         | 229376/4422102 [00:00<00:06, 684559.05it/s]
 19%|#9        | 851968/4422102 [00:00<00:01, 1950341.95it/s]
 67%|######7   | 2981888/4422102 [00:00<00:00, 5843037.32it/s]
100%|##########| 4422102/4422102 [00:00<00:00, 6099505.99it/s]
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz

  0%|          | 0/5148 [00:00<?, ?it/s]
100%|##########| 5148/5148 [00:00<00:00, 35867569.75it/s]
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw

ToTensor()

ToTensor converts a PIL image or NumPy ndarray into a FloatTensor and scales the image’s pixel intensity values to the range [0., 1.].

Lambda Transforms

Lambda transforms apply any user-defined lambda function. Here, we define a function to turn the integer label into a one-hot encoded tensor. It first creates a zero tensor of size 10 (the number of labels in our dataset) and then calls scatter_, which assigns value=1 at the index given by the label y.

target_transform = Lambda(lambda y: torch.zeros(
    10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))
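As a plain-Python illustration of what this target transform produces, the sketch below builds the same kind of one-hot vector with a list instead of a tensor. The helper name one_hot is hypothetical, and the range check is an addition of this sketch, not something the lambda above performs.

```python
def one_hot(label, num_classes=10):
    """Return a list of floats with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes})")
    encoded = [0.0] * num_classes   # plays the role of torch.zeros(10)
    encoded[label] = 1.0            # plays the role of scatter_(0, index, value=1)
    return encoded


print(one_hot(5))
```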

Build the Neural Network

Neural networks are composed of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses nn.Module. A neural network is itself a module that consists of other modules (layers). This nested structure allows for building and managing complex architectures easily.

In the following sections, we’ll build a neural network to classify images in the FashionMNIST dataset.

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Get Device for Training

We want to be able to train our model on a hardware accelerator such as a GPU or MPS, if one is available. Let’s check whether torch.cuda or torch.backends.mps is available; otherwise we use the CPU.

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cuda device

Define the Class

We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.

class NeuralNetwork(nn.Module):  # subclass nn.Module
    def __init__(self):  # initialize the neural network layers
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):  # operations applied to the input data
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of NeuralNetwork, move it to the device, and print its structure.

model = NeuralNetwork().to(device)  # create a NeuralNetwork instance and move it to the device
print(model)  # print the model structure

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

To use the model, we pass it the input data. This executes the model’s forward, along with some background operations. Do not call model.forward() directly!

Calling the model on the input returns a 2-dimensional tensor: dim=0 indexes the outputs (one per input sample, each holding 10 raw predicted values, one per class), and dim=1 indexes the individual values within each output. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)  # convert logits to prediction probabilities
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([7], device='cuda:0')

Model Layers

Let’s break down the layers in the FashionMNIST model. To illustrate, we take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.

input_image = torch.rand(3,28,28)  # a minibatch of 3 images of size 28x28
print(input_image.size())

torch.Size([3, 28, 28])

nn.Flatten

We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, at dim=0, is maintained).

flatten = nn.Flatten()  # initialize the nn.Flatten layer
flat_image = flatten(input_image)  # flatten each 28x28 image into 784 values
print(flat_image.size())

torch.Size([3, 784])

nn.Linear

The linear layer is a module that applies a linear transformation on the input using its stored weights and biases.

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)  # apply the linear transformation using the stored weights and biases
print(hidden1.size())

torch.Size([3, 20])
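Written out, the transformation applied by this layer and the shapes involved in the example are as follows. The symbols X, W, b and Y are introduced here only for illustration; nn.Linear stores its weight with shape (out_features, in_features), which is why the transpose appears.

```latex
\[
Y = X W^{\top} + b,
\qquad
X \in \mathbb{R}^{3 \times 784},\quad
W \in \mathbb{R}^{20 \times 784},\quad
b \in \mathbb{R}^{20},\quad
Y \in \mathbb{R}^{3 \times 20}
\]
```

The bias b is added to every row of the product, so each of the 3 flattened images is mapped to 20 output values, matching the printed size.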

nn.ReLU

Non-linear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.

In this model, we use nn.ReLU between our linear layers, but there are other activations that introduce non-linearity in your model.

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)  # non-linear activation
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[ 0.4158, -0.0130, -0.1144,  0.3960,  0.1476, -0.0690, -0.0269,  0.2690,
          0.1353,  0.1975,  0.4484,  0.0753,  0.4455,  0.5321, -0.1692,  0.4504,
          0.2476, -0.1787, -0.2754,  0.2462],
        [ 0.2326,  0.0623, -0.2984,  0.2878,  0.2767, -0.5434, -0.5051,  0.4339,
          0.0302,  0.1634,  0.5649, -0.0055,  0.2025,  0.4473, -0.2333,  0.6611,
          0.1883, -0.1250,  0.0820,  0.2778],
        [ 0.3325,  0.2654,  0.1091,  0.0651,  0.3425, -0.3880, -0.0152,  0.2298,
          0.3872,  0.0342,  0.8503,  0.0937,  0.1796,  0.5007, -0.1897,  0.4030,
          0.1189, -0.3237,  0.2048,  0.4343]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.4158, 0.0000, 0.0000, 0.3960, 0.1476, 0.0000, 0.0000, 0.2690, 0.1353,
         0.1975, 0.4484, 0.0753, 0.4455, 0.5321, 0.0000, 0.4504, 0.2476, 0.0000,
         0.0000, 0.2462],
        [0.2326, 0.0623, 0.0000, 0.2878, 0.2767, 0.0000, 0.0000, 0.4339, 0.0302,
         0.1634, 0.5649, 0.0000, 0.2025, 0.4473, 0.0000, 0.6611, 0.1883, 0.0000,
         0.0820, 0.2778],
        [0.3325, 0.2654, 0.1091, 0.0651, 0.3425, 0.0000, 0.0000, 0.2298, 0.3872,
         0.0342, 0.8503, 0.0937, 0.1796, 0.5007, 0.0000, 0.4030, 0.1189, 0.0000,
         0.2048, 0.4343]], grad_fn=<ReluBackward0>)

nn.Sequential

nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules.

seq_modules = nn.Sequential(  # ordered container of modules
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)  # raw, unnormalized scores

nn.Softmax

The last linear layer of the neural network returns logits, raw values in (-∞, ∞), which are passed to the nn.Softmax module. The logits are scaled to values in [0, 1] representing the model’s predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1.

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
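For a single row of logits, the scaling can be sketched in plain Python as below. This is an illustrative re-derivation, not PyTorch's implementation; subtracting the maximum before exponentiating is a common numerical-stability trick, and the example logits are made up.

```python
import math


def softmax(logits):
    """Map a list of raw scores to probabilities that sum to 1."""
    shift = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # argmax over the classes
print(probs, predicted_class)
```

Because softmax preserves the ordering of the logits, taking the argmax of the probabilities selects the same class as taking the argmax of the logits.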

Model Parameters

Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model’s parameters() or named_parameters() methods.

In this example, we iterate over each parameter, and print its size and a preview of its values.

print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():  # iterate over the parameters and preview them
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0273,  0.0296, -0.0084,  ..., -0.0142,  0.0093,  0.0135],
        [-0.0188, -0.0354,  0.0187,  ..., -0.0106, -0.0001,  0.0115]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0155, -0.0327], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0116,  0.0293, -0.0280,  ...,  0.0334, -0.0078,  0.0298],
        [ 0.0095,  0.0038,  0.0009,  ..., -0.0365, -0.0011, -0.0221]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([ 0.0148, -0.0256], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0147, -0.0229,  0.0180,  ..., -0.0013,  0.0177,  0.0070],
        [-0.0202, -0.0417, -0.0279,  ..., -0.0441,  0.0185, -0.0268]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([ 0.0070, -0.0411], device='cuda:0', grad_fn=<SliceBackward0>)

Automatic Differentiation with `torch.autograd`

When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradients for any computational graph.

Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following manner:

import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights (to be optimized)
b = torch.randn(3, requires_grad=True)  # bias (to be optimized)
z = torch.matmul(x, w)+b  # actual output (logits)
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)  # gap between actual and expected output

Tensors, Functions and Computational graph

This code defines a computational graph in which x and w feed a matrix multiplication, b is added to the result to produce z, and z is compared with y by the cross-entropy function to produce loss.

In this network, w and b are parameters, which we need to optimize. Thus, we need to be able to compute the gradients of the loss function with respect to those variables. In order to do that, we set the requires_grad property of those tensors.

NOTE

You can set the value of requires_grad when creating a tensor, or later by using the x.requires_grad_(True) method.

A function that we apply to tensors to construct a computational graph is in fact an object of class Function. This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in the grad_fn property of a tensor. You can find more information about Function in the documentation.

print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x7f774e56a140>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f779c4dcc10>

Computing Gradients

To optimize the weights of the parameters in the neural network, we need to compute the derivatives of our loss function with respect to the parameters; namely, we need ∂loss/∂w and ∂loss/∂b under some fixed values of x and y. To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad:

loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.3313, 0.0626, 0.2530],
        [0.3313, 0.0626, 0.2530],
        [0.3313, 0.0626, 0.2530],
        [0.3313, 0.0626, 0.2530],
        [0.3313, 0.0626, 0.2530]])
tensor([0.3313, 0.0626, 0.2530])
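The structure of the printed gradients can be checked by hand. The derivation below is a sketch that assumes the default mean reduction of binary_cross_entropy_with_logits over the 3 outputs; σ denotes the sigmoid function, and the indices i and j are introduced here for illustration.

```latex
\[
\text{loss} = \frac{1}{3}\sum_{j=1}^{3}\Bigl[-y_j \log \sigma(z_j) - (1-y_j)\log\bigl(1-\sigma(z_j)\bigr)\Bigr],
\qquad
z_j = \sum_{i=1}^{5} x_i w_{ij} + b_j
\]
\[
\frac{\partial\,\text{loss}}{\partial z_j} = \frac{\sigma(z_j) - y_j}{3},
\qquad
\frac{\partial\,\text{loss}}{\partial b_j} = \frac{\sigma(z_j) - y_j}{3},
\qquad
\frac{\partial\,\text{loss}}{\partial w_{ij}} = x_i\,\frac{\sigma(z_j) - y_j}{3}
\]
```

Since every x_i is 1 in this example, each row of w.grad equals b.grad, which is what the output shows. With y_j = 0, each entry is σ(z_j)/3 and therefore stays below 1/3.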

NOTE

We can only obtain the grad properties for the leaf nodes of the computational graph that have the requires_grad property set to True. For all other nodes in our graph, gradients will not be available.
We can only perform gradient calculations using backward once on a given graph, for performance reasons. If we need to do several backward calls on the same graph, we need to pass retain_graph=True to the backward call.

Disabling Gradient Tracking

By default, all tensors with requires_grad=True track their computational history and support gradient computation. However, there are cases when we do not need that, for example when we have trained the model and just want to apply it to some input data, i.e. we only want to do forward computations through the network. We can stop tracking computations by surrounding our computation code with a torch.no_grad() block:

z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():  # computations inside this block are not tracked
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False

Another way to achieve the same result is to use the detach() method on the tensor:

z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False

There are reasons you might want to disable gradient tracking:

To mark some parameters in your neural network as frozen parameters.
To speed up computations when you are only doing a forward pass, because computations on tensors that do not track gradients are more efficient.

More on Computational Graphs

Conceptually, autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, leaves are the input tensors and roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.

In a forward pass, autograd does two things simultaneously:

run the requested operation to compute a resulting tensor
maintain the operation’s gradient function in the DAG.

The backward pass kicks off when .backward() is called on the DAG root. autograd then:

computes the gradients from each .grad_fn,
accumulates them in the respective tensor’s .grad attribute,
using the chain rule, propagates all the way to the leaf tensors.
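The forward and backward passes described above can be imitated with a toy scalar autograd in plain Python. This is only a conceptual sketch under simplifying assumptions (scalars, two operations, a single backward call); the Value class and its attribute names are hypothetical and do not reflect how torch.autograd is implemented.

```python
class Value:
    """Hypothetical scalar node in a computation DAG (not a PyTorch class)."""

    def __init__(self, data, parents=(), backward_fn=None):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # edges of the DAG, pointing toward the leaves
        self.backward_fn = backward_fn  # plays the role of grad_fn; None for leaves

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1

        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a

        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Order the nodes so that each one is processed before the nodes it was built from.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0                 # seed: d(root)/d(root) = 1
        for node in reversed(order):
            if node.backward_fn is not None:
                node.backward_fn()      # chain rule step, accumulating into .grad


x, w, b = Value(3.0), Value(2.0), Value(1.0)
z = x * w + b                           # forward pass: computes the result and records the DAG
z.backward()                            # backward pass: walks the DAG from root to leaves
print(w.grad, x.grad, b.grad)
```

Each operation both computes its result and records how to push gradients back to its inputs, which mirrors the two things autograd does during the forward pass.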

NOTE

DAGs are dynamic in PyTorch. An important thing to note is that the graph is recreated from scratch; after each .backward() call, autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size and operations at every iteration if needed.

Optional Reading: Tensor Gradients and Jacobian Products

In many cases, we have a scalar loss function, and we need to compute the gradient with respect to some parameters. However, there are cases when the output function is an arbitrary tensor. In this case, PyTorch allows you to compute a so-called Jacobian product, and not the actual gradient.

For a vector function $\vec{y}=f(\vec{x})$, where $\vec{x}=\langle x_1,\dots,x_n\rangle$ and $\vec{y}=\langle y_1,\dots,y_m\rangle$, a gradient of $\vec{y}$ with respect to $\vec{x}$ is given by the Jacobian matrix:

$$J=\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}$$

Instead of computing the Jacobian matrix itself, PyTorch allows you to compute the Jacobian product $v^T\cdot J$ for a given input vector $v=(v_1 \dots v_m)$. This is achieved by calling backward with $v$ as an argument. The size of $v$ should be the same as the size of the original tensor, with respect to which we want to compute the product:

inp = torch.eye(4, 5, requires_grad=True)
out = (inp+1).pow(2).t()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"First call\n{inp.grad}")
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nSecond call\n{inp.grad}")
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nCall after zeroing gradients\n{inp.grad}")

First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Second call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Notice that when we call backward for the second time with the same argument, the value of the gradient is different. This happens because when doing backward propagation, PyTorch accumulates the gradients, i.e. the value of the computed gradients is added to the grad property of all leaf nodes of the computational graph. If you want to compute the proper gradients, you need to zero out the grad property beforehand. In real-life training an optimizer helps us to do this.
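The numbers in the three printouts can be reproduced by hand. The following worked derivation is a sketch that treats the all-ones argument to backward as summing every entry of out; the indices i, j, k, l are introduced here for illustration.

```latex
\[
\frac{\partial}{\partial\,\mathrm{inp}_{ij}} \sum_{k,l} \bigl(\mathrm{inp}_{kl} + 1\bigr)^{2}
= 2\,\bigl(\mathrm{inp}_{ij} + 1\bigr)
= \begin{cases}
4 & \text{if } \mathrm{inp}_{ij} = 1 \ (\text{diagonal}),\\
2 & \text{if } \mathrm{inp}_{ij} = 0 \ (\text{off-diagonal})
\end{cases}
\]
\[
\text{second call without zeroing:}\quad 4 + 4 = 8, \qquad 2 + 2 = 4
\]
```

The transpose in the example only rearranges the entries of out, so it does not change the sum. The first and third printouts show the gradient itself, while the second shows the same gradient added on top of the stored one.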

NOTE

Previously we were calling the backward() function without parameters. This is essentially equivalent to calling backward(torch.tensor(1.0)), which is a useful way to compute the gradients in the case of a scalar-valued function, such as the loss during neural network training.

Optimizing Model Parameters

Now that we have a model and data, it’s time to train, validate and test our model by optimizing its parameters on our data. Training a model is an iterative process; in each iteration the model makes a guess about the output, calculates the error in its guess (loss), collects the derivatives of the error with respect to its parameters (as we saw in the previous section), and optimizes these parameters using gradient descent. For a more detailed walkthrough of this process, check out this video on backpropagation from 3Blue1Brown.

Prerequisite Code

We load the code from the previous sections on Datasets & DataLoaders and Build Model.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz

  0%|          | 0/26421880 [00:00<?, ?it/s]
  0%|          | 65536/26421880 [00:00<01:12, 364111.90it/s]
  1%|          | 229376/26421880 [00:00<00:38, 681479.44it/s]
  3%|3         | 917504/26421880 [00:00<00:12, 2103174.71it/s]
 13%|#3        | 3440640/26421880 [00:00<00:03, 6759570.42it/s]
 31%|###1      | 8290304/26421880 [00:00<00:01, 16651828.79it/s]
 44%|####3     | 11599872/26421880 [00:00<00:00, 17594527.13it/s]
 66%|######6   | 17465344/26421880 [00:01<00:00, 22838080.63it/s]
 87%|########7 | 23101440/26421880 [00:01<00:00, 30171764.12it/s]
100%|##########| 26421880/26421880 [00:01<00:00, 18202100.72it/s]
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz

  0%|          | 0/29515 [00:00<?, ?it/s]
100%|##########| 29515/29515 [00:00<00:00, 325831.74it/s]
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz

  0%|          | 0/4422102 [00:00<?, ?it/s]
  1%|1         | 65536/4422102 [00:00<00:12, 361485.49it/s]
  5%|5         | 229376/4422102 [00:00<00:06, 682060.98it/s]
 21%|##1       | 950272/4422102 [00:00<00:01, 2185867.00it/s]
 87%|########6 | 3833856/4422102 [00:00<00:00, 7596041.62it/s]
100%|##########| 4422102/4422102 [00:00<00:00, 6075106.42it/s]
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz

  0%|          | 0/5148 [00:00<?, ?it/s]
100%|##########| 5148/5148 [00:00<00:00, 41683932.42it/s]
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Hyperparameters

Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can impact model training and convergence rates (read more about hyperparameter tuning).

We define the following hyperparameters for training:

Number of Epochs - the number of times to iterate over the dataset
Batch Size - the number of data samples propagated through the network before the parameters are updated
Learning Rate - how much to update the model’s parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.

learning_rate = 1e-3  # how much to update the model parameters at each batch/epoch
batch_size = 64  # number of data samples propagated through the network before the parameters are updated
epochs = 5  # number of times to iterate over the dataset
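As a quick sanity check on what these values imply, the arithmetic below estimates how many parameter updates the run performs. It is a sketch that assumes the 60,000-image FashionMNIST training split and a loader that keeps the final partial batch (the DataLoader default); the variable names are illustrative.

```python
import math

num_train_samples = 60_000   # size of the FashionMNIST training split
batch_size = 64
epochs = 5

batches_per_epoch = math.ceil(num_train_samples / batch_size)   # the last batch is partial
last_batch_size = num_train_samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs
print(batches_per_epoch, last_batch_size, total_updates)
```

Under these assumptions each epoch has 938 batches, the last of which holds 32 samples, for 4690 parameter updates in total.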

Optimization Loop

Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each iteration of the optimization loop is called an epoch.

Each epoch consists of two main parts:

The Train Loop - iterate over the training dataset and try to converge to optimal parameters.
The Validation/Test Loop - iterate over the test dataset to check if model performance is improving.

Let’s briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to see the Full Implementation of the optimization loop.

Loss Function

When presented with some training data, our untrained network is unlikely to give the correct answer. The loss function measures how dissimilar the obtained result is from the target value, and it is the loss function that we want to minimize during training. To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true data label value.

Common loss functions include nn.MSELoss (Mean Square Error) for regression tasks and nn.NLLLoss (Negative Log Likelihood) for classification. nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss.

We pass our model’s output logits to nn.CrossEntropyLoss, which will normalize the logits and compute the prediction error.

loss_fn = nn.CrossEntropyLoss()  # initialize the loss function
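The statement that nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss can be checked numerically. The following is a minimal sketch on a made-up batch (the logits and targets are random, hypothetical values, and the default mean reduction is assumed); it compares the combined loss with the two-step version and with a hand computation.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 10 classes (as in FashionMNIST).
logits = torch.randn(4, 10)
targets = torch.tensor([0, 3, 9, 1])

# Path 1: the combined loss applied directly to raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# Path 2: normalize with LogSoftmax over the class dimension, then apply NLLLoss.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

# Path 3: by hand, the mean of -log p(correct class) over the batch.
manual = -log_probs[torch.arange(4), targets].mean()

print(ce.item(), nll.item(), manual.item())
```

All three values should agree up to floating-point error, which is why the model can emit raw logits without a final softmax layer.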

Optimizer

Optimization is the process of adjusting model parameters to reduce model error in each training step. Optimization algorithms define how this process is performed (in this example we use Stochastic Gradient Descent). All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer; PyTorch also provides many other optimizers, such as Adam and RMSprop, that work better for different kinds of models and data.

We initialize the optimizer by registering the model’s parameters that need to be trained, and passing in the learning rate hyperparameter.

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  # initialize the optimizer

Inside the training loop, optimization happens in three steps:

Call optimizer.zero_grad() to reset the gradients of model parameters. Gradients accumulate by default; to prevent double-counting, we explicitly zero them at each iteration.
Backpropagate the prediction loss with a call to loss.backward(). PyTorch stores the gradient of the loss with respect to each parameter.
Once we have our gradients, we call optimizer.step() to adjust the parameters by the gradients collected in the backward pass.
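The three calls above can be observed on a single tensor. This is a minimal sketch, not the tutorial's model: w is a stand-in parameter, the loss sum(3 * w) is chosen only because its gradient is obviously 3 per element, and lr=0.1 is an arbitrary illustrative value.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # a stand-in "model parameter"
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: d/dw of sum(3 * w) is 3 for each element.
(3 * w).sum().backward()
grad_after_one = w.grad.clone()

# Second backward pass WITHOUT zero_grad(): the new gradient is added on top.
(3 * w).sum().backward()
grad_after_two = w.grad.clone()

# step() applies w <- w - lr * grad using whatever is stored in w.grad.
optimizer.step()
w_after_step = w.detach().clone()

# zero_grad() clears the stored gradient (to None or to zeros, depending on the version).
optimizer.zero_grad()
grad_cleared = w.grad is None or bool((w.grad == 0).all())

print(grad_after_one, grad_after_two, w_after_step, grad_cleared)
```

Because the second backward pass doubled the stored gradient, the update moved w twice as far as intended, which is the double-counting that zero_grad() prevents.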

Full Implementation

We define train_loop, which loops over our optimization code, and test_loop, which evaluates the model’s performance against our test data.

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
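    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        loss.backward()  # backpropagate the prediction loss
        optimizer.step()  # adjust the parameters using the gradients collected in the backward pass
        optimizer.zero_grad()  # reset the gradients of the model parameters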

        if batch % 100 == 0:
            loss, current = loss.item(), batch * batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

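    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # It also reduces unnecessary gradient computations and memory usage for tensors with requires_grad=True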
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
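The accuracy bookkeeping in test_loop can be traced on a tiny hand-made batch. The logits and labels below are hypothetical values chosen so the result is easy to verify by eye; they do not come from the tutorial's model.

```python
import torch

# Hypothetical logits for 3 samples and 4 classes, plus their true labels.
pred = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                     [1.5, 0.2, 0.1, 0.0],
                     [0.0, 0.1, 0.2, 3.0]])
y = torch.tensor([1, 2, 3])

predicted_class = pred.argmax(1)  # index of the largest logit in each row
matches = predicted_class == y    # boolean tensor, True where the prediction is right
correct = matches.type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_class.tolist(), correct, accuracy)
```

In test_loop the same count is accumulated over all batches and divided by the dataset size, whereas the loss is averaged over the number of batches.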

We initialize the loss function and optimizer, and pass them to train_loop and test_loop. Feel free to increase the number of epochs to track the model’s improving performance.

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)  # data, model, loss function, optimizer
    test_loop(test_dataloader, model, loss_fn)  # data, model, loss function
print("Done!")

Epoch 1
-------------------------------
loss: 2.298730  [   64/60000]
loss: 2.289123  [ 6464/60000]
loss: 2.273286  [12864/60000]
loss: 2.269406  [19264/60000]
loss: 2.249603  [25664/60000]
loss: 2.229407  [32064/60000]
loss: 2.227368  [38464/60000]
loss: 2.204261  [44864/60000]
loss: 2.206193  [51264/60000]
loss: 2.166651  [57664/60000]
Test Error:
 Accuracy: 50.9%, Avg loss: 2.166725

Epoch 2
-------------------------------
loss: 2.176750  [   64/60000]
loss: 2.169595  [ 6464/60000]
loss: 2.117500  [12864/60000]
loss: 2.129272  [19264/60000]
loss: 2.079674  [25664/60000]
loss: 2.032928  [32064/60000]
loss: 2.050115  [38464/60000]
loss: 1.985236  [44864/60000]
loss: 1.987887  [51264/60000]
loss: 1.907162  [57664/60000]
Test Error:
 Accuracy: 55.9%, Avg loss: 1.915486

Epoch 3
-------------------------------
loss: 1.951612  [   64/60000]
loss: 1.928685  [ 6464/60000]
loss: 1.815709  [12864/60000]
loss: 1.841552  [19264/60000]
loss: 1.732467  [25664/60000]
loss: 1.692914  [32064/60000]
loss: 1.701714  [38464/60000]
loss: 1.610632  [44864/60000]
loss: 1.632870  [51264/60000]
loss: 1.514263  [57664/60000]
Test Error:
 Accuracy: 58.8%, Avg loss: 1.541525

Epoch 4
-------------------------------
loss: 1.616448  [   64/60000]
loss: 1.582892  [ 6464/60000]
loss: 1.427595  [12864/60000]
loss: 1.487950  [19264/60000]
loss: 1.359332  [25664/60000]
loss: 1.364817  [32064/60000]
loss: 1.371491  [38464/60000]
loss: 1.298706  [44864/60000]
loss: 1.336201  [51264/60000]
loss: 1.232145  [57664/60000]
Test Error:
 Accuracy: 62.2%, Avg loss: 1.260237

Epoch 5
-------------------------------
loss: 1.345538  [   64/60000]
loss: 1.327798  [ 6464/60000]
loss: 1.153802  [12864/60000]
loss: 1.254829  [19264/60000]
loss: 1.117322  [25664/60000]
loss: 1.153248  [32064/60000]
loss: 1.171765  [38464/60000]
loss: 1.110263  [44864/60000]
loss: 1.154467  [51264/60000]
loss: 1.070921  [57664/60000]
Test Error:
 Accuracy: 64.1%, Avg loss: 1.089831

Epoch 6
-------------------------------
loss: 1.166889  [   64/60000]
loss: 1.170514  [ 6464/60000]
loss: 0.979435  [12864/60000]
loss: 1.113774  [19264/60000]
loss: 0.973411  [25664/60000]
loss: 1.015192  [32064/60000]
loss: 1.051113  [38464/60000]
loss: 0.993591  [44864/60000]
loss: 1.039709  [51264/60000]
loss: 0.971077  [57664/60000]
Test Error:
 Accuracy: 65.8%, Avg loss: 0.982440

Epoch 7
-------------------------------
loss: 1.045165  [   64/60000]
loss: 1.070583  [ 6464/60000]
loss: 0.862304  [12864/60000]
loss: 1.022265  [19264/60000]
loss: 0.885213  [25664/60000]
loss: 0.919528  [32064/60000]
loss: 0.972762  [38464/60000]
loss: 0.918728  [44864/60000]
loss: 0.961629  [51264/60000]
loss: 0.904379  [57664/60000]
Test Error:
 Accuracy: 66.9%, Avg loss: 0.910167

Epoch 8
-------------------------------
loss: 0.956964  [   64/60000]
loss: 1.002171  [ 6464/60000]
loss: 0.779057  [12864/60000]
loss: 0.958409  [19264/60000]
loss: 0.827240  [25664/60000]
loss: 0.850262  [32064/60000]
loss: 0.917320  [38464/60000]
loss: 0.868384  [44864/60000]
loss: 0.905506  [51264/60000]
loss: 0.856353  [57664/60000]
Test Error:
 Accuracy: 68.3%, Avg loss: 0.858248

Epoch 9
-------------------------------
loss: 0.889765  [   64/60000]
loss: 0.951220  [ 6464/60000]
loss: 0.717035  [12864/60000]
loss: 0.911042  [19264/60000]
loss: 0.786085  [25664/60000]
loss: 0.798370  [32064/60000]
loss: 0.874939  [38464/60000]
loss: 0.832796  [44864/60000]
loss: 0.863254  [51264/60000]
loss: 0.819742  [57664/60000]
Test Error:
 Accuracy: 69.5%, Avg loss: 0.818780

Epoch 10
-------------------------------
loss: 0.836395  [   64/60000]
loss: 0.910220  [ 6464/60000]
loss: 0.668506  [12864/60000]
loss: 0.874338  [19264/60000]
loss: 0.754805  [25664/60000]
loss: 0.758453  [32064/60000]
loss: 0.840451  [38464/60000]
loss: 0.806153  [44864/60000]
loss: 0.830360  [51264/60000]
loss: 0.790281  [57664/60000]
Test Error:
 Accuracy: 71.0%, Avg loss: 0.787271

Done!

Save and Load the Model

In this section we will look at how to persist model state by saving and loading a model, and how to run predictions with the loaded model.

import torch
import torchvision.models as models

Saving and Loading Model Weights

PyTorch models store the learned parameters in an internal state dictionary, called state_dict. These can be persisted via the torch.save method:

model = models.vgg16(weights='IMAGENET1K_V1')
torch.save(model.state_dict(), 'model_weights.pth')

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth

100%|##########| 528M/528M [00:03<00:00, 143MB/s]

To load model weights, you need to create an instance of the same model first, and then load the parameters using the load_state_dict() method.

model = models.vgg16()  # create an instance of the same model (untrained, since no weights are specified)
model.load_state_dict(torch.load('model_weights.pth'))  # load the saved parameters
model.eval()

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
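The same save and load round trip can be tried without downloading VGG16. The sketch below uses a hypothetical two-layer network (make_model is an illustrative helper, not part of the tutorial) and a temporary directory, and checks that a fresh instance reproduces the original outputs once the saved state_dict is loaded.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)


def make_model():
    # Hypothetical tiny stand-in for a real network such as VGG16.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


model = make_model()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_weights.pth")
    torch.save(model.state_dict(), path)

    # A fresh instance has different random weights until the saved ones are loaded.
    restored = make_model()
    restored.load_state_dict(torch.load(path))
    restored.eval()

# state_dict keys name each parameter by its position in the module hierarchy.
keys = list(model.state_dict().keys())

x = torch.randn(3, 4)
with torch.no_grad():
    same_output = torch.allclose(model(x), restored(x))

print(keys, same_output)
```

Only tensors are written to disk here, so the code that builds the architecture (make_model in this sketch) must still be available when loading.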

NOTE

Be sure to call the model.eval() method before running inference to set the dropout and batch normalization layers to evaluation mode. Failing to do this will yield inconsistent inference results.
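The effect of this note can be seen on a small layer stack that contains dropout. This is an illustrative sketch with a hypothetical network, not VGG16: in evaluation mode two forward passes on the same input agree exactly, while in training mode dropout zeroes a random subset of activations and scales the survivors by 1 / (1 - p).

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical layer stack containing dropout.
net = nn.Sequential(nn.Linear(16, 16), nn.Dropout(p=0.5))
x = torch.ones(1, 16)

net.eval()  # dropout becomes a no-op
with torch.no_grad():
    eval_a, eval_b = net(x), net(x)

net.train()  # dropout randomly zeroes activations and rescales the rest
with torch.no_grad():
    train_out = net(x)

eval_consistent = torch.equal(eval_a, eval_b)

# With p=0.5, every training-mode activation is either 0 or twice its eval-mode value.
train_is_masked = bool(torch.all((train_out == 0) | torch.isclose(train_out, 2 * eval_a)))

print(eval_consistent, train_is_masked)
```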

Saving and Loading Models with Shapes

When loading model weights, we needed to instantiate the model class first, because the class defines the structure of a network. We might want to save the structure of this class together with the model, in which case we can pass model (and not model.state_dict()) to the saving function:

torch.save(model, 'model.pth')

We can then load the model like this:

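# weights_only=False is needed on newer PyTorch releases, where torch.load defaults to
# weights_only=True and refuses to unpickle a full model object; only load files you trust
model = torch.load('model.pth', weights_only=False)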

NOTE

This approach uses the Python pickle module when serializing the model, so it relies on the actual class definition being available when the model is loaded.
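A whole-model round trip might look like the sketch below. It uses a hypothetical one-layer nn.Sequential so that the pickled class is always importable from torch itself, and it assumes a PyTorch version whose torch.load accepts the weights_only keyword; passing weights_only=False allows arbitrary objects to be unpickled, so it should be reserved for files you created or trust.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 2))  # hypothetical tiny model

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pth")
    torch.save(model, path)  # pickles the whole module object, not just its tensors
    # weights_only=False permits unpickling arbitrary Python objects.
    loaded = torch.load(path, weights_only=False)

is_sequential = isinstance(loaded, nn.Sequential)
same_weights = torch.equal(loaded[0].weight, model[0].weight)

print(is_sequential, same_weights)
```

For a custom nn.Module subclass, the module that defines the class has to be importable under the same name at load time, which is why saving the state_dict is generally the more portable option.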