[PyTorch] Tutorial: Learn the Basics (5) Build the Model

黄金旺铺

Published 2023-01-17 17:23:01

by: JW

Article link: https://blog.csdn.net/zhoujinwang/article/details/128718777

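BUILD THE NEURAL NETWORK

Neural networks are composed of layers/modules. torch.nn provides all the building blocks you need to build your own neural network, and every module in PyTorch is a subclass of nn.Module. A neural network is itself a module made up of other modules (layers). This nested structure makes it easy to build and manage complex architectures.

In the following sections, we build a neural network to classify images in the FashionMNIST dataset.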

Get Device for Training

We want to train our model on a GPU if one is available. We check availability with torch.cuda.is_available() and fall back to the CPU otherwise.

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device {device}")

Using device cuda
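
As a small illustration of what the device string is used for, the sketch below moves a tensor to the selected device. The tensor names x and x_dev are made up for this example, and it assumes the default tensor device has not been changed from the CPU.

```python
import torch

# Pick the device the same way as above.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Tensors are created on the CPU by default.
x = torch.ones(2, 3)

# .to(device) returns a tensor on the target device
# (the same tensor if it already lives there).
x_dev = x.to(device)

print(x.device.type)      # device type of the original tensor
print(x_dev.device.type)  # matches the selected device
```

Inputs and model must live on the same device, which is why both the model and the input tensors are sent to device later in this tutorial.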

Define the Class

We define our neural network by subclassing nn.Module and initialize its layers in __init__. Every nn.Module subclass implements the operations on the input data in its forward method.

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Flatten each 28x28 image into a 784-element vector
        self.flatten = nn.Flatten()
        # Three linear layers with ReLU activations in between
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        out = self.flatten(x)
        out = self.linear_relu_stack(out)  # raw logits, one per class
        return out

We create an instance of NeuralNetwork, move it to the device, and print its structure.

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

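To use the model, we pass it the input data. This executes the model's forward method along with some background operations. Do not call model.forward() directly.

Calling the model on the input returns a 2D tensor: dim=0 indexes the samples in the batch, and dim=1 indexes the 10 raw predicted values (logits) for each sample, one per class. We obtain the prediction probabilities by passing the logits through an nn.Softmax module.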

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_prob = nn.Softmax(dim=1)(logits)
y_pred = pred_prob.argmax(dim=1, keepdim=True)
print(f"Pred class is: 	{y_pred}")

Pred class is: 	tensor([[2]], device='cuda:0')
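
One of the background operations that only runs when the module is called (rather than when forward is invoked directly) is the execution of registered hooks. The sketch below is a minimal illustration of that difference. The model tiny and the function record_hook are hypothetical stand-ins for this example, not part of the tutorial's NeuralNetwork.

```python
import torch
from torch import nn

# A tiny stand-in model (hypothetical, not the NeuralNetwork above).
tiny = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

calls = []  # records the output shape every time the hook fires


def record_hook(module, inputs, output):
    # A forward hook receives the module, its positional inputs (a tuple),
    # and the output produced by forward.
    calls.append(tuple(output.shape))


handle = tiny.register_forward_hook(record_hook)

x = torch.rand(1, 28, 28)
tiny(x)          # goes through __call__, so the hook fires
tiny.forward(x)  # bypasses __call__, so the hook is skipped

print(calls)     # [(1, 10)]
handle.remove()  # detach the hook again
```

Only one entry is recorded even though forward ran twice, which is one reason to prefer model(X) over model.forward(X).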

Model Layers

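Let's break down the layers in the FashionMNIST model NeuralNetwork. To illustrate them, we take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.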

input_img = torch.rand(3, 28, 28)
print(input_img.shape)

torch.Size([3, 28, 28])

nn.Flatten

We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension at dim=0 is maintained).

flatten = nn.Flatten()
flat_img = flatten(input_img)
print(flat_img.size())

torch.Size([3, 784])
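
To make the effect of nn.Flatten concrete, the following sketch compares it with a manual reshape that keeps dim=0 and collapses the rest. The variable names are chosen for this example only, and the default start_dim=1, end_dim=-1 settings are assumed.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)

# nn.Flatten with default arguments flattens dims 1 through -1.
flat_module = nn.Flatten()(batch)

# The same result by hand: keep the batch dimension, collapse the rest.
flat_manual = batch.reshape(batch.size(0), -1)

print(flat_module.shape)                       # torch.Size([3, 784])
print(torch.equal(flat_module, flat_manual))   # True
```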

nn.Linear

The linear layer applies a linear transformation to the input using its stored weights and biases.

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_img)
print(hidden1.size())

torch.Size([3, 20])
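
The linear transformation can be written out as y = x W^T + b. The sketch below checks this against nn.Linear on random data. The names layer, out_module and out_manual are illustrative, and a small tolerance is used because the two computations may differ by floating-point rounding.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

# Output of the module.
out_module = layer(x)

# The same computation by hand: weight has shape (out_features, in_features),
# so it is transposed before the matrix product.
out_manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)  # torch.Size([20, 784])
print(layer.bias.shape)    # torch.Size([20])
print(torch.allclose(out_module, out_manual, atol=1e-5))
```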

nn.ReLU

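Non-linear activations such as nn.ReLU are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce non-linearity, which helps neural networks learn a wide variety of phenomena.

In this model, we use nn.ReLU between the linear layers, but other non-linear activations could be used in the model as well.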

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}\n\n")

Before ReLU: tensor([[-0.0367,  0.3163,  0.0881,  0.2232,  0.1157, -0.3647,  0.0805,  0.0090,
          0.0461, -0.1162, -0.3473,  0.2841, -0.1417, -0.2945,  0.1798,  0.5501,
         -0.2469, -0.0063,  0.0307, -0.1507],
        [ 0.5187, -0.0185, -0.0256,  0.7534,  0.1288, -0.2804,  0.1453,  0.1756,
         -0.0184, -0.2140, -0.2859,  0.1720, -0.2303, -0.4770,  0.3340,  0.1503,
         -0.0094, -0.1594, -0.1525, -0.3292],
        [ 0.0602,  0.2002,  0.4833,  0.5924,  0.2490, -0.1533,  0.2179,  0.1196,
          0.1260, -0.3004,  0.0269,  0.1738, -0.1820, -0.1302, -0.0691,  0.0008,
         -0.4413,  0.2133, -0.4409, -0.6039]], grad_fn=<AddmmBackward>)


After ReLU: tensor([[0.0000, 0.3163, 0.0881, 0.2232, 0.1157, 0.0000, 0.0805, 0.0090, 0.0461,
         0.0000, 0.0000, 0.2841, 0.0000, 0.0000, 0.1798, 0.5501, 0.0000, 0.0000,
         0.0307, 0.0000],
        [0.5187, 0.0000, 0.0000, 0.7534, 0.1288, 0.0000, 0.1453, 0.1756, 0.0000,
         0.0000, 0.0000, 0.1720, 0.0000, 0.0000, 0.3340, 0.1503, 0.0000, 0.0000,
         0.0000, 0.0000],
        [0.0602, 0.2002, 0.4833, 0.5924, 0.2490, 0.0000, 0.2179, 0.1196, 0.1260,
         0.0000, 0.0269, 0.1738, 0.0000, 0.0000, 0.0000, 0.0008, 0.0000, 0.2133,
         0.0000, 0.0000]], grad_fn=<ReluBackward0>)
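
ReLU computes max(0, x) element by element, which is why every negative entry above becomes 0 and every positive entry is unchanged. A minimal sketch on a hand-picked tensor (the values are made up for this example) compares nn.ReLU with torch.clamp:

```python
import torch
from torch import nn

x = torch.tensor([[-0.5, 0.0, 0.3],
                  [1.2, -2.0, 0.7]])

relu_out = nn.ReLU()(x)

# Clamping at a minimum of 0 is the same element-wise operation.
clamp_out = torch.clamp(x, min=0)

print(relu_out)
print(torch.equal(relu_out, clamp_out))  # True
```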

nn.Sequential

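nn.Sequential is an ordered container of modules. The data is passed through all the modules in the order in which they are defined. You can use sequential containers to put together a quick network like seq_modules.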

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(in_features=20, out_features=10)
)

input_img = torch.rand(3, 28, 28)
logits = seq_modules(input_img)
print(logits)

tensor([[-0.1824, -0.1404, -0.1742,  0.0127,  0.0511,  0.0693, -0.3450, -0.3160,
         -0.0233,  0.3087],
        [-0.1278, -0.0717, -0.3314,  0.0193,  0.0067,  0.0356, -0.2519, -0.1700,
          0.0270,  0.1292],
        [-0.0425, -0.2031, -0.1655,  0.0073, -0.0375,  0.0483, -0.4135, -0.2170,
         -0.0867,  0.1949]], grad_fn=<AddmmBackward>)
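
Passing data through an nn.Sequential is equivalent to calling its modules one after another. The sketch below illustrates this and also shows that the container can be indexed like a list. The names hidden, act, head and seq are chosen for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)

flatten = nn.Flatten()
hidden = nn.Linear(28 * 28, 20)
act = nn.ReLU()
head = nn.Linear(20, 10)

seq = nn.Sequential(flatten, hidden, act, head)

x = torch.rand(3, 28, 28)

out_seq = seq(x)
# Chaining the same modules by hand, in the same order.
out_manual = head(act(hidden(flatten(x))))

print(torch.allclose(out_seq, out_manual))  # True
print(len(seq))                             # 4
print(seq[1] is hidden)                     # True: the container holds the same module object
```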

nn.Softmax

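The last linear layer of the neural network returns logits, raw values in [-infty, infty], which are passed to the nn.Softmax module. The logits are scaled to values in [0, 1] that represent the model's predicted probability for each class. The dim parameter indicates the dimension along which the values must sum to 1.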

softmax = nn.Softmax(dim=1)
pred_prob = softmax(logits)
print(pred_prob)

tensor([[0.0881, 0.0919, 0.0889, 0.1071, 0.1113, 0.1134, 0.0749, 0.0771, 0.1033,
         0.1440],
        [0.0938, 0.0993, 0.0766, 0.1087, 0.1074, 0.1105, 0.0829, 0.0900, 0.1096,
         0.1213],
        [0.1037, 0.0883, 0.0917, 0.1090, 0.1042, 0.1136, 0.0716, 0.0871, 0.0992,
         0.1315]], grad_fn=<SoftmaxBackward>)
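
Softmax along dim=1 can be written as exp(z_i) / sum_j exp(z_j) for each row. The sketch below checks the module against that formula on hand-picked logits (made up for this example) and confirms that each row sums to 1 and that the most likely class is unchanged.

```python
import torch
from torch import nn

logits = torch.tensor([[1.0, 2.0, 3.0],
                       [0.5, -1.0, 0.0]])

probs = nn.Softmax(dim=1)(logits)

# The same formula by hand: exponentiate, then normalize each row.
exp = torch.exp(logits)
manual = exp / exp.sum(dim=1, keepdim=True)

print(torch.allclose(probs, manual))  # True
print(probs.sum(dim=1))               # each row sums to 1 (up to rounding)
print(probs.argmax(dim=1))            # tensor([2, 0]), same as logits.argmax(dim=1)
```

Because softmax preserves the ordering within each row, taking argmax of the logits gives the same predicted class as taking argmax of the probabilities.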

Model Parameters

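Many layers inside a neural network are parameterized, that is, they have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside the model object and makes all parameters accessible through the model's parameters() or named_parameters() methods.

In this example, we iterate over each parameter and print its size and a preview of its values.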

print(f"Model Structure:  {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values: {param[:2]}\n")

Model Structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values: tensor([[-0.0337,  0.0242,  0.0015,  ...,  0.0034, -0.0111,  0.0071],
        [-0.0075,  0.0029,  0.0037,  ...,  0.0149,  0.0340, -0.0011]],
       device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values: tensor([-0.0159, -0.0343], device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values: tensor([[ 0.0062, -0.0328,  0.0075,  ..., -0.0160, -0.0285, -0.0083],
        [ 0.0205, -0.0440, -0.0046,  ..., -0.0333, -0.0344,  0.0399]],
       device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values: tensor([-0.0050,  0.0112], device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values: tensor([[-0.0248, -0.0091, -0.0212,  ..., -0.0166, -0.0043,  0.0404],
...
       device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values: tensor([-0.0019,  0.0224], device='cuda:0', grad_fn=<SliceBackward>)
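
The sizes printed above can be used to count the model's parameters. The sketch below rebuilds only the linear_relu_stack part as a standalone nn.Sequential (named stack here, an assumption of this example, so the parameter names lack the linear_relu_stack prefix) and sums numel() over parameters().

```python
import torch
from torch import nn

# Same layer sizes as linear_relu_stack in NeuralNetwork.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# ReLU layers have no parameters, so only indices 0, 2 and 4 appear.
names = [name for name, _ in stack.named_parameters()]
print(names)

# (784*512 + 512) + (512*512 + 512) + (512*10 + 10)
total = sum(p.numel() for p in stack.parameters())
print(total)  # 669706

# Every parameter is trainable by default.
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
print(trainable == total)  # True
```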