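PyTorch Official Tutorials
Code typed along with the official PyTorch Tutorials. To make it easier to follow, I changed the code slightly and added comments. The IDE is Jupyter Notebook.
Link: Build Model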
BUILD THE NEURAL NETWORK
Neural networks consist of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses nn.Module. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets,transforms
Get Device for Training
We want to be able to train our model on a hardware accelerator like the GPU, if it is available. Let's check whether torch.cuda is available; otherwise we continue to use the CPU.
#device is a string naming the device to use
#torch.cuda.is_available() decides whether it is 'cuda' or 'cpu'
device='cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using {device} device")
Using cuda device
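As a minimal sketch of what the device string is used for afterwards: tensors are created on the CPU by default and have to be moved to (or created on) the chosen device before they can be combined with a model that lives there. The tensor names below are illustrative and not part of the tutorial.

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# A tensor is created on the CPU unless a device is given.
x = torch.rand(2, 3)
# .to(device) returns a copy on the target device (or x itself if it is already there).
x_dev = x.to(device)
# Alternatively, create the tensor directly on the target device.
y_dev = torch.ones(2, 3, device=device)

# Both operands now live on the same device, so the addition is valid.
z = x_dev + y_dev
print(z.device.type)
```

On a machine without a GPU this prints cpu; with a usable GPU it prints cuda.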
Define the Class
We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.
#define a custom network class NeuralNetwork by subclassing nn.Module
class NeuralNetwork(nn.Module):
    #__init__(self) is the class initializer
    def __init__(self):
        #call the parent class __init__() first
        super(NeuralNetwork,self).__init__()
        #torch.nn.Flatten(start_dim=1,end_dim=-1)
        #Flattens a contiguous range of dims into a tensor.
        #For use with Sequential.
        self.flatten=nn.Flatten()
        #define the layer stack with nn.Sequential
        self.linear_relu_stack=nn.Sequential(
            nn.Linear(28*28,512),
            nn.ReLU(),
            nn.Linear(512,512),
            nn.ReLU(),
            #18 output features here; the official tutorial uses 10
            nn.Linear(512,18),
            nn.ReLU()
        )
    #define the forward method
    def forward(self,x):
        x=self.flatten(x)
        logits=self.linear_relu_stack(x)
        return logits
x=torch.rand(2,3,4)
print('x=',x)
print(x.size(),'\n')
x_flatten=torch.flatten(x,start_dim=1,end_dim=-1)
print('x_flatten=',x_flatten)
print(x_flatten.size())
#the first dimension (2) is kept; the remaining 3*4 are merged into a single dimension of 12
x= tensor([[[0.1505, 0.8544, 0.2890, 0.9660],
[0.2210, 0.8575, 0.7099, 0.2883],
[0.2932, 0.9873, 0.0312, 0.6511]],
[[0.0240, 0.4212, 0.3542, 0.1598],
[0.6857, 0.8240, 0.8469, 0.9505],
[0.5800, 0.0189, 0.6557, 0.3006]]])
torch.Size([2, 3, 4])
x_flatten= tensor([[0.1505, 0.8544, 0.2890, 0.9660, 0.2210, 0.8575, 0.7099, 0.2883, 0.2932,
0.9873, 0.0312, 0.6511],
[0.0240, 0.4212, 0.3542, 0.1598, 0.6857, 0.8240, 0.8469, 0.9505, 0.5800,
0.0189, 0.6557, 0.3006]])
torch.Size([2, 12])
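The flattening shown above can be cross-checked against the module form and a plain reshape. This is a small sketch under the assumption that the default arguments of nn.Flatten (start_dim=1, end_dim=-1) are used; the variable names are illustrative.

```python
import torch
from torch import nn

x = torch.rand(2, 3, 4)

# nn.Flatten() with its defaults keeps dim 0 and merges all remaining dims.
flat_module = nn.Flatten()(x)

# The same result via the functional form and via a plain reshape.
flat_functional = torch.flatten(x, start_dim=1, end_dim=-1)
flat_reshape = x.reshape(x.size(0), -1)

print(flat_module.shape)                          # torch.Size([2, 12])
print(torch.equal(flat_module, flat_functional))  # True
print(torch.equal(flat_module, flat_reshape))     # True
```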
We create an instance of NeuralNetwork, move it to the device, and print its structure.
#NeuralNetwork().to(device) moves the model's parameters to the selected device (the GPU if one is available)
model=NeuralNetwork().to(device)
print(model)
NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=18, bias=True)
(5): ReLU()
)
)
To use the model, we pass it the input data. This executes the model's forward, along with some background operations. Do not call model.forward() directly!
Calling the model on the input returns a tensor of raw predicted values, one per class. With the model defined above, whose last layer has 18 output features (the official tutorial uses 10), that is a 1x18 tensor for a single input. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.
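#X is randomly generated input of shape 1*28*28, which must match what the model expects
X=torch.rand(1,28,28,device=device)
#print(X,'\n')
#logits holds the model output; calling the model runs one forward pass, i.e. one prediction
logits=model(X)
#print(logits)
#normalize with softmax to turn the outputs into per-class probabilities
pred_probab=nn.Softmax(dim=1)(logits)
#pick the class with the highest probability
y_pred=pred_probab.argmax(1)
print(f"Predicted class:{y_pred}")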
Predicted class:tensor([11], device='cuda:0')
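One way to see the "background operations" mentioned above is a forward hook. The sketch below uses a small stand-in nn.Linear layer and a hypothetical record_call function (neither is part of the tutorial) to show that a registered hook runs when the module is called, but is skipped when forward() is invoked directly.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []

# Hook signature expected by register_forward_hook: (module, inputs, output).
def record_call(module, inputs, output):
    calls.append(tuple(output.shape))

handle = layer.register_forward_hook(record_call)

x = torch.rand(3, 4)
layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()   # detach the hook again

print(calls)  # [(3, 2)]
```

Only one entry is recorded even though the layer computed its output twice, which is why calling the model itself is the recommended way to run a forward pass.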
Model Layers
Let’s break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.
input_image=torch.rand(3,28,28) #3*28*28: number of images * (image size)
print(input_image.size())
torch.Size([3, 28, 28])
nn.Flatten
We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, dim=0, is maintained). There are still 3 images, but each one is now a one-dimensional sequence of pixels.
flatten=nn.Flatten() #an nn.Flatten module instance; defaults are start_dim=1, end_dim=-1
flat_image=flatten(input_image)
print(flat_image.size())
torch.Size([3, 784])
nn.Linear
The linear layer is a module that applies a linear transformation on the input using its stored weights and biases.
#layer1 is a Linear layer: 784 (28*28) input features per image, 20 output features
layer1=nn.Linear(in_features=28*28,out_features=20)
hidden1=layer1(flat_image)
print(hidden1.size())
torch.Size([3, 20])
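The linear transformation can be written out by hand to make the role of the stored weights and biases concrete. This is a sketch assuming the same layer sizes as above; hidden_manual is an illustrative name.

```python
import torch
from torch import nn

torch.manual_seed(0)
flat_image = torch.rand(3, 28 * 28)
layer1 = nn.Linear(in_features=28 * 28, out_features=20)

hidden_module = layer1(flat_image)

# nn.Linear stores weight with shape (out_features, in_features),
# so the manual version needs a transpose: y = x @ W^T + b.
hidden_manual = flat_image @ layer1.weight.T + layer1.bias

print(layer1.weight.shape)  # torch.Size([20, 784])
print(layer1.bias.shape)    # torch.Size([20])
print(torch.allclose(hidden_module, hidden_manual, atol=1e-5))
```

The comparison uses a small tolerance because the two code paths may round floating-point values slightly differently.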
nn.ReLU
Non-linear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.
In this model, we use nn.ReLU between our linear layers, but there are other activations that introduce non-linearity in your model.
print(f"Before ReLu:{hidden1}\n\n")
hidden1=nn.ReLU()(hidden1) #note the empty parentheses: nn.ReLU() builds the module, which is then called on hidden1
print(f"After ReLU:{hidden1}")
Before ReLu:tensor([[ 0.0562, -0.6841, -0.6011, 0.4285, -0.0422, -0.0473, 0.2338, -0.0574,
-0.4924, -0.4876, 0.3885, 0.0639, -0.1186, -0.1115, 0.0675, -0.0918,
0.0319, -0.3195, -0.4133, 0.4193],
[ 0.1271, -0.5600, -0.8632, 0.1173, 0.1402, 0.1011, 0.4458, 0.1926,
-0.1148, -0.2310, 0.1389, 0.2500, -0.0864, -0.3241, 0.0325, -0.2667,
0.1259, -0.1373, -0.0743, 0.1030],
[ 0.2604, -0.1026, -0.8967, 0.0455, 0.2370, 0.1956, 0.2883, -0.1166,
-0.1664, -0.3983, 0.2908, 0.3343, -0.1429, -0.2169, 0.4081, -0.5183,
-0.1458, -0.2448, -0.2097, 0.1872]], grad_fn=<AddmmBackward>)
After ReLU:tensor([[0.0562, 0.0000, 0.0000, 0.4285, 0.0000, 0.0000, 0.2338, 0.0000, 0.0000,
0.0000, 0.3885, 0.0639, 0.0000, 0.0000, 0.0675, 0.0000, 0.0319, 0.0000,
0.0000, 0.4193],
[0.1271, 0.0000, 0.0000, 0.1173, 0.1402, 0.1011, 0.4458, 0.1926, 0.0000,
0.0000, 0.1389, 0.2500, 0.0000, 0.0000, 0.0325, 0.0000, 0.1259, 0.0000,
0.0000, 0.1030],
[0.2604, 0.0000, 0.0000, 0.0455, 0.2370, 0.1956, 0.2883, 0.0000, 0.0000,
0.0000, 0.2908, 0.3343, 0.0000, 0.0000, 0.4081, 0.0000, 0.0000, 0.0000,
0.0000, 0.1872]], grad_fn=<ReluBackward0>)
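The output above matches the definition ReLU(x) = max(0, x): negative entries become 0 and non-negative entries pass through unchanged. A small sketch with a few of the values from the first row, using torch.clamp as the hand-written equivalent:

```python
import torch
from torch import nn

hidden = torch.tensor([[0.0562, -0.6841, 0.4285, -0.0422]])

relu_module = nn.ReLU()(hidden)
# Clamping at a minimum of 0 is the same element-wise operation.
relu_manual = torch.clamp(hidden, min=0)

print(relu_module)
print(torch.equal(relu_module, relu_manual))  # True
```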
nn.Sequential
nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules. The more general approach is to subclass nn.Module:
class NeuralNetwork(nn.Module):
    def __init__(self): ...
    def forward(self,x): ...
#everything passed to nn.Sequential(...) is an nn.Module instance
seq_modules=nn.Sequential(
    flatten, #flatten also counts as a layer; flatten=nn.Flatten()
    layer1, #layer1=nn.Linear(in_features=28*28,out_features=20)
    nn.ReLU(),
    nn.Linear(20,10)
)
input_image=torch.rand(3,28,28)
#forward
logits=seq_modules(input_image)
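To illustrate "passed through all the modules in the same order as defined", the sketch below rebuilds an equivalent container from fresh layers and applies them one by one in a loop. The loop variable and the name out are illustrative.

```python
import torch
from torch import nn

seq_modules = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
input_image = torch.rand(3, 28, 28)

# Iterating over an nn.Sequential yields its modules in definition order.
out = input_image
for layer in seq_modules:
    out = layer(out)

logits = seq_modules(input_image)

print(logits.shape)                # torch.Size([3, 10])
print(torch.allclose(out, logits))
print(seq_modules[1])              # modules can also be accessed by index
```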
nn.Softmax
The last linear layer of the neural network returns logits, raw values in [-∞, ∞], which are passed to the nn.Softmax module. The logits are scaled to values in [0, 1] representing the model's predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1.
softmax=nn.Softmax(dim=1) #dim=1 normalizes across the class dimension, so each row sums to 1
pred_prob=softmax(logits)
print(pred_prob)
tensor([[0.0953, 0.0922, 0.1320, 0.0826, 0.0976, 0.1100, 0.1049, 0.0827, 0.1014,
0.1014],
[0.1094, 0.0903, 0.1260, 0.0826, 0.0950, 0.0930, 0.0946, 0.0857, 0.1062,
0.1170],
[0.0979, 0.0936, 0.1332, 0.0793, 0.0941, 0.1042, 0.1033, 0.0775, 0.1059,
0.1110]], grad_fn=<SoftmaxBackward>)
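The scaling can be reproduced by hand as exp(logit) divided by the sum of exp over the class dimension. This sketch uses two made-up rows of logits rather than the model output, so the numbers are illustrative only.

```python
import torch
from torch import nn

# Two samples (rows), three classes (columns); values chosen for illustration.
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [3.0, 1.0, 0.5]])

pred_prob = nn.Softmax(dim=1)(logits)

# Hand-written softmax along dim=1: normalize each row separately.
exp_logits = torch.exp(logits)
manual = exp_logits / exp_logits.sum(dim=1, keepdim=True)

print(torch.allclose(pred_prob, manual))
print(pred_prob.sum(dim=1))      # each row sums to 1 (up to rounding)
print(pred_prob.argmax(dim=1))   # tensor([2, 0])
```

Softmax preserves the ordering of the logits within a row, so the argmax of the probabilities is the same as the argmax of the raw logits.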
Model Parameters
Many layers inside a neural network are parameterized, i.e. have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model's parameters() or named_parameters() methods.
In this example, we iterate over each parameter, and print its size and a preview of its values.
#model structure
print("Model structure: ",model,'\n\n')
#print each parameter's name, its size, and a preview of its first two entries (param[:2])
for name,param in model.named_parameters():
    print(f"Layer:{name} | Size:{param.size()} | Values:{param[:2]}\n")
Model structure: NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=18, bias=True)
(5): ReLU()
)
)
Layer:linear_relu_stack.0.weight | Size:torch.Size([512, 784]) | Values:tensor([[-0.0259, -0.0041, 0.0299, ..., 0.0338, 0.0087, 0.0309],
[ 0.0182, -0.0128, -0.0069, ..., 0.0219, -0.0320, -0.0054]],
device='cuda:0', grad_fn=<SliceBackward>)
Layer:linear_relu_stack.0.bias | Size:torch.Size([512]) | Values:tensor([-0.0240, 0.0098], device='cuda:0', grad_fn=<SliceBackward>)
Layer:linear_relu_stack.2.weight | Size:torch.Size([512, 512]) | Values:tensor([[ 0.0301, 0.0419, 0.0098, ..., -0.0432, -0.0189, -0.0029],
[ 0.0058, -0.0219, -0.0248, ..., -0.0194, 0.0007, -0.0389]],
device='cuda:0', grad_fn=<SliceBackward>)
Layer:linear_relu_stack.2.bias | Size:torch.Size([512]) | Values:tensor([0.0291, 0.0154], device='cuda:0', grad_fn=<SliceBackward>)
Layer:linear_relu_stack.4.weight | Size:torch.Size([18, 512]) | Values:tensor([[-0.0376, 0.0047, 0.0374, ..., 0.0339, -0.0352, 0.0246],
[-0.0196, 0.0343, -0.0233, ..., 0.0016, -0.0200, -0.0071]],
device='cuda:0', grad_fn=<SliceBackward>)
Layer:linear_relu_stack.4.bias | Size:torch.Size([18]) | Values:tensor([0.0419, 0.0282], device='cuda:0', grad_fn=<SliceBackward>)
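The sizes printed above can be turned into a parameter count. The sketch below rebuilds the same layer stack as a plain nn.Sequential (so the parameter names differ from the linear_relu_stack.* names above) and sums numel() over named_parameters(); counts and total are illustrative names.

```python
import torch
from torch import nn

# Same layer sizes as the model in this post: 784 -> 512 -> 512 -> 18.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 18),
    nn.ReLU(),
)

# Only the Linear layers have parameters; Flatten and ReLU have none.
counts = {name: param.numel() for name, param in model.named_parameters()}
for name, n in counts.items():
    print(f"{name}: {n}")

total = sum(counts.values())
print(total)  # 673810
```

The total is (784*512 + 512) + (512*512 + 512) + (512*18 + 18) = 401920 + 262656 + 9234 = 673810.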