1. Preface
Welcome to Day 5! Today 无神 walks everyone through the official PyTorch documentation and uses PyTorch to build a feedforward neural network.
2. Feedforward Neural Networks
Unlike architectures such as ResNet, where skip connections pass information across layers, a plain feedforward network has no such direct cross-layer interaction: it just processes the input vector forward, one step at a time.
This type of network is simple: the input is processed forward, layer by layer, to produce an output, as in the official tutorial's figure of a feedforward network that classifies digit images.
3. The Typical Training Procedure
A typical training procedure for a neural network is as follows:
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network's parameters
- Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
1. Define a model with learnable parameters
Learning means drawing on experience from historical data to improve the model's parameters. A model with no parameters that can be adjusted during training (learning) is dead wood: no amount of training will do it any good.
2. Iterate over a dataset of inputs
You can think of this as defining the data loading and preprocessing.
3. Feed the data to the model
4. Compute the loss
The loss is the gap between the model's output and the target value. It measures how well the model performs on this batch of data, and from it we compute the gradients used to update the parameters.
5. Update the model's parameters
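Putting the steps together, here is a minimal runnable sketch of one training iteration; the tiny linear model and the random tensors are hypothetical stand-ins, and the concrete network follows in the next section.

import torch
import torch.nn as nn

model = nn.Linear(4, 2)              # step 1: a model with learnable weights
inputs = torch.randn(8, 4)           # step 2: one batch from a dataset
targets = torch.randn(8, 2)
criterion = nn.MSELoss()

output = model(inputs)               # step 3: process input through the network
loss = criterion(output, targets)    # step 4: compute the loss
model.zero_grad()                    # clear old gradients first
loss.backward()                      # step 5: propagate gradients back
with torch.no_grad():                # step 6: weight = weight - learning_rate * gradient
    for p in model.parameters():
        p -= 0.01 * p.grad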
4. A Concrete Example (Time for Code)
4.1 Defining the Network
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
The printed result is:
Net(
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
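Why does fc1 take in_features=400? Each 5x5 convolution without padding shrinks the spatial size by 4, and each 2x2 max pool halves it: 32 → 28 → 14 → 10 → 5, leaving 16 feature maps of size 5x5, i.e. 16 * 5 * 5 = 400 values. A quick sketch to verify the shapes, reusing the net defined above:

x = torch.randn(1, 1, 32, 32)
x = F.max_pool2d(F.relu(net.conv1(x)), 2)  # conv 5x5: 32 -> 28, pool: 28 -> 14
print(x.shape)                             # torch.Size([1, 6, 14, 14])
x = F.max_pool2d(F.relu(net.conv2(x)), 2)  # conv 5x5: 14 -> 10, pool: 10 -> 5
print(x.shape)                             # torch.Size([1, 16, 5, 5])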
params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight
Extract the model's parameters and print them:
10
torch.Size([6, 1, 5, 5])
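len(params) is 10 because each of the five layers contributes a weight tensor and a bias tensor. To inspect all of them by name rather than by index, a small sketch using net.named_parameters():

for name, p in net.named_parameters():
    print(name, tuple(p.shape))  # e.g. conv1.weight (6, 1, 5, 5), conv1.bias (6,), ...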
Feed a 32*32 sample into the model and it outputs a 10-class result; the dimension with the largest value is the predicted class.
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
tensor([[ 0.1453, -0.0590, -0.0065, 0.0905, 0.0146, -0.0805, -0.1211, -0.0394,
-0.0181, -0.0136]], grad_fn=<AddmmBackward0>)
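To turn the 10-dimensional output into a concrete prediction, take the index of the largest value, for example:

pred = out.argmax(dim=1)  # index of the largest logit = predicted class
print(pred)               # tensor([0]) for the sample output above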
Compute the loss with nn.MSELoss:
output = net(input)
target = torch.randn(10) # a dummy target, for example
target = target.view(1, -1) # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
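With the default reduction='mean', nn.MSELoss is just the mean of the squared element-wise differences; a quick sketch to confirm:

manual = ((output - target) ** 2).mean()  # mean squared error by hand
print(torch.allclose(loss, manual))       # True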
Zero the gradient buffers, then look at the stored gradients before and after one backward pass:
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
The test result:
conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([ 0.0081, -0.0080, -0.0039, 0.0150, 0.0003, -0.0105])
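The simple update rule from the list above, weight = weight - learning_rate * gradient, can now be applied by hand, as a sketch:

learning_rate = 0.01
with torch.no_grad():                 # updates must not be tracked by autograd
    for f in net.parameters():
        f -= learning_rate * f.grad   # plain SGD step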
To update the parameters with different optimization rules, such as SGD, Nesterov-SGD, Adam, RMSProp, etc., define an optimizer with torch.optim:
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
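In a real training run these lines repeat for every batch, and optimizer.zero_grad() must be called each iteration because gradients accumulate in the buffers. A runnable sketch with random tensors standing in for a real dataset (hypothetical placeholder data):

batches = [(torch.randn(1, 1, 32, 32), torch.randn(1, 10)) for _ in range(4)]
for input, target in batches:
    optimizer.zero_grad()             # clear accumulated gradients
    output = net(input)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()                  # apply the SGD update
    print(loss.item())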