This post excerpts only part of the material; sections that duplicate earlier posts have been removed.
1. PyTorch: Tensors and autograd
A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.
This implementation computes the forward pass using operations on PyTorch
Tensors, and uses PyTorch autograd to compute gradients.
A PyTorch Tensor represents a node in a computational graph. If x is a
Tensor that has x.requires_grad=True, then x.grad is another Tensor
holding the gradient of some scalar value with respect to x.
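As a minimal sketch of the relationship between requires_grad and .grad, the snippet below builds a scalar from a small Tensor and checks the gradient autograd produces. The values and the names s and expected are illustrative assumptions, not part of the original example.

```python
import torch

# A leaf Tensor that autograd should track.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar built from x: s = sum(x_i ** 2), so ds/dx_i = 2 * x_i.
s = (x ** 2).sum()

# backward() fills x.grad with the gradient of s with respect to x.
s.backward()

# Analytical gradient for comparison (detach() so it is not tracked).
expected = 2 * x.detach()
print(x.grad, expected)
```

x.grad has the same shape as x, which is why it can be subtracted directly from the weights in a gradient descent step.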
import torch
import matplotlib.pyplot as plt
import torch.optim as optim
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs.
# requires_grad defaults to False, so autograd will not compute gradients
# for these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Create random Tensors for weights.
# Setting requires_grad=True tells autograd to compute gradients for these
# Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
learning_rate = 1e-6
optimizer = optim.SGD([{'params': w1},
                       {'params': w2}],
                      lr=learning_rate)
# Create a figure and name it
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')
iter_plot = []
loss_plot = []
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors. Because
    # autograd handles the backward pass, we do not need to keep references
    # to intermediate values.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss using operations on Tensors and record it for plotting.
    # loss is a zero-dimensional Tensor (shape ());
    # loss.item() returns the scalar value it holds.
    loss = (y_pred - y).pow(2).sum()
    iter_plot.append(t)
    loss_plot.append(loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # The weights can be updated manually with gradient descent. That update
    # must be wrapped in torch.no_grad(): the weights have requires_grad=True,
    # but the update itself should not be tracked by autograd.
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # Here we use torch.optim.SGD instead; the manual version is kept for
    # reference:
    # with torch.no_grad():
    #     w1 -= learning_rate * w1.grad
    #     w2 -= learning_rate * w2.grad
    #     # Manually zero the gradients after updating weights
    #     w1.grad.zero_()
    #     w2.grad.zero_()
    optimizer.step()  # Apply the update
    optimizer.zero_grad()  # Gradients accumulate, so clear them after every update

# Plot the recorded loss curve once training has finished.
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()
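The commented-out manual update and the optimizer.step() call are meant to do the same thing. The sketch below checks this on a toy problem, assuming plain SGD without momentum or weight decay; toy_loss, w_manual and w_optim are hypothetical names introduced only for this comparison.

```python
import torch

torch.manual_seed(0)
learning_rate = 0.1

# Two independent copies of the same starting weights.
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)

optimizer = torch.optim.SGD([w_optim], lr=learning_rate)

def toy_loss(w):
    # Hypothetical scalar loss used only for this comparison.
    return (w ** 2).sum()

# Manual update: w <- w - lr * grad, performed outside autograd tracking.
toy_loss(w_manual).backward()
with torch.no_grad():
    w_manual -= learning_rate * w_manual.grad
    w_manual.grad.zero_()

# The same update through the optimizer.
toy_loss(w_optim).backward()
optimizer.step()
optimizer.zero_grad()

print(torch.allclose(w_manual, w_optim))
```

With momentum or weight decay enabled the optimizer applies a different update rule, so the two versions would no longer be expected to match.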
2. PyTorch: Defining new autograd functions
A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.
This implementation computes the forward pass using operations on PyTorch
Tensors, and uses PyTorch autograd to compute gradients.
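In this example we implement the ReLU function as a custom autograd function.
How to implement a custom autograd function:
A custom autograd function has two methods: forward computes output Tensors from input Tensors; backward receives the gradient of some scalar value with respect to the output Tensors and computes the gradient of that same scalar with respect to the input Tensors.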
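To make the role of backward concrete, here is a small sketch using a hypothetical Square function (not part of the original post). The output is scaled by 3 before summing, so the grad_output passed to backward is 3 rather than 1, and the chain rule becomes visible.

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical example Function computing y = x ** 2 elementwise."""

    @staticmethod
    def forward(ctx, x):
        # Keep the input; backward needs it to evaluate dy/dx = 2x.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Chain rule: dL/dx = dL/dy * dy/dx = grad_output * 2x.
        return grad_output * 2 * x

x = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
y = Square.apply(x)

# L = sum(3 * y), so dL/dy = 3 and dL/dx = 3 * 2x = 6x.
loss = (3 * y).sum()
loss.backward()
print(x.grad)
```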
import torch
import matplotlib.pyplot as plt
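class MyReLU(torch.autograd.Function):
    """
    To implement a custom autograd function, we subclass
    torch.autograd.Function and override the forward and backward static
    methods, both of which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive the input Tensor and return the output
        Tensor. ctx is a context object that stores information needed by the
        backward computation. Tensors needed in the backward pass can be
        cached with ctx.save_for_backward.
        """
        ctx.save_for_backward(input)  # Save input for the backward pass
        return input.clamp(min=0)  # Return the output Tensor

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of
        the loss with respect to the output, and from it we compute the
        gradient of the loss with respect to the input (chain rule).
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input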
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
# Create a figure and name it
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')
iter_plot = []
loss_plot = []
learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use the Function.apply method.
    # Here we alias it as relu.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute the loss and record it for plotting.
    loss = (y_pred - y).pow(2).sum()
    iter_plot.append(t)
    loss_plot.append(loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

# Plot the recorded loss curve once training has finished.
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()
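One way to gain confidence in a hand-written backward is to compare it with a built-in operation. The sketch below redefines MyReLU so it is self-contained and checks it against torch.relu on fixed inputs. The inputs are deliberately chosen away from 0, where the two implementations may follow different subgradient conventions; the names a, b and weights are illustrative.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# Fixed inputs, none of them exactly 0.
a = torch.tensor([-1.5, -0.5, 0.5, 2.0], requires_grad=True)
b = a.detach().clone().requires_grad_(True)

out_custom = MyReLU.apply(a)
out_builtin = torch.relu(b)

# Weight the outputs so that grad_output is not simply all ones.
weights = torch.tensor([1.0, 2.0, 3.0, 4.0])
(out_custom * weights).sum().backward()
(out_builtin * weights).sum().backward()

print(a.grad, b.grad)
```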
3. PyTorch: nn
A fully-connected ReLU network with one hidden layer, trained to predict y from x
by minimizing squared Euclidean distance.
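This example builds the network with PyTorch's nn package. PyTorch autograd makes it
easy to define computational graphs and take gradients, but raw autograd is too
low-level for more complex networks. The nn package addresses this. It defines a set
of Modules, each of which can be thought of as a network layer: it produces an output
from a given input and may hold trainable parameters.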
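As a small sketch of what "a Module holding trainable parameters" means in practice, the snippet below inspects a single torch.nn.Linear layer and reproduces its output by hand. The batch size of 64 and the names layer, out and manual are assumptions chosen for illustration.

```python
import torch

D_in, H = 1000, 100
layer = torch.nn.Linear(D_in, H)

# A Linear Module owns a weight of shape (out_features, in_features)
# and a bias of shape (out_features,), both with requires_grad=True.
for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

x = torch.randn(64, D_in)
out = layer(x)  # Calling the Module runs its forward pass

# The same affine map written out explicitly: x @ W^T + b.
manual = x @ layer.weight.t() + layer.bias
```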
import torch
import matplotlib.pyplot as plt
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
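# Use the nn package to define our model as a sequence of layers.
# nn.Sequential is a Module that contains other Modules and applies them in
# order to produce its output. Each Linear Module computes its output from
# the input using a linear function, and holds internal Tensors for its
# weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
# The nn package also provides implementations of common loss functions. Here
# we use mean squared error (MSE); with reduction='sum' it returns the sum of
# squared errors rather than their mean.
loss_fn = torch.nn.MSELoss(reduction='sum')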
learning_rate = 1e-4
# Create a figure and name it
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')
iter_plot = []
loss_plot = []
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute the loss and record it for plotting. We pass Tensors containing
    # the predicted and true values of y, and the loss function returns a
    # Tensor containing the loss.
    loss = loss_fn(y_pred, y)
    iter_plot.append(t)
    loss_plot.append(loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# Plot the recorded loss curve once training has finished.
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()
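The loss used in this section, torch.nn.MSELoss(reduction='sum'), corresponds to the hand-written (y_pred - y).pow(2).sum() from the earlier sections. The sketch below checks that on random data and also shows how the default 'mean' reduction relates to it. The tensors are random placeholders rather than model outputs.

```python
import torch

torch.manual_seed(0)
y_pred = torch.randn(64, 10)
y = torch.randn(64, 10)

# Sum of squared errors, as used in this section.
sum_loss = torch.nn.MSELoss(reduction='sum')(y_pred, y)

# Mean squared error (the default reduction) divides by the element count.
mean_loss = torch.nn.MSELoss(reduction='mean')(y_pred, y)

# The hand-written loss from the earlier sections.
manual = (y_pred - y).pow(2).sum()

print(sum_loss.item(), manual.item(), mean_loss.item())
```

Because the 'sum' reduction scales the loss (and therefore the gradients) by the number of elements compared with 'mean', a learning rate tuned for one reduction will generally not suit the other.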
4. PyTorch: Control flow + weight sharing
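This example showcases PyTorch's dynamic graphs by implementing a very strange model: a fully-connected ReLU network that, on each forward pass, picks a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers. In the code, the first hidden layer always comes from input_linear, and middle_linear is then applied 0 to 3 more times.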
import random
import torch
import matplotlib.pyplot as plt
class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we create three nn.Linear instances that are used
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use
        normal Python flow control such as loops and conditional statements
        when defining the forward pass of the model.

        It is also perfectly fine to reuse the same Module many times when
        defining a computational graph. This is a big improvement over Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)
# Create a figure and name it
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')
iter_plot = []
loss_plot = []
# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute the loss and record it for plotting.
    loss = criterion(y_pred, y)
    iter_plot.append(t)
    loss_plot.append(loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Plot the recorded loss curve once training has finished.
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()
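When one Module is applied several times in a forward pass, autograd adds up the gradient contributions from every use. The sketch below illustrates this on a tiny example, assuming a fixed reuse count of two instead of a random one: a shared layer applied twice should receive the sum of the gradients that two separate layers with identical weights would receive. The names shared, first and second are hypothetical.

```python
import torch

torch.manual_seed(0)
H = 4

# One Module that is applied twice.
shared = torch.nn.Linear(H, H)

# Two separate Modules initialised with the same weights as `shared`.
first = torch.nn.Linear(H, H)
second = torch.nn.Linear(H, H)
first.load_state_dict(shared.state_dict())
second.load_state_dict(shared.state_dict())

x = torch.randn(8, H)

# Shared weights: the same layer is used at both positions.
shared(shared(x).clamp(min=0)).sum().backward()

# Separate weights: each position has its own copy.
second(first(x).clamp(min=0)).sum().backward()

# Gradient of the shared layer vs. the sum over the two copies.
summed_weight_grad = first.weight.grad + second.weight.grad
summed_bias_grad = first.bias.grad + second.bias.grad
print(torch.allclose(shared.weight.grad, summed_weight_grad, atol=1e-5))
```

The number of trainable parameters therefore stays fixed no matter how many times middle_linear is reused in a given forward pass; only the depth of the computation graph changes.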