PyTorch Official Tutorial Study Notes (6)

ECODER-MXQ


This post excerpts only part of the tutorial; sections that duplicate earlier posts in this series have been omitted.

1. PyTorch: Tensors and autograd

A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.

This implementation computes the forward pass using operations on PyTorch
Tensors, and uses PyTorch autograd to compute gradients.

A PyTorch Tensor represents a node in a computational graph. If x is a
Tensor that has x.requires_grad=True then x.grad is another Tensor
holding the gradient of x with respect to some scalar value.
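
As a small illustration of the statement above (not part of the original tutorial; the toy values are assumptions chosen for readability), the following sketch builds a scalar from a Tensor with requires_grad=True and then reads the gradient back from x.grad.

```python
import torch

# Hypothetical toy input, chosen only for illustration.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar value that depends on x: s = sum(x_i ** 2)
s = (x ** 2).sum()

# No gradient has been computed before backward() is called.
grad_before = x.grad  # None

# backward() fills x.grad with ds/dx = 2 * x.
s.backward()
print(x.grad)  # tensor([2., 4., 6.])
```

Note that x.grad has the same shape as x, and that it is only populated once backward() has been called on a scalar that depends on x.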

import torch
import matplotlib.pyplot as plt
import torch.optim as optim


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
# requires_grad defaults to False, so no gradients are computed for these
# Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True tells autograd to compute gradients for these
# Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6

optimizer = optim.SGD([{'params': w1}, 
                      {'params': w2}], 
                      lr = learning_rate)


# Create a figure and give it a name
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')

iter_plot = []
loss_plot = []
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    # loss is now a 0-dimensional Tensor (shape ());
    # loss.item() returns the Python scalar held in loss.
    loss = (y_pred - y).pow(2).sum()

    iter_plot.append(t)
    loss_plot.append(loss.item())
    
    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # The weights could be updated manually with gradient descent, wrapped in
    # torch.no_grad() because the weights have requires_grad=True and the
    # update itself should not be tracked by autograd.
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # The manual version is kept below for reference; this post uses
    # torch.optim.SGD to do the same thing.
    # with torch.no_grad():
    #     w1 -= learning_rate * w1.grad
    #     w2 -= learning_rate * w2.grad
    #
    #     # Manually zero the gradients after updating weights
    #     w1.grad.zero_()
    #     w2.grad.zero_()

    optimizer.step()       # apply the SGD update to w1 and w2
    optimizer.zero_grad()  # clear the gradients after every update
    
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()

[Figure: loss curve]
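
The reason the gradients are cleared after every update is that backward() adds into .grad instead of overwriting it. The sketch below is a toy illustration of that behaviour, not code from the tutorial; the tensor w and the helper loss_fn are hypothetical names introduced only for this example.

```python
import torch

# Hypothetical toy parameter, for illustration only.
w = torch.tensor([1.0, 2.0], requires_grad=True)

def loss_fn(w):
    # loss = sum(w_i ** 2), so d(loss)/dw = 2 * w
    return (w ** 2).sum()

# First backward pass: w.grad holds 2 * w.
loss_fn(w).backward()
first = w.grad.clone()        # tensor([2., 4.])

# Second backward pass WITHOUT zeroing: the new gradient is added on top.
loss_fn(w).backward()
accumulated = w.grad.clone()  # tensor([4., 8.])

# After zeroing, a fresh backward pass gives the plain gradient again.
w.grad.zero_()
loss_fn(w).backward()
after_reset = w.grad.clone()  # tensor([2., 4.])

print(first, accumulated, after_reset)
```

Forgetting to zero the gradients would make every update step use the sum of all gradients seen so far, which is rarely what is intended in a loop like the one above.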

2. PyTorch: Defining new autograd functions

A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.

This implementation computes the forward pass using operations on PyTorch
Tensors, and uses PyTorch autograd to compute gradients.

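In this section we implement the ReLU function as a custom autograd function.
How to implement a custom autograd function:
A custom autograd function has two methods. The forward method computes output Tensors from input Tensors. The backward method receives the gradient of some scalar value with respect to the output Tensors and computes the gradient of that same scalar with respect to the input Tensors.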

import torch
import matplotlib.pyplot as plt


class MyReLU(torch.autograd.Function):
    """
    To implement a custom autograd function, we subclass torch.autograd.Function
    and implement the forward and backward methods, both of which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
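        In the forward pass we receive the input Tensor and return the output
        Tensor. The context object ctx stashes information needed for the
        backward computation. ctx.save_for_backward caches the Tensors that
        backward will need (non-Tensor values can be stored as attributes on ctx).
        """
        ctx.save_for_backward(input)  # save input on ctx for the backward pass
        return input.clamp(min=0)     # return the output Tensor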

    @staticmethod
    def backward(ctx, grad_output):
        """
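        In the backward pass we receive a Tensor containing the gradient of the
        loss with respect to the output, and from it we compute the gradient of
        the loss with respect to the input (chain rule).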
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# Create a figure and give it a name
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')

iter_plot = []
loss_plot = []
learning_rate = 1e-6
for t in range(500):
    # To use the custom Function we call it through Function.apply;
    # here we alias it as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    
    iter_plot.append(t)
    loss_plot.append(loss.item())
    
    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
        
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()

[Figure: loss curve]
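
One way to gain some confidence in a hand-written backward method is to compare its gradient with the one autograd derives for the built-in clamp. The sketch below is an illustrative check rather than part of the tutorial: it repeats the MyReLU class so that it runs on its own, and the tensors a, b and c are hypothetical test inputs.

```python
import torch


class MyReLU(torch.autograd.Function):
    """Same custom ReLU as above, repeated so this sketch is self-contained."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


torch.manual_seed(0)

# Two independent leaf tensors holding the same random values.
a = torch.randn(4, 5, requires_grad=True)
b = a.detach().clone().requires_grad_(True)

# Random upstream weights, so the incoming gradient is non-zero everywhere
# and the masking done in backward() is actually exercised.
c = torch.randn(4, 5)

# Gradient through the custom Function.
(MyReLU.apply(a) * c).sum().backward()

# Gradient through the built-in clamp, derived by autograd.
(b.clamp(min=0) * c).sum().backward()

grads_match = torch.allclose(a.grad, b.grad)
print(grads_match)
```

If the two gradients disagree, the backward method is the first place to look. For a more systematic numerical check, torch.autograd.gradcheck can be used with double-precision inputs.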

3. PyTorch: the nn module

A fully-connected ReLU network with one hidden layer, trained to predict y from x
by minimizing squared Euclidean distance.

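This section uses PyTorch's nn package to build the network. PyTorch autograd
makes it easy to define computational graphs and take gradients, but raw autograd
is too low-level for defining more complex networks. The nn package addresses this.
It defines a set of Modules, each of which can be thought of as a network layer:
a Module produces an output from a given input and may also hold trainable parameters.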

import torch
import matplotlib.pyplot as plt


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

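# Use the nn package to define the model as a sequence of layers. nn.Sequential
# is a Module that contains other Modules and applies them in order to produce
# its output. Each Linear Module computes its output from the input with a
# linear function, and holds internal Tensors for its weight and bias.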
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
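# The nn package also provides many common loss functions. Here we use the
# squared-error loss (MSELoss); reduction='sum' sums the squared errors
# instead of averaging them.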
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4

# Create a figure and give it a name
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')

iter_plot = []
loss_plot = []
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    # Forward pass
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    # Compute the loss
    loss = loss_fn(y_pred, y)
    
    iter_plot.append(t)
    loss_plot.append(loss.item())
    
    # Zero the gradients before running the backward pass, because
    # backward() accumulates into .grad rather than overwriting it.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    # Backward pass
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    # Update the parameters
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
            
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()

[Figure: loss curve]
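
To see which trainable parameters such a Sequential model actually holds, its named parameters can be listed. This is a minimal sketch, not part of the tutorial; the small layer sizes are assumptions chosen so the output stays readable.

```python
import torch

# Small hypothetical sizes, chosen only to keep the printed shapes short.
D_in, H, D_out = 8, 4, 2

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# Each Linear layer contributes a weight and a bias; ReLU has no parameters.
# Names are prefixed with the layer's index inside the Sequential container.
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
# 0.weight (4, 8)
# 0.bias (4,)
# 2.weight (2, 4)
# 2.bias (2,)

total = sum(p.numel() for p in model.parameters())
print(total)  # 46
```

These are the same Tensors that the loop over model.parameters() updates in the training code above. Note that a Linear weight has shape (out_features, in_features).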

4. PyTorch: Control flow + weight sharing

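In this section we showcase PyTorch's dynamic graphs by implementing a very strange model: a fully-connected ReLU network that, on each forward pass, picks a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers. In the code, input_linear always produces the first hidden layer and middle_linear is then applied 0 to 3 more times, which gives 1 to 4 hidden layers in total.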

import random
import torch
import matplotlib.pyplot as plt


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        Construct three nn.Linear instances for use in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

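        Since each forward pass builds a new dynamic computation graph, we can
        use ordinary Python control flow, such as loops and conditional
        statements, when defining the forward pass of the model.

        It is also safe to reuse the same Module many times when defining a
        computational graph. This is a big improvement over Lua Torch, where
        each Module could be used only once.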
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Create a figure and give it a name
plt.figure('Loss')
ax = plt.gca()
# Label the x and y axes
ax.set_xlabel('iter')
ax.set_ylabel('loss')

iter_plot = []
loss_plot = []
# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    iter_plot.append(t)
    loss_plot.append(loss.item())
    
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
ax.plot(iter_plot, loss_plot, color='r', linewidth=1, alpha=0.6)
plt.show()

[Figure: loss curve]
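
When the same Module is applied several times in one forward pass, autograd sums the gradient contributions from every use into the shared weight. The sketch below illustrates this with a hypothetical 1x1 linear layer named shared whose weight is set by hand. It is a toy example under those assumptions, not the DynamicNet model itself.

```python
import torch

# Hypothetical one-weight layer, so the arithmetic can be checked by hand.
shared = torch.nn.Linear(1, 1, bias=False)
with torch.no_grad():
    shared.weight.fill_(3.0)  # w = 3

x = torch.tensor([[2.0]])

# Apply the same layer twice: y = w * (w * x) = w**2 * x = 18
y = shared(shared(x))
y.sum().backward()

# Both uses contribute to the gradient: dy/dw = 2 * w * x = 12
y_value = y.item()
w_grad = shared.weight.grad.item()
n_params = len(list(shared.parameters()))

print(y_value, w_grad, n_params)  # 18.0 12.0 1
```

However many times the layer is reused, the model still has a single weight Tensor for it, so the optimizer updates that one Tensor with the combined gradient.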
