Getting Started with PyTorch 0.4.0

Posted by 云南省高校数据化运营管理工程研究中心

Column: 王玥. Tags: PyTorch, deep learning, introductory tutorial

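I have been learning the PyTorch deep learning library recently. This post shares my notes on jcjohnson's PyTorch examples, together with the comments I added line by line while running the sample code.
For downloading and installing PyTorch on Windows, see a separate installation guide.
PyTorch has two core features: n-dimensional Tensors (similar to numpy arrays, but able to run on GPUs), and automatic differentiation for building and training neural networks.
The running example throughout is a fully connected ReLU network with a single hidden layer, trained with gradient descent to fit random data by minimizing the Euclidean distance between the network output and the true output.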

Warm-up: numpy

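Before introducing PyTorch, we first implement the network with numpy.
Numpy provides an n-dimensional array object and many functions for manipulating these arrays. We start by using numpy to fit a two-layer network to random data, implementing the forward and backward passes by hand.
Readers unfamiliar with forward and backward propagation may want to review that algorithm first.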

import numpy as np

# Note the size settings:
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize the weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

# The learning rate (1e-6) sets the step size of each update,
# i.e. how quickly the parameters move toward the optimum
learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)              # multiply x by the first-layer weights w1
    h_relu = np.maximum(h, 0)  # ReLU: elementwise max(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss (sum of squared errors)
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
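
The hand-derived backward pass above can be sanity-checked against finite differences. The following is only a sketch: the tiny problem sizes, the fixed seed, and the helper names loss_fn and numeric_grad are my own choices for illustration and are not part of the original sample.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 6, 3  # tiny sizes, chosen only to keep the check fast
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))


def loss_fn():
    """Sum-of-squares loss of the two-layer ReLU network for the current w1, w2."""
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()


# Analytic gradients, using the same formulas as the training loop
h = x.dot(w1)
h_relu = np.maximum(h, 0)
grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
grad_w2 = h_relu.T.dot(grad_y_pred)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)


def numeric_grad(w, eps=1e-6):
    """Central-difference gradient of loss_fn with respect to the array w."""
    g = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = loss_fn()
        w[idx] = old - eps
        f_minus = loss_fn()
        w[idx] = old  # restore the original value
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g


err_w1 = np.abs(numeric_grad(w1) - grad_w1).max()
err_w2 = np.abs(numeric_grad(w2) - grad_w2).max()
print(err_w1, err_w2)  # both should be very small
```

If either error were large, the usual suspects would be a missing transpose or a forgotten ReLU mask in the backward formulas.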

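Note: the difference between np.max and np.maximum in numpy
1. Parameters
First compare the parameters of the two:
• np.max: (a, axis=None, out=None, keepdims=False)
  • returns the maximum of an array (a reduction)
  • takes at least one argument
  • axis: defaults to None, which takes the maximum over the whole flattened array; axis=0 gives the maximum of each column, axis=1 the maximum of each row
• np.maximum: (X, Y, out=None)
  • compares X and Y element by element and keeps the larger value
  • takes at least two arguments
2. Usage

>>> np.max([-2, -1, 0, 1, 2])
2

>>> np.maximum([-2, -1, 0, 1, 2], 0)
array([0, 0, 0, 1, 2])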
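
To make the axis behaviour of np.max concrete, here is a small illustration on a 2-D array. The array a and the variable names are chosen only for this example.

```python
import numpy as np

a = np.array([[1, 5, 2],
              [4, 3, 6]])

max_all = np.max(a)           # maximum over the whole array -> 6
max_cols = np.max(a, axis=0)  # maximum of each column -> [4 5 6]
max_rows = np.max(a, axis=1)  # maximum of each row -> [5 6]

# np.maximum compares elementwise; the scalar 0 is broadcast against a - 3,
# which is exactly how ReLU is computed in the network above
relu = np.maximum(a - 3, 0)   # -> [[0 2 0], [1 0 3]]

print(max_all, max_cols, max_rows)
print(relu)
```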

PyTorch: Tensors

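Numpy is a great framework, but it cannot use GPUs to accelerate its numerical computations, which is a serious limitation for modern deep learning.
Tensors are the most fundamental PyTorch concept. A Tensor is an n-dimensional array, and PyTorch provides many functions for operating on Tensors. Any computation you can perform with numpy can also be done with PyTorch Tensors; think of them as a generic tool for scientific computing.

Unlike numpy, however, PyTorch Tensors can use GPUs to accelerate their numerical computations: pass a torch.device when constructing a Tensor to place it on the GPU.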
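
As a minimal sketch of device placement, the snippet below picks the GPU when one is available and falls back to the CPU otherwise. The fallback logic and the variable names are my own additions; the original sample simply hard-codes the device.

```python
import torch

# Use the GPU if PyTorch can see one, otherwise stay on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.randn(3, 4, device=device)  # created directly on the chosen device
b = torch.ones(3, 4).to(device)       # created on the CPU, then moved
c = a + b                             # both operands must live on the same device

print(c.device)

# Convert back to numpy: the Tensor has to be on the CPU first
c_np = c.cpu().numpy()
```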

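Here we use PyTorch Tensors to fit a two-layer network to random data. As in the numpy example above, the forward and backward passes are implemented by hand using operations on PyTorch Tensors: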

import torch

device = torch.device('cpu')
# device = torch.device('cuda')  # Uncomment this line to run on the GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y
  h = x.mm(w1)  # matrix multiplication
  # clamp limits each element to the range [min, max] and returns a new Tensor;
  # with min=0 this is exactly ReLU
  h_relu = h.clamp(min=0)
  y_pred = h_relu.mm(w2)

  # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
  # of shape (); we can get its value as a Python number with loss.item().
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # Backprop to compute gradients of the loss with respect to w1 and w2
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h_relu.t().mm(grad_y_pred)
  grad_h_relu = grad_y_pred.mm(w2.t())
  grad_h = grad_h_relu.clone()
  grad_h[h < 0] = 0
  grad_w1 = x.t().mm(grad_h)

  # Update weights using gradient descent
  w1 -= learning_rate * grad_w1
  w2 -= learning_rate * grad_w2

PyTorch: Autograd

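In the examples above, we had to implement both the forward and backward passes of our neural network by hand. Manually implementing the backward pass is no big deal for a small two-layer network, but it quickly becomes impractical for large, complex networks.

With PyTorch's autograd package we can use automatic differentiation to automate the computation of the backward pass. When using autograd, the forward pass of the network defines a computational graph: nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then makes it easy to compute gradients.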

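To compute gradients with respect to some Tensor, set requires_grad=True when constructing it. Any PyTorch operation on that Tensor then causes a computational graph to be constructed, through which backpropagation can later be performed. If x is a Tensor with requires_grad=True, then after backpropagation x.grad is another Tensor holding the gradient of some scalar value (typically the loss) with respect to x.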
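
A tiny example may help here. For s = sum(x_i^2) the gradient is ds/dx_i = 2 * x_i, so x.grad should come out as twice x. The values used below are arbitrary and only serve as an illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Every operation on x is recorded in the computational graph
s = (x ** 2).sum()

# Backpropagate from the scalar s; this fills in x.grad
s.backward()

print(x.grad)  # tensor([2., 4., 6.])
```

Note that x.grad has the same shape as x, not the shape of the scalar s.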

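Sometimes you may want to prevent PyTorch from building a computational graph when performing certain operations on Tensors with requires_grad=True. For example, when training a neural network we usually do not want to backpropagate through the weight update steps. In that case we can use the torch.no_grad() context manager to prevent construction of a computational graph.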
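
The effect of torch.no_grad() can be observed directly through the requires_grad flag of the results. This is only a sketch with made-up values; the names tracked and untracked are my own.

```python
import torch

w = torch.ones(3, requires_grad=True)

tracked = w * 2          # recorded in the graph
with torch.no_grad():
    untracked = w * 2    # not recorded: no graph is built here
    w -= 0.1             # in-place update of w, also not recorded

print(tracked.requires_grad, untracked.requires_grad)  # True False
print(w)  # w keeps requires_grad=True, with every element now 0.9
```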

Here we use PyTorch Tensors and autograd to implement our two-layer network; the backward pass no longer has to be implemented by hand:

import torch

device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

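# Create random Tensors for weights; requires_grad=True means that we want to
# compute gradients for these Tensors during the backward pass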
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand,
    # we don't need to keep references to intermediate values.
    # clamp limits each element to the range [min, max] and returns a new Tensor.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss. Loss is a Tensor of shape (), and loss.item()
    # is a Python number giving its value.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent. For this step we just want to mutate
    # the values of w1 and w2 in-place; we don't want to build up a computational
    # graph for the update steps, so we use the torch.no_grad() context manager
    # to prevent PyTorch from building a computational graph for the updates
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after running the backward pass
        w1.grad.zero_()
        w2.grad.zero_()
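
One way to convince yourself that autograd computes the same thing as the hand-written backward pass is to run both on the same data and compare. The sketch below assumes smaller sizes, a fixed seed, and double precision, all chosen by me to keep the comparison quick and numerically tight.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 20, 10, 5  # small sizes, for illustration only

x = torch.randn(N, D_in, dtype=torch.double)
y = torch.randn(N, D_out, dtype=torch.double)
w1 = torch.randn(D_in, H, dtype=torch.double, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.double, requires_grad=True)

# Forward pass, keeping intermediates so the manual formulas can reuse them
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Gradients from autograd
loss.backward()

# Gradients from the hand-derived formulas of the Tensors example
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    manual_w1 = x.t().mm(grad_h)

print(torch.allclose(w1.grad, manual_w1), torch.allclose(w2.grad, manual_w2))
```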

PyTorch: Defining new autograd functions

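Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of some scalar value with respect to the output Tensors, and computes the gradient of that same scalar value with respect to the input Tensors.

In PyTorch we can define our own autograd operator by subclassing torch.autograd.Function and implementing its forward and backward functions. We can then use the new operator by calling its apply method like a function, passing Tensors containing input data.

In this example we define our own autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network: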

import torch


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

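    @staticmethod
    def forward(ctx, x):
        """
        In the forward pass we receive a context object and a Tensor containing
        the input; we must return a Tensor containing the output, and we can use
        the context object to cache objects for use in the backward pass.
        """
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive the context object and a Tensor containing
        the gradient of the loss with respect to the output produced during the
        forward pass. We can retrieve cached data from the context object, and
        must compute and return the gradient of the loss with respect to the
        input to the forward function.
        """
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0
        return grad_x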


device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and output
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors; we call our
    # custom ReLU implementation through the MyReLU.apply function
    y_pred = MyReLU.apply(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    with torch.no_grad():
        # Update weights using gradient descent
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after running the backward pass
        w1.grad.zero_()
        w2.grad.zero_()
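
A custom Function like MyReLU can be checked in two ways: numerically with torch.autograd.gradcheck, and by comparing its gradient with that of the built-in clamp. The snippet below is a sketch that repeats the class so it runs on its own; the test input is hand-picked by me to stay away from 0, where ReLU is not differentiable and a numerical check would be unreliable.

```python
import torch
from torch.autograd import gradcheck


class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0
        return grad_x


# gradcheck compares backward() against finite differences; it needs double precision
x = torch.tensor([[-2.0, -0.5, 0.5, 3.0]], dtype=torch.double, requires_grad=True)
ok = gradcheck(MyReLU.apply, (x,), eps=1e-6, atol=1e-4)

# Compare against the gradient autograd derives for the built-in clamp
x1 = x.detach().clone().requires_grad_(True)
x2 = x.detach().clone().requires_grad_(True)
MyReLU.apply(x1).sum().backward()
x2.clamp(min=0).sum().backward()
same = torch.equal(x1.grad, x2.grad)

print(ok, same)
print(x1.grad)  # 0 where the input was negative, 1 where it was positive
```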

PyTorch: nn

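In TensorFlow, packages such as Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs, such as layers with learnable parameters (weights and biases), that are useful for building neural networks.

In PyTorch, the nn package serves the same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, and may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.
In this example we use the nn package to implement our two-layer network: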

import torch

device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

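# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# After constructing the model we use the .to() method to move it to the
# desired device.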
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
).to(device)

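# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function. Setting
# size_average=False sums the squared errors instead of averaging them. Newer
# PyTorch releases deprecate this argument in favor of reduction='sum'.
loss_fn = torch.nn.MSELoss(size_average=False)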

learning_rate = 1e-4
for t in range(500):
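    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.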
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

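    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.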
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its data and gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param.data -= learning_rate * param.grad
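
To see which learnable Tensors the Modules above actually hold, you can iterate over named_parameters(). This is a small sketch reusing the same layer sizes as the example; the dictionary shapes and the count n_params are names I introduced for illustration.

```python
import torch

D_in, H, D_out = 1000, 100, 10
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# Parameter names are prefixed with the index of the layer inside Sequential;
# ReLU (index 1) has no parameters, so it does not appear
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# Total number of learnable scalars in the model
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 100*1000 + 100 + 10*100 + 10 = 101110
```

Note that nn.Linear stores its weight with shape (out_features, in_features), the transpose of the w1 and w2 layout used in the hand-written examples.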

PyTorch: optim

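The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms (AdaGrad, RMSProp, Adam, and others). In this example we use the nn package to define the model as before, and optimize it with the Adam algorithm from the optim package.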

import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(size_average=False)

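# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.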
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

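    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the Tensors it will update (which are the learnable weights
    # of the model)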
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()
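
To make clear what optimizer.step() replaces, the sketch below uses plain torch.optim.SGD instead of Adam (a deliberate swap, because the SGD update rule is simple enough to reproduce by hand) and checks that one step equals the manual update param - lr * grad from the nn example. The tiny model and the data are made up for the illustration.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)
y = torch.randn(8, 2)

lr = 1e-2
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

loss = (model(x) - y).pow(2).sum()
optimizer.zero_grad()
loss.backward()

# What a manual gradient-descent update would produce for each parameter
expected = [p.detach() - lr * p.grad for p in model.parameters()]

# Let the optimizer perform the update in place
optimizer.step()

matches = all(
    torch.allclose(p.detach(), e) for p, e in zip(model.parameters(), expected)
)
print(matches)
```

Adam follows the same zero_grad / backward / step pattern, but scales each update using running statistics of past gradients, so its steps differ from this plain rule.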

PyTorch vs. TensorFlow

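1. PyTorch's dynamic graphs and TensorFlow's static graphs serve a similar purpose, but in PyTorch every forward pass defines a new computational graph, whereas in TF the graph is defined once and the same graph is reused on every run without being rebuilt.
2. PyTorch can be seen as a higher-level Numpy: many of its computation statements are the same as Numpy's, which saves a good deal of time when learning the basic operations. TF, by contrast, has many new function names to memorize, so in my view migrating code is probably a little easier with PyTorch.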
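
Point 1 can be illustrated with ordinary Python control flow. In the sketch below, the number of layers applied changes between calls, so each forward pass builds a different graph and backward() follows whichever graph was just built. The function forward and its n_layers argument are hypothetical names used only for this illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3)
w = torch.randn(3, 3, requires_grad=True)


def forward(x, w, n_layers):
    """Apply the same weight matrix n_layers times with a ReLU after each."""
    h = x
    for _ in range(n_layers):  # a plain Python loop decides the graph's depth
        h = h.mm(w).clamp(min=0)
    return h.sum()


grads = []
for n_layers in (1, 3):
    loss = forward(x, w, n_layers)  # a fresh graph is built on every call
    loss.backward()                 # backprop through that particular graph
    grads.append(w.grad.clone())
    w.grad.zero_()                  # reset before the next pass

print([tuple(g.shape) for g in grads])
```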
