PyTorch 1.0 Tutorial Series (2): autograd

AUTOGRAD

PyTorch: Tensors and autograd

In the previous example we had to implement both the forward and backward passes of the network by hand. Manually implementing the backward pass is not a big deal for a small two-layer network, but it quickly becomes very messy for large, complex networks.
Fortunately, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to compute gradients with ease.
This sounds complicated, but it is quite simple to use in practice. Each Tensor represents a node in the computational graph. If x is a Tensor with x.requires_grad=True, then after backpropagation x.grad is another Tensor holding the gradient of x with respect to some scalar value.
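For example, here is a minimal sketch (the tensor names are illustrative) of how requires_grad and .grad fit together:

import torch

x = torch.ones(3, requires_grad=True)  # ask autograd to track x
y = (x * x).sum()                      # y = sum(x_i^2), so dy/dx = 2x
y.backward()                           # backpropagate through the graph
print(x.grad)                          # tensor([2., 2., 2.])
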
Here we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to implement the backward pass by hand.

# -*- coding: utf-8 -*-
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    # Now loss is a zero-dimensional (scalar) Tensor;
    # loss.item() gets the Python number held in the loss.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # You can also use torch.optim.SGD to achieve this.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
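
The last comment in the update block mentions torch.optim.SGD. As a minimal sketch (reusing x, y, w1, w2, and learning_rate from the listing above), the manual no_grad update and gradient zeroing collapse into optimizer calls:

optimizer = torch.optim.SGD([w1, w2], lr=learning_rate)
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    optimizer.zero_grad()  # clear stale gradients before backward
    loss.backward()        # populate w1.grad and w2.grad
    optimizer.step()       # applies w -= lr * w.grad for each parameter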

PyTorch: Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.
In PyTorch we can easily define our own autograd operator by subclassing torch.autograd.Function and implementing the forward and backward functions. We can then use the new autograd operator by constructing an instance and calling it like a function, passing Tensors containing input data.
In this example we define our own custom autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network.

# -*- coding: utf-8 -*-
import torch


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use the Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
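
A custom Function's backward pass can be checked numerically with torch.autograd.gradcheck, which compares the analytic gradients against finite differences. A minimal sketch for MyReLU (gradcheck wants double-precision inputs; random doubles almost never land exactly on the ReLU kink, where the gradient is undefined):

from torch.autograd import gradcheck

check_input = torch.randn(20, dtype=torch.double, requires_grad=True)
# gradcheck raises an error if analytic and numeric gradients disagree.
ok = gradcheck(MyReLU.apply, (check_input,), eps=1e-6, atol=1e-4)
print(ok)  # True if MyReLU.backward is consistent with MyReLU.forward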

TensorFlow: Static Graphs

PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computational graph and use automatic differentiation to compute gradients. The biggest difference between the two is that TensorFlow's computational graphs are static while PyTorch uses dynamic computational graphs.
In TensorFlow, we define the computational graph once and then execute the same graph over and over again, possibly feeding it different input data each time. In PyTorch, each forward pass defines a new computational graph.
Static graphs are nice because you can optimize the graph up front; for example, the framework may decide to fuse some graph operations for efficiency, or come up with a strategy for distributing the graph across many GPUs or many machines. If you reuse the same graph over and over, this potentially costly up-front optimization is amortized as the same graph is rerun again and again.
One aspect in which static and dynamic graphs differ is control flow. For some models we may wish to perform different computation for different data points; for example, a recurrent network might be unrolled for a different number of time steps for each data point, and this unrolling can be implemented as a loop. With a static graph, the loop construct has to be part of the graph; for this reason TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build the graph on the fly for each example, we can use ordinary imperative control flow to perform computation that differs for each input, as the sketch below shows.
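To make the control-flow point concrete, here is a minimal sketch of a dynamic graph in PyTorch (the names w, x, and the loop bound are illustrative): the number of matrix multiplications is chosen at run time with a plain Python loop, and autograd traces whatever path was actually executed:

import random
import torch

w = torch.randn(10, 10, requires_grad=True)
x = torch.randn(1, 10)

h = x
for _ in range(random.randint(1, 4)):  # graph depth differs on every run
    h = h.mm(w).clamp(min=0)

h.sum().backward()   # autograd replays exactly the path taken this run
print(w.grad.shape)  # torch.Size([10, 10])
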
To contrast with the PyTorch autograd examples above, here we use TensorFlow to fit a simple two-layer net.

# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np

# First we set up the computational graph:

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create placeholders for the input and target data; these will be filled
# with real data when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

# Create Variables for the weights and initialize them with random data.
# A TensorFlow Variable persists its value across executions of the graph.
w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))

# Forward pass: Compute the predicted y using operations on TensorFlow Tensors.
# Note that this code does not actually perform any numeric operations; it
# merely sets up the computational graph that we will later execute.
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)

# Compute loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

# Compute gradient of the loss with respect to w1 and w2.
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Update the weights using gradient descent. To actually update the weights
# we need to evaluate new_w1 and new_w2 when executing the graph. Note that
# in TensorFlow the act of updating the value of the weights is part of
# the computational graph; in PyTorch this happens outside the computational
# graph.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

# Now we have built our computational graph, so we enter a TensorFlow session to
# actually execute the graph.
with tf.Session() as sess:
    # Run the graph once to initialize the Variables w1 and w2.
    sess.run(tf.global_variables_initializer())

    # Create numpy arrays holding the actual data for the inputs x and targets
    # y
    x_value = np.random.randn(N, D_in)
    y_value = np.random.randn(N, D_out)
    for _ in range(500):
        # Execute the graph many times. Each time it executes we want to bind
        # x_value to x and y_value to y, specified with the feed_dict argument.
        # Each time we execute the graph we want to compute the values for loss,
        # new_w1, and new_w2; the values of these Tensors are returned as numpy
        # arrays.
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})
        print(loss_value)

