PyTorch (2): Neural Network Basics (Tensors, Autograd, nn module, optim, Control Flow + Weight Sharing)

Tensor

      PyTorch is best introduced by starting from NumPy. NumPy is a generic framework for scientific computing, but it does not know anything about computation graphs or deep learning.

      Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy won’t be enough for modern deep learning.

      A Tensor is essentially the same kind of object as a NumPy array, vector, or matrix, but it is designed with GPUs in mind, so it can exploit GPU hardware to speed up numerical computation.

      In other words, you can think of a Tensor as an n-dimensional NumPy array; most of PyTorch's functions and features are built on Tensors. In addition, a Tensor can keep track of a computational graph and gradients, and it also works as a general-purpose tool for scientific computing.
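A minimal sketch of that "NumPy array that can also live on a GPU" idea (the variable names here are purely illustrative):

import numpy as np
import torch

a = np.linspace(0, 1, 5)        # a NumPy array
t = torch.from_numpy(a)         # wrap it as a Tensor (shares the same memory)
print(t * 2)                    # Tensor operations look just like NumPy operations
print(t.numpy())                # convert back to NumPy

if torch.cuda.is_available():   # unlike NumPy, a Tensor can be moved to a GPU
    t_gpu = t.to("cuda")
    print(t_gpu.device)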

PyTorch's two main features both revolve around the Tensor:

  • An n-dimensional Tensor, similar to numpy but can run on GPUs
  • Automatic differentiation for building and training neural networks

The default tensor type is FloatTensor (torch.float32).

A PyTorch version of the intuitive backpropagation & gradient descent program

For the NumPy version, see: 反向传播&梯度下降 的直观理解程序(numpy)_hxxjxw的博客-CSDN博客

Again, the default tensor type is FloatTensor, and the default device is the CPU.

dtype = torch.float sets the dtype to torch.float32.

If requires_grad is not specified, it defaults to False.
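These defaults are easy to check directly; a minimal sketch:

import torch

t = torch.tensor([1.0, 2.0])
print(t.dtype)          # torch.float32 (FloatTensor)
print(t.device)         # cpu
print(t.requires_grad)  # False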

A learning_rate of 1e-6 is a reasonable starting value here.

import torch
import math
import matplotlib.pyplot as plt
import numpy as np
dtype = torch.float
device = torch.device('cpu')

x = torch.linspace(-math.pi, math.pi, 2000, device = device, dtype = dtype)
# torch has its own version of NumPy functions such as np.linspace
y = torch.sin(x)
# print(x)
# print(y)
# plt.scatter(x,y)
# plt.show()

a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(3000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    loss = (y_pred - y).pow(2).sum().item()

    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d


print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

To run on a GPU, just change the device line:

device = torch.device("cuda:0")

Understanding the shape of a Tensor / multi-dimensional array

For example, if a tensor has shape [4, 3, 5, 4], then the dimensions from left to right run from the outermost level to the innermost level.

import torch
t = torch.rand(4,3,5,4)
#the outermost dimension has size 4
#the next dimension in has size 3
#the next dimension in has size 5
#the innermost dimension has size 4

print(t[0])
print(t[0].shape)
#torch.Size([3, 5, 4])

print(t[0][0])
print(t[0][0].shape)
#torch.Size([5, 4])

print(t[0][0][0])
print(t[0][0][0].shape)
#torch.Size([4])

Autograd

      Autograd can automatically differentiate every operation performed on Tensors, sparing you the complexity of computing derivatives by hand.

      Earlier we implemented the forward and backward passes of the network by hand. Manually implementing the backward pass is not a big deal for a small two-layer network, but it quickly becomes very unwieldy for large, complex networks.

      Fortunately, we can use automatic differentiation to automate the computation of the backward pass. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then lets you compute gradients easily.

      In practice this is very simple. Each Tensor represents a node in the computational graph. If x is a Tensor with x.requires_grad=True, then x.grad is another Tensor holding the gradient of x with respect to some scalar value.
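A minimal sketch of what that looks like (the variable names are only illustrative):

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2          # a scalar computed from x
y.backward()        # populates x.grad with dy/dx = 2x
print(x.grad)       # tensor(4.)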

       Under the hood, each autograd operator is really two functions that operate on Tensors: forward() and backward().
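To make the forward()/backward() pairing concrete, here is a minimal sketch of a custom autograd operator written with torch.autograd.Function (the Square class is purely illustrative; you rarely need to write such operators yourself):

import torch

class Square(torch.autograd.Function):
    # forward() computes the output and stashes what backward() will need
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    # backward() receives dL/dy and returns dL/dx = dL/dy * 2x
    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x

x = torch.tensor(3.0, requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # tensor(6.)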

      Let's fit sin(x) with a third order polynomial again, this time using PyTorch Tensors and autograd; now we no longer need to implement the backward pass through the network by hand.

import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a Tensor of shape (1,)
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad. c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()  # note that loss is now a Tensor

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

If x is a Tensor with requires_grad=True, then x.grad is also a Tensor.

Before every call to backward(), the gradients must be zeroed out; if they are not, autograd accumulates the newly computed gradients on top of the gradients from the previous iteration.
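A minimal sketch of that accumulation behavior:

import torch

w = torch.tensor(1.0, requires_grad=True)
(2 * w).backward()
print(w.grad)     # tensor(2.)
(2 * w).backward()
print(w.grad)     # tensor(4.) -- the new gradient was added to the old one
w.grad = None     # clear it (or call w.grad.zero_()) before the next backward pass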

nn module

In PyTorch we use the nn package to build networks, just as we use autograd to build computational graphs and compute gradients.

Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.

When building neural networks, we frequently think of arranging the computation into layers, some of which have learnable parameters that will be optimized during learning.

In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.

In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.

In this example we use the nn package to implement our polynomial model network:

import torch
import math

# Create Tensors to hold input and outputs.
# these torch.* factory functions produce Tensors
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)   # unsqueeze(-1) adds a new dimension at the last position
# xx has three columns: x to the 1st, 2nd and 3rd power
# x.unsqueeze(-1) has shape [2000, 1]
# xx has shape [2000, 3]

# In the above code, x.unsqueeze(-1) has shape (2000, 1), and p has shape
# (3,), for this case, broadcasting semantics will apply to obtain a tensor
# of shape (2000, 3)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. The Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor,
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(in_features=3, out_features=1),   # torch.nn.Linear(in_features, out_features)
    torch.nn.Flatten(start_dim=0, end_dim=1)
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
# loss_fn is an instance of torch.nn.MSELoss (torch.nn.modules.loss.MSELoss)

learning_rate = 1e-6
for t in range(2000):

    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(xx)
    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]

# For linear layer, its parameters are stored as `weight` and `bias`.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

model

Printing model here shows an nn.Sequential with two sub-modules: index 0 is the Linear layer (in_features=3, out_features=1) and index 1 is the Flatten layer.

Each layer of model can also be accessed by indexing, just like the items of a list: model[0] is the Linear layer and model[1] is the Flatten layer.

model[2] raises an error, because there is no third layer.

model.parameters()

    with torch.no_grad():
        for i, param in enumerate(model.parameters()):
            print(i, param)
            param -= learning_rate * param.grad

Iterating over model.parameters() yields two tensors, both belonging to the Linear layer: its weight of shape [1, 3] and its bias of shape [1].

The Flatten layer has no learnable parameters of its own.
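A minimal, self-contained check of which parameters actually exist (using named_parameters() to see which sub-module owns each one):

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1),
)
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 0.weight (1, 3)   <- weight of the Linear layer (module index 0)
# 0.bias (1,)       <- bias of the Linear layer
# the Flatten layer (module index 1) contributes nothing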

torch.nn.Linear(in_features, out_features, bias=True)

nn.Linear is used to define the fully connected layers of a network. The input and output of a fully connected layer are typically 2D tensors of shape [batch_size, size] (more generally, any input whose last dimension equals in_features works), unlike convolutional layers, whose inputs and outputs are 4D tensors.

bias: if set to False, the layer will not learn an additive bias. Default: True.

The number of rows in the output matches the input, so nn.Linear does not need to be told the batch size; it only needs in_features and out_features, i.e. the number of input features and the number of output features.

About the Linear layer's weight

The weight has shape [out_features, in_features].
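A minimal sketch that checks both the parameter shapes and the preserved batch dimension:

import torch

layer = torch.nn.Linear(3, 1)
print(layer.weight.shape)   # torch.Size([1, 3]) -> [out_features, in_features]
print(layer.bias.shape)     # torch.Size([1])    -> [out_features]

x = torch.randn(2000, 3)
print(layer(x).shape)       # torch.Size([2000, 1]) -- the batch dimension is preserved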

torch.nn.Flatten(start_dim=1, end_dim=-1)

The module form defaults to start_dim=1, end_dim=-1 (the functional form torch.flatten(input, start_dim=0, end_dim=-1) defaults to flattening everything).

It flattens the input from dimension start_dim through dimension end_dim into a single dimension.

Note that nn.Flatten is a Module, not a function, so you cannot pass the tensor into its constructor:

t = torch.nn.Flatten(t, start_dim=1, end_dim=2)   # this does NOT work

Instead, instantiate the module (on its own or inside a network) and then call it on the tensor, or use the functional torch.flatten.
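A minimal sketch of the two correct forms (module vs. functional):

import torch

t = torch.rand(4, 3, 5, 4)

flat = torch.nn.Flatten(start_dim=1, end_dim=2)   # module form: instantiate, then call
print(flat(t).shape)                              # torch.Size([4, 15, 4])
print(torch.flatten(t, 1, 2).shape)               # functional form, same result

The original example below wraps the same Flatten module in an nn.Sequential: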
import torch
t = torch.rand(4,3,5,4)
#t has shape torch.Size([4, 3, 5, 4])

model = torch.nn.Sequential(
    torch.nn.Flatten(1, 2)  # flattens dims 1 and 2: output shape becomes [4, 15, 4]
)
t = model(t)
print(t.shape)  # torch.Size([4, 15, 4])

optim

      Up to this point we have updated the weights of our models by hand inside torch.no_grad(). That is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp, or Adam.

     The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimizers.

     In this example we will use the nn package to define our model as before, but we will optimize the model with the RMSprop algorithm provided by the optim package.

Note that the optimization algorithms in the optim package are what update the model's weights; do not confuse them with loss functions!

import torch
import math

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Prepare the input tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use RMSprop; the optim package contains many other
# optimization algorithms. The first argument to the RMSprop constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(xx)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers( i.e, not overwritten) whenever .backward()
    # is called. Checkout docs of torch.autograd.backward for more details.
    optimizer.zero_grad()  # this must be called before loss.backward()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()


linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

Custom nn Modules (custom network layers)

      When we implement our own module, the class must inherit from nn.Module; initialization goes into __init__, and the forward construction of the computational graph goes into forward.

      The backward pass follows directly from the computational graph you build there; it is already implemented for you, so there is no need to write it by hand.

   super().__init__() (written as super(Polynomial3, self).__init__() in the older style) initializes the attributes inherited from the parent class nn.Module.

In Python, super(Net, self).__init__() first finds the parent class of Net and then runs the parent's __init__() inside the subclass's own __init__(), so the subclass gets everything the parent's __init__() sets up.

import torch
import math


class Polynomial3(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate four parameters and assign them as
        member parameters.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        """
        Just like any class in Python, you can also define custom method on PyTorch modules
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'


# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = Polynomial3()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the nn.Linear
# module which is members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
for t in range(2000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')

model = Polynomial3() instantiates the model; at this point the code in __init__() has already executed.

torch.nn.Parameter()

torch.nn.Parameter is a subclass of torch.Tensor whose main purpose is to serve as a trainable parameter of an nn.Module.

The difference from torch.Tensor is that an nn.Parameter assigned as a module attribute is automatically treated as a trainable parameter of that module, i.e. it is added to the parameters() iterator, whereas an ordinary tensor stored on a module does not appear in parameters().

Note that requires_grad defaults to True for an nn.Parameter, i.e. it is trainable by default, which is the opposite of the default for a torch.Tensor.

Inside the nn.Module class, PyTorch itself also uses nn.Parameter to hold the parameters of each module.
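A minimal sketch of that registration behavior (the class name Toy is purely illustrative):

import torch

class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(3))  # registered as a trainable parameter
        self.t = torch.randn(3)                      # plain tensor: NOT registered

m = Toy()
print([name for name, _ in m.named_parameters()])    # ['w']
print(m.w.requires_grad)                             # True (Parameter default)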

Control Flow + Weight Sharing

As an example of dynamic graphs and weight sharing, we implement a rather strange model: a polynomial of order three, four, or five. On each forward pass it randomly chooses how many extra orders to add, and it reuses the same weight e to compute both the fourth- and fifth-order terms.

For this model we can use ordinary Python control flow to implement the loop, and we can implement weight sharing among the highest-order terms simply by reusing the same parameter several times when defining the forward pass.

We can easily implement this model as a Module subclass:

import random
import torch
import math

class DynamicNet(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate five parameters and assign them as members.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.e = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 4, 5
        and reuse the e parameter to compute the contribution of these orders.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same parameter many
        times when defining a computational graph.
        """
        y = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3
        for exp in range(4, random.randint(4, 6)):
            y = y + self.e * x ** exp
        return y

    def string(self):
        """
        Just like any class in Python, you can also define custom method on PyTorch modules
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 + {self.e.item()} x^5'


# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = DynamicNet()

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8, momentum=0.9)
for t in range(30000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 2000 == 1999:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')
