Deep Learning Frameworks_PyTorch_An Introduction to the PyTorch Framework

Rocky Ding*

Category: Deep Learning Frameworks. Tags: PyTorch, deep learning frameworks, deep learning, machine learning, algorithms

Article link: https://blog.csdn.net/Rocky6688/article/details/103350003

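1. The Role of Deep Learning Frameworks

Deep learning requires a large amount of computation.
The volume of data and the number of nodes in the network are the main sources of that computational demand.
The structure of neural networks lends itself to efficient parallel computation on GPUs.
Deep learning frameworks are built to run deep learning models efficiently on GPUs.
They provide the basic data structures.
They use computational graphs to implement automatic differentiation and performance optimization.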

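2. Defining Tensors in PyTorch

A tensor is a generalization of vectors and matrices to higher dimensions.

First, let's look at how a tensor is defined in PyTorch: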

import torch
the_array = torch.tensor([[1, 2], [3, 4]])
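
To make the "generalization of vectors and matrices" idea concrete, here is a minimal sketch that inspects the tensor defined above. The attributes used (shape, dtype, dim()) are standard torch.Tensor members; the choice of what to print is only for illustration.

```python
import torch

# Same 2x2 tensor as in the snippet above.
the_array = torch.tensor([[1, 2], [3, 4]])

print(the_array.shape)   # size along each dimension
print(the_array.dtype)   # element type, inferred from the Python ints
print(the_array.dim())   # number of dimensions (2 for a matrix)
print(the_array[1, 0])   # indexing works like a nested list or NumPy array
```

A vector would report dim() == 1 and a batch of matrices dim() == 3; the same object type covers all of these cases.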

Next, we write a series of operations on tensors:

# Create an all-zeros tensor
>>> x = torch.zeros(5, 3, dtype=torch.long)
>>> print(x)
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

# Create an all-ones tensor (new_ones takes dtype and device from x unless overridden)
>>> x = x.new_ones(5, 3, dtype=torch.double)
>>> print(x)
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

# Create a random tensor with the same shape as x, overriding the dtype
>>> x = torch.randn_like(x, dtype=torch.float)
>>> print(x)
tensor([[-0.5872,  0.1364,  1.7419],
        [-1.0413, -0.3609, -0.1092],
        [ 1.3338, -0.5147, -0.6457],
        [ 0.4084, -0.1589,  0.0457],
        [-0.7811, -1.7387, -0.7983]])

# Add two tensors
>>> y = torch.rand(5, 3)
>>> print(x + y)
tensor([[-0.5236,  0.9482,  2.3097],
        [-0.9912, -0.2774,  0.8689],
        [ 1.7117, -0.4530, -0.2512],
        [ 1.1644,  0.6180,  1.0024],
        [-0.6203, -1.5504,  0.0290]])

# Same result, a different form of addition
>>> print(torch.add(x, y))
tensor([[-0.5236,  0.9482,  2.3097],
        [-0.9912, -0.2774,  0.8689],
        [ 1.7117, -0.4530, -0.2512],
        [ 1.1644,  0.6180,  1.0024],
        [-0.6203, -1.5504,  0.0290]])

# Write the sum into result
>>> result = torch.empty(5, 3)
>>> torch.add(x, y, out=result)
tensor([[-0.5236,  0.9482,  2.3097],
        [-0.9912, -0.2774,  0.8689],
        [ 1.7117, -0.4530, -0.2512],
        [ 1.1644,  0.6180,  1.0024],
        [-0.6203, -1.5504,  0.0290]])
>>> print(result)
tensor([[-0.5236,  0.9482,  2.3097],
        [-0.9912, -0.2774,  0.8689],
        [ 1.7117, -0.4530, -0.2512],
        [ 1.1644,  0.6180,  1.0024],
        [-0.6203, -1.5504,  0.0290]])

# Add x to y in place
>>> y.add_(x)
tensor([[-0.5236,  0.9482,  2.3097],
        [-0.9912, -0.2774,  0.8689],
        [ 1.7117, -0.4530, -0.2512],
        [ 1.1644,  0.6180,  1.0024],
        [-0.6203, -1.5504,  0.0290]])
>>> print(y)
tensor([[-0.5236,  0.9482,  2.3097],
        [-0.9912, -0.2774,  0.8689],
        [ 1.7117, -0.4530, -0.2512],
        [ 1.1644,  0.6180,  1.0024],
        [-0.6203, -1.5504,  0.0290]])
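
The four addition forms above can also be checked against each other programmatically. The following is a small sketch, not part of the original walkthrough: the seed value and the variable names sum_operator, sum_function and returned are arbitrary choices made for this example.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable
x = torch.randn(5, 3)
y = torch.rand(5, 3)

sum_operator = x + y             # operator form
sum_function = torch.add(x, y)   # functional form
result = torch.empty(5, 3)
torch.add(x, y, out=result)      # writes into a preallocated tensor

returned = y.add_(x)             # trailing underscore: modifies y in place

print(torch.equal(sum_operator, sum_function))
print(torch.equal(sum_operator, result))
print(torch.equal(sum_operator, y))
print(returned.data_ptr() == y.data_ptr())  # add_ hands back y's own memory
```

In PyTorch, methods whose names end in an underscore (add_, zero_, copy_) modify the tensor they are called on, while the versions without the underscore return a new tensor.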

3. PyTorch's Data Structure Logic

PyTorch supports conversion in both directions between NumPy arrays and Torch tensors:

We can use torch.from_numpy() to initialize a Tensor:

>>> import numpy as np
>>> import torch
>>> np_array = np.ones((2, 2))
>>> np_array
array([[1., 1.],
       [1., 1.]])
>>> torch_array = torch.from_numpy(np_array)
>>> torch_array
tensor([[1., 1.],
        [1., 1.]], dtype=torch.float64)
>>> torch_array.add(1)  # out-of-place: returns a new tensor, np_array is untouched
tensor([[2., 2.],
        [2., 2.]], dtype=torch.float64)
>>> np_array
array([[1., 1.],
       [1., 1.]])

All tensors on the CPU except CharTensor support conversion to and from NumPy arrays.
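
The snippet above only shows the NumPy-to-tensor direction. As a brief sketch of the opposite direction, Tensor.numpy() returns a NumPy array for a CPU tensor; the names cpu_tensor and as_numpy are made up for this example.

```python
import numpy as np
import torch

cpu_tensor = torch.ones(2, 2)
as_numpy = cpu_tensor.numpy()   # NumPy array backed by the tensor's memory

cpu_tensor.add_(1)              # in-place change on the tensor side
print(type(as_numpy))
print(as_numpy)                 # the NumPy array reflects the change
```

Because both objects look at the same memory, the in-place add_ on the tensor is visible through the array as well.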

Let's look at an interesting example:

>>> np_array = np.ones((2, 2))
>>> torch_array = torch.from_numpy(np_array)
>>> np_array
array([[1., 1.],
       [1., 1.]])
>>> torch_array
tensor([[1., 1.],
        [1., 1.]], dtype=torch.float64)
>>> np_array = np_array + 1  # rebinds np_array to a new array; the shared buffer is unchanged
>>> torch.sum(torch_array)
tensor(4., dtype=torch.float64)

>>> np_array = np.ones((2, 2))
>>> torch_array = torch.from_numpy(np_array)
>>> np_array += 1  # in-place update of the buffer shared with torch_array
>>> torch.sum(torch_array)
tensor(8., dtype=torch.float64)

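What does this show?
torch.from_numpy() copies the NumPy array's data pointer rather than its contents, so the tensor and the array share one buffer. Writing np_array = np_array + 1 builds a new array and leaves that buffer alone, while np_array += 1 updates the buffer in place, and the tensor sees the change.
Python's reference counting keeps the shared buffer alive for as long as either object refers to it, which is what makes the sharing safe.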
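
When sharing is not wanted, a copy can be made instead. The sketch below contrasts torch.from_numpy(), which shares the buffer, with torch.tensor(), which copies the data; the names shared and copied are illustrative only.

```python
import numpy as np
import torch

np_array = np.ones((2, 2))
shared = torch.from_numpy(np_array)  # shares the array's buffer
copied = torch.tensor(np_array)      # copies the data into new memory

np_array += 1                        # in-place update of the original buffer

print(torch.sum(shared).item())      # follows the NumPy array
print(torch.sum(copied).item())      # still the original values
```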

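Next, let's introduce Tensor Storage (which plays the role of a pointer):

A Tensor does not hold the data itself; it accesses the data through a Storage.
A Storage holds the pointer to the raw data, the size, the Allocator, and related information.
A Storage is not responsible for interpreting the data; how the data is read is up to the Tensor.
The Storage abstraction decouples the raw data from the way it is interpreted.
The view function lets you define multiple Tensors that share the same Storage.
A Tensor's Storage points to the raw data and the Allocator, which together determine the device the data lives on.

As the following code shows: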

>>> tensor_a = torch.rand((4, 2))
>>> tensor_b = tensor_a.view((-1,4))
>>> print(tensor_b)
tensor([[0.9487, 0.1505, 0.3391, 0.3393],
        [0.0519, 0.9332, 0.6811, 0.2277]])
>>> tensor_a_data = tensor_a.storage().data_ptr()
>>> tensor_b_data = tensor_b.storage().data_ptr()
>>> tensor_a_data == tensor_b_data
True

To summarize: a Tensor is like a pointer to a pointer. It points to a Storage, and the Storage is itself a pointer to the raw data and the Allocator.
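
One practical consequence of two Tensors sharing a Storage is that a write through one is visible through the other, while each Tensor keeps its own shape and strides for interpreting the data. A minimal sketch, using zeros instead of random values so the effect is easy to see, and comparing Tensor.data_ptr() directly:

```python
import torch

tensor_a = torch.zeros(4, 2)
tensor_b = tensor_a.view(-1, 4)   # shape (2, 4), same underlying data

tensor_b[0, 0] = 7.0              # write through one tensor

print(tensor_a[0, 0].item())      # the write is visible through the other
print(tensor_a.shape, tensor_a.stride())
print(tensor_b.shape, tensor_b.stride())
print(tensor_a.data_ptr() == tensor_b.data_ptr())
```

The differing strides are the "interpretation" part that lives on the Tensor, while the bytes themselves are held once.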

That is the logic behind PyTorch's data structures.

One more note on CUDA Tensors: when a Tensor needs to move between devices (CPU, GPU), use the .to() method.
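
A minimal sketch of moving a tensor with .to(), written so that it also runs on a machine without a GPU. The fallback to "cpu" and the variable names are assumptions made for this example, not something the text above prescribes.

```python
import torch

# Pick a GPU if one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

cpu_tensor = torch.ones(2, 2)
moved = cpu_tensor.to(device)            # tensor on the chosen device
moved_double = moved.to(torch.float64)   # .to() can also change the dtype
back_on_cpu = moved_double.to("cpu")     # bring the result back to the CPU

print(moved.device, moved_double.dtype, back_on_cpu.device)
```

Operations between tensors generally require both operands to be on the same device, so moving the result back with .to("cpu") is a common last step before handing data to NumPy.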

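4. Autograd

The autograd package provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that backpropagation is determined by how your code runs, and it can differ on every iteration.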

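torch.Tensor is the central class of the package. If you set its .requires_grad attribute to True, it tracks all operations on that tensor. When the computation is finished, you can call .backward() to compute all gradients automatically. The gradients for this tensor are accumulated into its .grad attribute.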
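
The word "accumulated" matters in practice: a second call to .backward() adds to .grad instead of replacing it. The following sketch illustrates this with a toy function, sum(w^2); the tensor w and the names first and second are made up for the example.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * w).sum().backward()   # d/dw sum(w^2) = 2w
first = w.grad.clone()

(w * w).sum().backward()   # a second backward pass adds to .grad
second = w.grad.clone()

w.grad.zero_()             # reset before the next accumulation
print(first, second, w.grad)
```

This is why training loops usually clear the gradients once per iteration before calling .backward() again.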

To stop a tensor from tracking history, call .detach() to separate it from the computation history and prevent its future computations from being tracked.
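
A short sketch of what .detach() does to the tracking flags; the tensors a, b, c and d are illustrative names only.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a * 2          # tracked: b has a grad_fn
c = b.detach()     # same values, cut off from the computation history
d = c * 3          # computed from c, so it is not tracked either

print(b.requires_grad, c.requires_grad, d.requires_grad)
print(c.grad_fn)
```

Note that the detached tensor still shares memory with the original, so in-place changes to one are visible in the other.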

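To avoid tracking history (and using memory), you can also wrap a block of code in with torch.no_grad():. This is especially useful when evaluating a model, because the model may have trainable parameters with requires_grad = True for which we do not need gradients during evaluation.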
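
As a small illustration, the same matrix product is computed once with tracking and once inside torch.no_grad(). The tensor named weight merely stands in for a trainable model parameter; it is not taken from any real model.

```python
import torch

weight = torch.ones(2, 2, requires_grad=True)  # stand-in for a trainable parameter
inputs = torch.ones(2, 2)

tracked = inputs @ weight          # recorded in the autograd graph
with torch.no_grad():
    untracked = inputs @ weight    # computed without recording history

print(tracked.requires_grad, untracked.requires_grad)
print(weight.requires_grad)        # the parameter itself is unchanged
```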

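The Function class also plays an important role in the implementation of autograd.

Tensor and Function are interconnected and build up an acyclic graph that encodes the complete history of computation. Every tensor has a .grad_fn attribute that references the Function that created the Tensor (unless the tensor was created by the user, in which case its grad_fn is None).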
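
The difference between user-created tensors and tensors produced by operations can be inspected directly. This sketch prints the grad_fn of each; leaf, y and z are names chosen for the example, and is_leaf is a standard torch.Tensor attribute.

```python
import torch

leaf = torch.ones(2, 2, requires_grad=True)  # created by the user
y = leaf + 2                                 # created by an addition
z = y * y                                    # created by a multiplication

print(leaf.grad_fn)                # no Function created this tensor
print(type(y.grad_fn).__name__)    # name of the Function that produced y
print(type(z.grad_fn).__name__)    # name of the Function that produced z
print(leaf.is_leaf, z.is_leaf)
```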

Here is an example of backpropagation:

>>> import numpy as np
>>> import torch
>>> x = torch.ones(2, 2, requires_grad=True)
>>> print(x)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
>>> y = x + 2
>>> print(y)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
>>> z = y * y * 3
>>> out = z.mean()
>>> print(z, out)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
>>> out.backward()
>>> print(x.grad)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
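
The value 4.5 can be checked by hand. With four elements in x, y = x + 2, z = 3y², and out equal to the mean of z, the derivative works out as follows (the subscript i indexes the elements of x and is notation introduced here):

```latex
\[
\text{out} = \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,y_i^{2} = 3\,(x_i + 2)^{2}
\]
\[
\frac{\partial\,\text{out}}{\partial x_i} = \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2)
\]
\[
\left.\frac{\partial\,\text{out}}{\partial x_i}\right|_{x_i = 1} = \frac{3}{2}\cdot 3 = 4.5
\]
```

Every element of x equals 1, so every entry of x.grad is 4.5, matching the printed result.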