PyTorch Beginner Tutorial Study Notes (1): Data Operations and Automatic Gradients

我就是黑凤梨

Article link: https://blog.csdn.net/hzy199772/article/details/115797191


1.1 Creating a Tensor

import torch
import numpy as np
# Create an uninitialized 5x3 Tensor (its values are whatever happened to be in memory):
x = torch.empty(5,3)
print(x)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

# Create a randomly initialized 5x3 Tensor:
x = torch.rand(5,3)
print(x)

tensor([[0.6246, 0.1021, 0.8471],
        [0.1354, 0.2436, 0.1399],
        [0.9956, 0.1943, 0.8527],
        [0.9746, 0.8519, 0.4142],
        [0.6289, 0.3844, 0.2082]])

# Create a 5x3 Tensor of zeros with dtype long:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

# Create directly from data
x = torch.tensor([5.5, 3])
print(x)

tensor([5.5000, 3.0000])

# Create from an existing Tensor: these methods reuse some properties of the input Tensor (e.g. its dtype) by default, unless you specify them explicitly.
x = x.new_ones(5, 3, dtype=torch.float64)
print(x)

x = torch.randn_like(x, dtype=torch.float)
print(x)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[-0.6733,  0.3861, -0.6226],
        [-1.1622, -0.4318,  1.5766],
        [-0.6587,  0.9455, -0.8536],
        [ 2.0206, -2.2608,  0.6398],
        [-1.8293,  1.8952, -0.5302]])

# The shape of a Tensor can be obtained with .shape or .size():
print(x.shape)
print(x.size())

torch.Size([5, 3])
torch.Size([5, 3])

All of these creation methods let you specify the data type (dtype) and the device where the tensor is stored (cpu/gpu) at creation time.
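As a minimal sketch of passing both arguments at creation time: the code below falls back to the CPU when no GPU is present, so the device it prints depends on your machine.

```python
import torch

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# dtype and device are ordinary keyword arguments of the creation functions
a = torch.zeros(2, 3, dtype=torch.float64, device=device)
b = torch.rand(2, 3, dtype=torch.float32, device=device)
c = torch.tensor([1, 2, 3], dtype=torch.int32, device=device)

print(a.dtype, a.device)
print(b.dtype, b.device)
print(c.dtype, c.device)
```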


1.2 Tensor Operations

# Addition, form 1: the + operator
y = torch.rand(5,3)
print(x+y)

tensor([[-0.6629,  0.4116,  0.2245],
        [-0.4891, -0.2085,  2.4657],
        [ 0.0031,  1.4532, -0.4255],
        [ 2.4316, -2.0213,  1.4931],
        [-1.1799,  1.9080, -0.1649]])

# Addition, form 2: torch.add
print(torch.add(x,y))

tensor([[-0.6629,  0.4116,  0.2245],
        [-0.4891, -0.2085,  2.4657],
        [ 0.0031,  1.4532, -0.4255],
        [ 2.4316, -2.0213,  1.4931],
        [-1.1799,  1.9080, -0.1649]])

# Write the output into a preallocated tensor, result
result = torch.empty(5,3)
torch.add(x, y, out=result)
print(result)

tensor([[-0.6629,  0.4116,  0.2245],
        [-0.4891, -0.2085,  2.4657],
        [ 0.0031,  1.4532, -0.4255],
        [ 2.4316, -2.0213,  1.4931],
        [-1.1799,  1.9080, -0.1649]])

# Addition, form 3: in place. In-place versions of PyTorch operations all end with the suffix _, e.g. x.copy_(y), x.t_()
y.add_(x)
print(y)

tensor([[-0.6629,  0.4116,  0.2245],
        [-0.4891, -0.2085,  2.4657],
        [ 0.0031,  1.4532, -0.4255],
        [ 2.4316, -2.0213,  1.4931],
        [-1.1799,  1.9080, -0.1649]])

Indexing

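You can access part of a Tensor with NumPy-style indexing.
Note: the indexing result shares memory with the original data, so modifying one also modifies the other. (This holds for basic indexing with integers and slices; indexing with a list or tensor of indices returns a copy.)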

y = x[0, :]
print(x[0, :])
y += 1
print(y)
print(x[0, :]) # the source tensor was modified too

tensor([-0.6733,  0.3861, -0.6226])
tensor([0.3267, 1.3861, 0.3774])
tensor([0.3267, 1.3861, 0.3774])


Changing shape

# Use view() to change the shape of a Tensor:
y = x.view(15)
z = x.view(-1, 5)  # the size of the -1 dimension is inferred from the other dimensions
print(x.size(), y.size(), z.size())

torch.Size([5, 3]) torch.Size([15]) torch.Size([3, 5])

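Note: the new Tensor returned by view() may have a different size from the original, but the two share data, so changing one also changes the other. (view only changes how the tensor's data is viewed; the underlying data itself is unchanged.)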
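A small sketch of the shared data: writes through either tensor show up in the other. data_ptr() returns the address of the first element, so equal values mean the same underlying memory.

```python
import torch

x = torch.zeros(2, 3)
v = x.view(6)          # the same 6 numbers, seen as a flat vector

v[0] = 100.0           # write through the view ...
print(x)               # ... and x changes too

x[1, 2] = -1.0         # and the other way round
print(v)

# Both tensors point at the same memory
print(x.data_ptr() == v.data_ptr())
```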

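What if we want a real copy that does not share data memory?
PyTorch also provides reshape() for changing a tensor's shape, but it does not guarantee that the result is a copy (it returns a view whenever possible), so it is not suitable for this purpose. The recommended approach is to make a copy with clone() first and then call view().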

x_cp = x.clone().view(15)
x -=1
print(x)
print(x_cp)

tensor([[-0.6733,  0.3861, -0.6226],
        [-2.1622, -1.4318,  0.5766],
        [-1.6587, -0.0545, -1.8536],
        [ 1.0206, -3.2608, -0.3602],
        [-2.8293,  0.8952, -1.5302]])
tensor([ 0.3267,  1.3861,  0.3774, -1.1622, -0.4318,  1.5766, -0.6587,  0.9455,
        -0.8536,  2.0206, -2.2608,  0.6398, -1.8293,  1.8952, -0.5302])
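To see why reshape() cannot be relied on for a copy, here is a minimal sketch comparing data_ptr() values. It assumes the standard behavior that reshape returns a view for a contiguous input and copies only when a view is impossible (e.g. after a transpose).

```python
import torch

x = torch.arange(6.0).view(2, 3)

r = x.reshape(6)                 # x is contiguous, so reshape returns a view
print(r.data_ptr() == x.data_ptr())    # True: shares memory with x

t = x.t()                        # the transpose is not contiguous
r2 = t.reshape(6)                # a view is impossible here, so reshape copies
print(r2.data_ptr() == t.data_ptr())   # False: new memory

c = x.clone().view(6)            # clone first: always an independent copy
print(c.data_ptr() == x.data_ptr())    # False
```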

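Another benefit of clone() is that it is recorded in the computation graph, so gradients that flow back to the copy are also propagated to the original Tensor.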
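A minimal sketch of gradients passing through clone(). It uses requires_grad and backward(), which are covered in section 1.7 below.

```python
import torch

x = torch.ones(3, requires_grad=True)
x_cp = x.clone()          # clone is a differentiable op, so it appears in the graph
print(x_cp.grad_fn)       # a CloneBackward node

loss = (x_cp * 2).sum()
loss.backward()
print(x.grad)             # the gradient reached the original x: 2 for every element
```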

Another commonly used function is item(), which converts a single-element Tensor into a Python number:

x = torch.randn(1)
print(x)
print(x.item())

tensor([-0.1014])
-0.10138028115034103

Functions: linear algebra

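PyTorch also supports a number of linear algebra functions, listed in the table below; see the official documentation for details on their usage.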

[Image: table of PyTorch linear algebra functions]
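As a rough illustration, here are a few common linear algebra functions applied to a small 2x2 matrix. The selection is my own, so check the official documentation for the full list. torch.linalg.inv assumes a reasonably recent PyTorch (1.8 or later).

```python
import torch

A = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
u = torch.tensor([1.0, 2.0])
w = torch.tensor([3.0, 4.0])

tr = torch.trace(A)            # sum of the diagonal: 1 + 4
At = A.t()                     # transpose
AI = torch.mm(A, torch.eye(2)) # matrix product (A times the identity gives A back)
Av = torch.mv(A, u)            # matrix-vector product: [1*1 + 2*2, 3*1 + 4*2]
uw = torch.dot(u, w)           # inner product: 1*3 + 2*4
A_inv = torch.linalg.inv(A)    # matrix inverse

print(tr, At, AI, Av, uw, A_inv, sep="\n")
```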

PyTorch broadcasting

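When we apply an element-wise operation to two Tensors of different shapes, PyTorch's broadcasting mechanism may be triggered. Conceptually, elements are first replicated so that both Tensors have the same shape, and then the element-wise operation is performed. For example: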

x = torch.arange(1, 3).view(1, 2)
print(x)
y = torch.arange(1, 4).view(3, 1)
print(y)
print(x + y)

tensor([[1, 2]])
tensor([[1],
        [2],
        [3]])
tensor([[2, 3],
        [3, 4],
        [4, 5]])

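Since x and y are 1×2 and 3×1 matrices respectively, to compute x + y the 2 elements in row 1 of x are replicated to rows 2 and 3,
and the 3 elements in column 1 of y are broadcast (replicated) to column 2. Both now have the same shape, so the two 3×2 matrices can be added element-wise.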
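The replication described above can be made explicit with expand(), a minimal sketch. expand() returns a view with stride 0 along the repeated dimension, so no data is actually copied. That is also why broadcasting is cheap in practice.

```python
import torch

x = torch.arange(1, 3).view(1, 2)   # shape (1, 2)
y = torch.arange(1, 4).view(3, 1)   # shape (3, 1)

# Expand both to the common shape (3, 2) by hand
x_full = x.expand(3, 2)   # row [1, 2] repeated on rows 2 and 3
y_full = y.expand(3, 2)   # column [1, 2, 3] repeated into column 2

print(x_full)
print(y_full)
print(torch.equal(x + y, x_full + y_full))      # True: same result as broadcasting
print(torch.broadcast_shapes(x.shape, y.shape)) # the broadcast shape, (3, 2)
print(x_full.stride())                          # stride 0 on the repeated dimension
```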

1.4 Memory Overhead of Operations

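Indexing does not allocate new memory, but an operation like y = x + y allocates new memory and then points y at it. To demonstrate this we can use Python's built-in id function: if two objects have the same ID, they correspond to the same memory address; otherwise they do not.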

x = torch.tensor([1, 2])
y = torch.tensor([3, 4])
id_before = id(y)
y = y + x
print('id_before:'+str(id_before))
print('id(y):'+str(id(y)))

id_before:3181384989976
id(y):3181384988696

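If we want the result to go into y's original memory, we can use the indexing introduced earlier to replace its contents. In the example below, the result of x + y is written into y's memory through [:]: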

x = torch.tensor([1, 2])
y = torch.tensor([3, 4])
id_before = id(y)
y[:] = y + x
print('id_before:'+str(id_before))
print('id(y):'+str(id(y)))

id_before:3181385009416
id(y):3181385009416

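We can achieve the same effect with the out parameter of an operator's function form, or with the in-place addition operator += (i.e. add_()), e.g. torch.add(x, y, out=y) and y += x (y.add_(x)).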

Of these, += is the recommended form, since it is the most concise.

x = torch.tensor([1, 2])
y = torch.tensor([3, 4])
id_before = id(y)
torch.add(x, y, out=y) # equivalent: y += x or y.add_(x)
print('id_before:'+str(id_before))
print('id(y):'+str(id(y)))

id_before:3181385011832
id(y):3181385011832

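Note: although the Tensor returned by view shares data with the source Tensor, it is still a new Tensor (a Tensor holds other attributes besides its data), so the two have different ids (memory addresses).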

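In other words, with view, a change to the data shows up in both tensors, but they are still distinct objects at different addresses that merely share the data.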
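A minimal sketch of this distinction: id() compares the Tensor objects, while data_ptr() compares the memory holding their elements.

```python
import torch

x = torch.tensor([1, 2, 3, 4])
y = x.view(2, 2)

print(id(x) == id(y))                # False: two distinct Tensor objects
print(x.data_ptr() == y.data_ptr())  # True: the same underlying data
y[0, 0] = 100
print(x)                             # the first element of x is now 100
```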

1.5 Tensors and NumPy

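The numpy() and from_numpy() methods convert between Tensors and NumPy arrays. One thing to note: the Tensor and the NumPy array produced by these two functions share the same memory. This makes conversion fast, but changing one also changes the other!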

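Another common way to turn a NumPy array into a Tensor is torch.tensor(). Note that this method always copies the data (costing more time and memory), so the returned Tensor no longer shares memory with the original data.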

Tensor -> NumPy

a = torch.ones(5)
b = a.numpy()
print(a, b)

a += 1
print(a, b)
b += 1
print(a, b)

tensor([1., 1., 1., 1., 1.]) [1. 1. 1. 1. 1.]
tensor([2., 2., 2., 2., 2.]) [2. 2. 2. 2. 2.]
tensor([3., 3., 3., 3., 3.]) [3. 3. 3. 3. 3.]

NumPy -> Tensor

a = np.ones(5)
b = torch.from_numpy(a)
print(a, b)

a += 1
print(a, b)
b += 1
print(a, b)

[1. 1. 1. 1. 1.] tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
[2. 2. 2. 2. 2.] tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
[3. 3. 3. 3. 3.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)

All Tensors on the CPU (except CharTensor) support conversion to and from NumPy arrays.
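Conversion requires the tensor to be on the CPU. In practice, a tensor that lives on the GPU, or that requires grad, has to be moved or detached before calling numpy(). A minimal sketch that runs on either kind of machine:

```python
import torch

t = torch.ones(3)
if torch.cuda.is_available():
    t = t.to("cuda")          # on a GPU machine, t now lives in GPU memory

# .numpy() only works on CPU tensors, so move the tensor to the CPU first
# (.cpu() returns the tensor itself when it is already on the CPU)
arr = t.cpu().numpy()

# A tensor that requires grad must also be detached before calling .numpy()
w = torch.ones(3, requires_grad=True)
w_arr = w.detach().numpy()

print(arr, w_arr)
```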

c = torch.tensor(a)
a += 1
print(a, c)

[4. 4. 4. 4. 4.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)

1.6 Using Tensors on the GPU

The to() method moves Tensors between the CPU and the GPU (GPU support is required).

# The following code only runs if PyTorch can see a CUDA-capable GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # GPU
    y = torch.ones_like(x, device=device)  # create a Tensor directly on the GPU
    x = x.to(device)                       # equivalent to .to("cuda")
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # to() can also change the dtype at the same time

tensor([2, 3], device='cuda:0')
tensor([2., 3.], dtype=torch.float64)

1.7 Automatic Gradient Computation

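In deep learning we often need to compute the gradients of functions. PyTorch's autograd package automatically builds a computation graph from the inputs and the forward pass, and then performs backpropagation. This section shows how to use autograd to compute gradients automatically.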

1.7.1 Concepts

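The Tensor introduced in the previous sections is the core class of this package. If its .requires_grad attribute is set to True, it starts tracking every operation performed on it, so that gradients can be propagated with the chain rule. When the computation is finished, call .backward() to compute all the gradients. The gradient for this Tensor is accumulated into its .grad attribute.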

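Note that when calling y.backward(), if y is a scalar, backward() needs no argument; otherwise you must pass in a Tensor with the same shape as y. See Section 2.3.2 for an explanation.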
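A minimal sketch of the non-scalar case. When y is a vector, calling y.backward() with no argument raises a RuntimeError. Passing a tensor v of the same shape makes autograd compute the gradient of (v * y).sum() with respect to x. The weights in v are arbitrary illustrative values.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                      # y is a vector, not a scalar

# Pass a tensor shaped like y; the result is v * dy/dx = v * 2x
v = torch.tensor([1.0, 0.1, 0.01])
y.backward(v)
print(x.grad)                  # approximately [2.0, 0.4, 0.06]
```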

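If you no longer want a tensor to be tracked, call .detach() to separate it from the tracking history. Future computations on it are then not tracked, so gradients cannot flow back through it. You can also wrap code that should not be tracked in with torch.no_grad(). This is common when evaluating a model, because evaluation does not need the gradients of the trainable parameters (those with requires_grad=True).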
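Both mechanisms in one short sketch. In each case the result has requires_grad=False, unlike the same computation done normally.

```python
import torch

w = torch.ones(2, requires_grad=True)

h = (w * 3).detach()       # same values, but cut off from the graph
print(h.requires_grad)     # False

with torch.no_grad():      # nothing inside this block is tracked
    pred = w * 10
print(pred.requires_grad)  # False

out = (w * 3).sum()        # an ordinary, tracked computation
print(out.requires_grad)   # True
```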

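Function is another very important class. Tensors and Functions together build a directed acyclic graph (DAG) that records the whole computation. Every Tensor has a .grad_fn attribute, which is the Function that created the Tensor: if the Tensor was produced by some operation, grad_fn returns an object associated with that operation; otherwise it is None.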
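The DAG can be inspected by hand. Each grad_fn node has a next_functions attribute that holds (node, index) pairs for its inputs. The sketch below follows only the first input of each node, from the output back to the leaf x. Exact class names such as MulBackward0 can differ slightly between PyTorch versions.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
out = (y * y * 3).mean()

# Walk backwards from out, following the first input of each node
names = []
node = out.grad_fn
while node is not None:
    names.append(type(node).__name__)
    # Inputs that need no gradient (e.g. the constants 2 and 3) appear as None;
    # the walk ends at the leaf x, whose node (AccumulateGrad) has no inputs
    node = node.next_functions[0][0] if node.next_functions else None

print(" -> ".join(names))
```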

1.7.2 Tensor

# Create a Tensor and set requires_grad=True
x = torch.ones(2, 2, requires_grad=True)
print(x)
print(x.grad_fn)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None

# Perform an operation
y = x + 2
print(y)
print(y.grad_fn)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x000002E4B9539F88>

Note that x was created directly, so it has no grad_fn, while y was created by an addition, so it has the grad_fn <AddBackward0>.

Tensors like x that are created directly are called leaf nodes; a leaf node's grad_fn is None.

print(x.is_leaf, y.is_leaf) # True False

True False

# A more complex computation:
z = y * y * 3
out = z.mean() # mean(): the average over all elements
print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

Use .requires_grad_() to change the requires_grad attribute in place:

a = torch.randn(2,2) # requires_grad defaults to False when not specified
a = ((a * 3) / (a - 1))
print(a.requires_grad) # False
a.requires_grad_(True)
print(a.requires_grad) # True
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x000002E4B9514988>

1.7.3 Gradients

Because out is a scalar, backward() can be called without passing a gradient argument.

Let's look at the gradient of out with respect to x, d(out)/dx:

out.backward() # equivalent to out.backward(torch.tensor(1.))
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
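The value 4.5 can be checked by hand. Write o for out and x_i for the four elements of x. Since y = x + 2, z_i = 3(x_i + 2)^2, and out is the mean of the four z_i:

```latex
$$
o = \frac{1}{4}\sum_{i=1}^{4} z_i = \frac{1}{4}\sum_{i=1}^{4} 3\,(x_i + 2)^2,
\qquad
\frac{\partial o}{\partial x_i} = \frac{3}{2}\,(x_i + 2),
\qquad
\left.\frac{\partial o}{\partial x_i}\right|_{x_i = 1} = \frac{9}{2} = 4.5
$$
```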


# Backpropagate once more; note that grad accumulates
print(x)
out2 = x.sum()
print(out2)
out2.backward()
print(x.grad)

out3 = x.sum()
x.grad.data.zero_()  # reset the accumulated gradient to 0 before the next backward
out3.backward()
print(x.grad)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor(4., grad_fn=<SumBackward0>)
tensor([[5.5000, 5.5000],
        [5.5000, 5.5000]])
tensor([[1., 1.],
        [1., 1.]])

Stopping gradient tracking

x = torch.tensor(1.0, requires_grad = True)
y1 = x ** 2
with torch.no_grad():
    y2 = x**3
y3 = y1 + y2

print(x.requires_grad)
print(y1, y1.requires_grad)
print(y2, y2.requires_grad)
print(y3, y3.requires_grad)

True
tensor(1., grad_fn=<PowBackward0>) True
tensor(1.) False
tensor(2., grad_fn=<AddBackward0>) True

As we can see, y2 above has no grad_fn and y2.requires_grad=False, while y3 does have a grad_fn. So what is the gradient of y3 with respect to x?

y3.backward()
print(x.grad)

tensor(2.)
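Because y2 was computed inside torch.no_grad(), only y1 = x² contributes to the gradient: d(y1)/dx = 2x = 2 at x = 1. For comparison, a minimal sketch where the same expression is fully tracked:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y3 = x ** 2 + x ** 3      # same expression, but x**3 is tracked this time
y3.backward()
print(x.grad)             # 2x + 3x^2 evaluated at x = 1, i.e. 5
```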


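If we want to change a tensor's values without autograd recording the change (so that backpropagation is unaffected), we can operate on tensor.data:

x = torch.ones(1,requires_grad=True)

print(x.data) # still a tensor
print(x.data.requires_grad) # but it is already outside the computation graph

y = 2 * x
x.data *= 100 # only changes the value; not recorded in the graph, so gradient propagation is unaffected

y.backward()
print(x) # changing .data also changes the value of x itself
print(x.grad)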

tensor([1.])
False
tensor([100.], requires_grad=True)
tensor([2.])
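Modifying .data is safe here only because the gradient of y = 2 * x does not depend on the value of x. The sketch below, adapted from the PyTorch 0.4 migration guide, shows the risk. When an operation saves a value for backward and that value is changed through .data, autograd does not notice, and the gradient comes out silently wrong. Changing it through .detach() instead makes backward() raise an error. The helper function name is hypothetical.

```python
import torch

def sigmoid_grad_after_zeroing(use_detach):
    # Hypothetical helper: zero sigmoid's saved output, then try to backpropagate
    a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    out = a.sigmoid()                 # sigmoid saves its output for backward
    c = out.detach() if use_detach else out.data
    c.zero_()                         # overwrite the saved output in place
    try:
        out.sum().backward()
        return a.grad                 # .data: silently wrong gradient (all zeros)
    except RuntimeError as err:
        return err                    # detach(): autograd detects the change

print(sigmoid_grad_after_zeroing(use_detach=False))
print(sigmoid_grad_after_zeroing(use_detach=True))
```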
