PyTorch Tensors Explained - Neural Network Programming

Instances of the torch.Tensor class

PyTorch tensors are instances of the torch.Tensor Python class. We can create a torch.Tensor object using the class constructor like so:

> import torch
> t = torch.Tensor()
> type(t)
torch.Tensor

Tensor attributes

Every torch.Tensor has these attributes:

  • torch.dtype
  • torch.device
  • torch.layout

> print(t.dtype)
> print(t.device)
> print(t.layout)
torch.float32
cpu
torch.strided

Tensors have a torch.dtype

The dtype, which is torch.float32 in our case, specifies the type of the data that is contained within the tensor. Tensors contain uniform (of the same type) numerical data with one of these types:
The available options include:

  • torch.float16 / torch.half
  • torch.float32 / torch.float
  • torch.float64 / torch.double
  • torch.uint8
  • torch.int8
  • torch.int16 / torch.short
  • torch.int32 / torch.int
  • torch.int64 / torch.long

Each dtype has both a CPU and a GPU tensor variant. Tensor operations between tensors must happen between tensors with the same dtype.
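For example, before operating on two tensors of different dtypes, we can cast one to match the other. A minimal sketch (recent PyTorch versions promote some mixed-dtype operations automatically, but matching dtypes explicitly keeps the intent clear):

> t1 = torch.tensor([1, 2, 3])        # torch.int64
> t2 = torch.tensor([1., 2., 3.])     # torch.float32
> (t1.to(torch.float32) + t2).dtype
torch.float32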

Tensors have a torch.device

The device, cpu in our case, specifies the device (CPU or GPU) where the tensor’s data is allocated.
This determines where tensor computations for the given tensor will be performed.
PyTorch supports the use of multiple devices, and they are specified using an index like so:

> device = torch.device('cuda:0')
> device
device(type='cuda', index=0)

Tensor operations between tensors must happen between tensors that exist on the same device.
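A minimal sketch of what this means in practice, assuming a CUDA device is available (the exact error message varies by version):

> t1 = torch.tensor([1, 2, 3])
> t2 = t1.to(device)       # a copy of t1 on cuda:0, using the device from above
> t1 + t2                  # RuntimeError: the tensors are on different devices
> t1.to(device) + t2       # works: both tensors are on cuda:0
tensor([2, 4, 6], device='cuda:0')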

Tensors have a torch.layout

The layout, strided in our case, specifies how the tensor is stored in memory.
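Concretely, a strided tensor keeps its elements in one contiguous block of memory, and stores one stride per dimension telling how many elements to skip to move one step along that dimension. A quick check:

> t = torch.ones(2, 3)
> t.stride()
(3, 1)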

Takeaways from the tensor attributes

As neural network programmers, we need to be aware of the following:

  1. Tensors contain data of a uniform type (dtype).
  2. Tensor computations between tensors depend on the dtype and the device.

Creating tensors using data

These are the primary ways of creating tensor objects (instances of the torch.Tensor class), with data (array-like) in PyTorch:

  1. torch.Tensor(data)
  2. torch.tensor(data)
  3. torch.as_tensor(data)
  4. torch.from_numpy(data)

Here is an example:

> import numpy as np
> data = np.array([1,2,3])
> type(data)
numpy.ndarray

> o1 = torch.Tensor(data)
> o2 = torch.tensor(data)
> o3 = torch.as_tensor(data)
> o4 = torch.from_numpy(data)

> print(o1)
> print(o2)
> print(o3)
> print(o4)
tensor([1., 2., 3.])
tensor([1, 2, 3], dtype=torch.int32)
tensor([1, 2, 3], dtype=torch.int32)
tensor([1, 2, 3], dtype=torch.int32)

Tensor creation operations: What’s the difference?

Uppercase/lowercase: torch.Tensor() vs torch.tensor()

The first option, with the uppercase T, is the constructor of the torch.Tensor class.
The second option is what we call a factory function: a function that constructs torch.Tensor objects and returns them to the caller. You can think of torch.tensor() as a factory that builds tensors given some parameter inputs. Factory functions are a common software design pattern for creating objects.
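Both calls produce instances of the same torch.Tensor class; only the construction path differs. A quick check:

> type(torch.Tensor([1, 2, 3]))
torch.Tensor
> type(torch.tensor([1, 2, 3]))
torch.Tensor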

Default dtype vs inferred dtype

> print(o1.dtype)
> print(o2.dtype)
> print(o3.dtype)
> print(o4.dtype)
torch.float32
torch.int32
torch.int32
torch.int32

The difference here arises from the fact that the torch.Tensor() constructor uses the default dtype when building the tensor.

> torch.get_default_dtype()
torch.float32
> o1.dtype == torch.get_default_dtype()
True
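
The default itself is configurable via torch.set_default_dtype(), which accepts a floating-point dtype; a quick sketch (restoring the default afterwards):

> torch.set_default_dtype(torch.float64)
> torch.Tensor([1, 2, 3]).dtype
torch.float64
> torch.set_default_dtype(torch.float32)   # restore the default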

The other calls choose a dtype based on the incoming data; this is called type inference, because the dtype is inferred from the incoming data.
Note that the dtype can also be explicitly set for these calls by specifying the dtype as an argument:

> torch.tensor(data, dtype=torch.float32)
> torch.as_tensor(data, dtype=torch.float32)

With torch.Tensor(), we are unable to pass a dtype to the constructor. This is an example of the torch.Tensor() constructor lacking configuration options, which is one of the reasons to prefer the torch.tensor() factory function for creating our tensors.
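
Indeed, attempting to pass a dtype to the constructor raises an error (the exact message varies by version):

> torch.Tensor(data, dtype=torch.float64)
TypeError: new() received an invalid combination of arguments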

Sharing memory for performance: copy vs share

Here is an example:

> data = np.array([1,2,3])
> type(data)
numpy.ndarray

> o1 = torch.Tensor(data)
> o2 = torch.tensor(data)
> o3 = torch.as_tensor(data)
> o4 = torch.from_numpy(data)

> print('old:', data)
old: [1 2 3]
# change only the original ndarray, then check which tensors are affected
> data[0] = 0

> print('new:', data)
new: [0 2 3]

> print(o1)
> print(o2)
> print(o3)
> print(o4)

tensor([1., 2., 3.])
tensor([1, 2, 3], dtype=torch.int32)
tensor([0, 2, 3], dtype=torch.int32)
tensor([0, 2, 3], dtype=torch.int32)

The first two, o1 and o2, still have the original value of 1 at index 0, while the last two, o3 and o4, have the new value of 0 at index 0.
This happens because torch.Tensor() and torch.tensor() copy their input data while torch.as_tensor() and torch.from_numpy() share their input data in memory with the original input object.

This sharing just means that the actual data in memory exists in a single place. As a result, any changes that occur in the underlying data will be reflected in both objects, the torch.Tensor and the numpy.ndarray.
Sharing data is more efficient and uses less memory than copying data because the data is not written to two locations in memory.

If we have a torch.Tensor and we want to convert it to a numpy.ndarray, we do it like so:

> print(o3.numpy())
> print(o4.numpy())
[0 2 3]
[0 2 3]

> print(type(o3.numpy()))
> print(type(o4.numpy()))
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
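
The sharing works in both directions, and on a CPU tensor the .numpy() call also returns an ndarray that shares the tensor's memory rather than a copy. A quick check, continuing the example above (restoring the value afterwards):

> o4[0] = 99             # write through the tensor
> print(data)            # the original ndarray sees the change
[99  2  3]
> print(o4.numpy()[0])   # the ndarray returned by .numpy() shares it too
99
> o4[0] = 0              # restore the value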

This establishes that torch.as_tensor() and torch.from_numpy() both share memory with their input data. However, which one should we use, and how are they different?
The torch.from_numpy() function only accepts numpy.ndarrays, while the torch.as_tensor() function accepts a wide variety of array-like objects including other PyTorch tensors. For this reason, torch.as_tensor() is the winning choice in the memory sharing game.
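
One quick way to confirm sharing is to compare storage addresses with Tensor.data_ptr(); passed an existing tensor with matching dtype and device, torch.as_tensor() returns it as-is rather than copying:

> t = torch.tensor([1, 2, 3])
> torch.as_tensor(t).data_ptr() == t.data_ptr()
True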

Best options for creating tensors in PyTorch

These two are the best options:

  1. torch.tensor()
  2. torch.as_tensor()

The torch.tensor() call is the go-to call, while torch.as_tensor() should be employed when tuning our code for performance.

Some things to keep in mind about memory sharing (it works where it can):

  1. Since numpy.ndarray objects are allocated on the CPU, the as_tensor() function must copy the data from the CPU to the GPU when a GPU is being used.
  2. The memory sharing of as_tensor() doesn't work with built-in Python data structures like lists (see the sketch after this list).
  3. The as_tensor() call requires developer knowledge of the sharing feature. This is necessary so we don't inadvertently make an unwanted change in the underlying data without realizing the change impacts multiple objects.
  4. The as_tensor() performance improvement will be greater if there are a lot of back-and-forth operations between numpy.ndarray objects and tensor objects. However, if there is just a single load operation, there shouldn't be much impact from a performance perspective.
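
Point 2 can be verified directly: built from a Python list, the tensor copies the elements, so later changes to the list are not reflected:

> data_list = [1, 2, 3]
> t = torch.as_tensor(data_list)   # elements are copied, not shared
> data_list[0] = 0
> print(t)
tensor([1, 2, 3])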

Creation options without data

We have the torch.eye() function, which returns a 2-D tensor with ones on the diagonal and zeros elsewhere. The name eye() is connected to the idea of an identity matrix, which is a square matrix with ones on the main diagonal and zeros everywhere else.

> print(torch.eye(2))
tensor([
    [1., 0.],
    [0., 1.]
])

We have the torch.zeros() function, which creates a tensor of zeros with the specified shape.

> print(torch.zeros([2,2]))
tensor([
    [0., 0.],
    [0., 0.]
])

Similarly, we have the torch.ones() function that creates a tensor of ones.

> print(torch.ones([2,2]))
tensor([
    [1., 1.],
    [1., 1.]
])

We also have the torch.rand() function, which creates a tensor with the specified shape whose values are random numbers drawn uniformly from [0, 1).

> print(torch.rand([2,2]))
tensor([
    [0.0465, 0.4557],
    [0.6596, 0.0941]
])
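
All of these creation functions also accept the attribute arguments from earlier, so the dtype and device can be set at creation time:

> print(torch.zeros([2,2], dtype=torch.int64))
tensor([
    [0, 0],
    [0, 0]
])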

For more details, see the PyTorch documentation.
