PyTorch: Global Functions

-柚子皮-

Last modified 2023-10-12 14:32:03

First published 2020-10-29 00:02:24

Article link: https://blog.csdn.net/pipisorry/article/details/109348484

Distribution and tensor generation functions

Setting the seed

torch.manual_seed(0)
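
A minimal sketch of what seeding buys you: re-seeding with the same value replays the same random sequence. Note that torch.manual_seed only affects PyTorch's own generators, not Python's random module or NumPy. The variable names are illustrative.

```python
import torch

# Seed, draw, then re-seed with the same value and draw again.
torch.manual_seed(0)
a = torch.rand(3)
torch.manual_seed(0)
b = torch.rand(3)

# Both draws come from the same generator state, so they match exactly.
same = torch.equal(a, b)
print(same)  # True
```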

Generating an arithmetic-sequence tensor: torch.arange

x = torch.arange(0, 12, 2)

tensor([ 0, 2, 4, 6, 8, 10])
x = x.reshape(2, 3)

tensor([[ 0, 2, 4],
[ 6, 8, 10]])

torch.randint

torch.randint(low=0, high, size, ...)

Returns a tensor filled with random integers generated uniformly between low (inclusive) and high (exclusive). The shape of the tensor is defined by the variable argument size.

Example: generate an integer tensor of size (2, 2) with values in [3, 9]

torch.randint(3, 10, (2, 2))
tensor([[4, 5],
[6, 7]])

torch.randint_like(input, low=0, high, ...)

torch.randint_like(input, 0, 100)
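
The following sketch checks the two properties described above: high is exclusive for torch.randint, and torch.randint_like takes its shape and dtype from the input tensor. The name template is just an illustrative placeholder.

```python
import torch

torch.manual_seed(0)

# Draw many samples to see the range: low is inclusive, high is exclusive.
x = torch.randint(3, 10, (1000,))
lo, hi = x.min().item(), x.max().item()

# randint_like copies shape and dtype from the input tensor.
template = torch.zeros(2, 3, dtype=torch.int64)
y = torch.randint_like(template, 0, 100)
print(y.shape, y.dtype)  # torch.Size([2, 3]) torch.int64
```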

torch.randn

torch.randn(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

out_i ∼ N(0, 1)

Example

>>> torch.randn(4)
tensor([-2.1436, 0.9966, 2.3426, -0.6366])
>>> torch.randn(2, 3)
tensor([[ 1.5954, 2.8929, -1.0923],
[ 1.1719, -0.4709, -0.1996]])

# Generate a tensor of shape (4, 3, 2)

input = torch.randn(4, 3, 2)
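
Since torch.randn always samples from N(0, 1), a common way to get another normal distribution is to shift and scale the result. A small sketch, with mu and sigma chosen arbitrarily for illustration:

```python
import torch

torch.manual_seed(0)

mu, sigma = 2.0, 3.0
# If z ~ N(0, 1), then mu + sigma * z ~ N(mu, sigma^2).
samples = mu + sigma * torch.randn(100_000)

mean, std = samples.mean().item(), samples.std().item()
print(round(mean, 1), round(std, 1))
```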

torch.normal

torch.normal(mean, std, *, generator=None, out=None) → Tensor

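There are four modes for creating a tensor of normally distributed data this way:

(1) mean is a tensor, std is a tensor

(2) mean is a scalar, std is a scalar

(3) mean is a scalar, std is a tensor

(4) mean is a tensor, std is a scalar

[Deep Learning from Scratch, PyTorch Notes (3): Tensor Creation (Part 2)]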

torch.normal(mean, std, size, *, out=None) → Tensor
torch.normal(2, 3, size=(1, 4))
tensor([[-1.3987, -1.9544, 3.6048, 0.7909]])

[TORCH.NORMAL]
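
The four modes listed above can be sketched as follows. The tensors mean_t and std_t are illustrative values. In the scalar/scalar mode the output shape cannot be inferred from the arguments, so size has to be passed explicitly.

```python
import torch

torch.manual_seed(0)

mean_t = torch.tensor([0.0, 10.0, 100.0])
std_t = torch.tensor([1.0, 2.0, 3.0])

a = torch.normal(mean=mean_t, std=std_t)  # (1) tensor mean, tensor std: paired elementwise
b = torch.normal(2.0, 3.0, size=(1, 4))   # (2) scalar mean, scalar std: size is required
c = torch.normal(mean=0.0, std=std_t)     # (3) scalar mean shared by every element
d = torch.normal(mean=mean_t, std=1.0)    # (4) scalar std shared by every element

print(a.shape, b.shape, c.shape, d.shape)
```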

torch.zeros

The dtype can be specified.

torch.zeros((1, 0), dtype=torch.float32)
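
Note that a size containing a 0, as in the call above, yields an empty tensor with no elements. A short sketch showing this next to an ordinary integer-typed zero tensor (the names z and zi are illustrative):

```python
import torch

# A zero-length dimension gives an empty tensor: it has a shape but no elements.
z = torch.zeros((1, 0), dtype=torch.float32)
print(z.shape, z.numel())  # torch.Size([1, 0]) 0

# An ordinary 2x3 zero tensor with an integer dtype.
zi = torch.zeros(2, 3, dtype=torch.int64)
print(zi.dtype)  # torch.int64
```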

torch.ones

torch.ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

torch.ones(2, 3)
tensor([[ 1., 1., 1.],
[ 1., 1., 1.]])

torch.ones_like

torch.ones_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)

Returns a tensor filled with the scalar value 1, with the same size as input. torch.ones_like(input) is equivalent to torch.ones(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).

>>> input = torch.empty(2, 3)
>>> torch.ones_like(input)
tensor([[ 1., 1., 1.],
[ 1., 1., 1.]])
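
The equivalence stated above can be checked directly. This sketch also shows that a keyword such as dtype overrides the property inherited from the input. The name inp is illustrative.

```python
import torch

inp = torch.empty(2, 3, dtype=torch.float64)

# ones_like inherits size, dtype, layout and device from the input.
a = torch.ones_like(inp)
b = torch.ones(inp.size(), dtype=inp.dtype, layout=inp.layout, device=inp.device)

# An explicit dtype overrides the inherited one.
c = torch.ones_like(inp, dtype=torch.int32)
print(a.dtype, c.dtype)  # torch.float64 torch.int32
```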

torch.eye

torch.eye(n, m=None, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.

Parameters
n (int) – the number of rows

m (int, optional) – the number of columns with default being n

Example

torch.eye(3)
tensor([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
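
When m is given, the result is rectangular and the ones still sit on the main diagonal. A minimal sketch:

```python
import torch

# 2 rows, 3 columns: ones where the row index equals the column index.
e = torch.eye(2, 3)
print(e)
# tensor([[1., 0., 0.],
#         [0., 1., 0.]])
```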

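Output-related functions

One-hot and multi-hot encoding

sklearn implementation

You can encode the labels with sklearn first and then convert the result to a tensor.

When using sklearn, define the encoder once at init time and save it. Otherwise the encoding may differ between train and predict, because the raw labels are treated as discrete, unordered categories rather than as consecutive integers starting from 0.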

[onehot - Scikit-learn: Preprocessing data]
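
The point about building the encoder once and reusing it can be illustrated without sklearn. The class LabelIndexer below is a hypothetical helper written for this sketch, not a sklearn or PyTorch API. It fixes the label-to-index mapping at construction time so that training and prediction share the same codes.

```python
import torch
import torch.nn.functional as F


class LabelIndexer:
    """Hypothetical helper: a label -> index mapping that is built exactly once."""

    def __init__(self, labels):
        # Sort the distinct labels so the mapping does not depend on input order.
        self.classes = sorted(set(labels))
        self.index = {label: i for i, label in enumerate(self.classes)}

    def one_hot(self, labels):
        ids = torch.tensor([self.index[label] for label in labels])
        return F.one_hot(ids, num_classes=len(self.classes))


# Build the mapping once (e.g. in __init__ of a model wrapper) and keep it.
indexer = LabelIndexer(["cat", "dog", "bird", "dog"])
train_codes = indexer.one_hot(["dog", "cat"])
predict_codes = indexer.one_hot(["dog"])  # reuses the same mapping
print(train_codes)
# tensor([[0, 0, 1],
#         [0, 1, 0]])
```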

One-hot encoding

nn.functional.one_hot

torch.nn.functional.one_hot(tensor, num_classes=-1)

Parameters

tensor (LongTensor) – class values of any shape.

num_classes (int) – Total number of classes. If set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor.

Inferring the number of classes automatically

import torch.nn.functional as F
import torch

tensor = torch.arange(0, 5) % 3 # tensor([0, 1, 2, 0, 1])
one_hot = F.one_hot(tensor)

# Output:
# tensor([[1, 0, 0],
# [0, 1, 0],
# [0, 0, 1],
# [1, 0, 0],
# [0, 1, 0]])

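F.one_hot infers the number of classes as one greater than the largest value in the input (not as the count of distinct values) and generates the corresponding one-hot encoding.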
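
A short sketch of how the inference behaves when the labels are not contiguous: with only two distinct values but a maximum of 4, the encoding still has 5 columns.

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 4])  # two distinct values, largest value is 4

# num_classes defaults to -1, so it is inferred as max(labels) + 1 = 5.
inferred = F.one_hot(labels)
print(inferred.shape)  # torch.Size([2, 5])
```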

Specifying the number of classes

tensor = torch.arange(0, 5) % 3 # tensor([0, 1, 2, 0, 1])
one_hot = F.one_hot(tensor, num_classes=5)

# Output:
# tensor([[1, 0, 0, 0, 0],
# [0, 1, 0, 0, 0],
# [0, 0, 1, 0, 0],
# [1, 0, 0, 0, 0],
# [0, 1, 0, 0, 0]])

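Example: generating a one-hot vector

Both methods work; only the resulting type differs (an integer tensor from F.one_hot, a float tensor from torch.eye).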

import torch
import torch.nn.functional as F

label_size = 5
target = 3
one_hot0 = F.one_hot(torch.tensor(target), label_size)
print(one_hot0)
one_hot1 = torch.eye(label_size)[target]
print(one_hot1)
# tensor([0, 0, 0, 1, 0])
# tensor([0., 0., 0., 1., 0.])
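
The type difference matters when the one-hot vector feeds a loss or a matrix product that expects floats. A minimal sketch that inspects both dtypes and converts the integer version:

```python
import torch
import torch.nn.functional as F

one_hot0 = F.one_hot(torch.tensor(3), 5)  # integer result
one_hot1 = torch.eye(5)[3]                # float result
print(one_hot0.dtype, one_hot1.dtype)     # torch.int64 torch.float32

# Cast when a float tensor is needed downstream.
as_float = one_hot0.float()
```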

Multi-hot encoding

import torch.nn.functional as F
import torch

tensor = torch.tensor([[0, 2], [2, 2]])

Both methods below require the label lists inside the tensor to be padded to the same length first.

Direct implementation after padding

multi_hot = torch.zeros(2, 4).scatter_(1, tensor, 1)
print(multi_hot)
# tensor([[1., 0., 1., 0.],
# [0., 0., 1., 0.]])

import torch
def get_multi_hot_label(batchs, label_size):
    # Pad first: fill each row with its own first label, which is harmless
    # because writing 1 to the same position twice changes nothing.
    batch_size = len(batchs)
    max_label_num = max(len(x) for x in batchs)
    doc_labels_extend = [[doc_label[0]] * max_label_num for doc_label in batchs]
    for i in range(batch_size):
        doc_labels_extend[i][0: len(batchs[i])] = batchs[i]
    y = torch.tensor(doc_labels_extend, dtype=torch.long)
    # Then scatter ones into a zero matrix to get the multi-hot encoding.
    multihot_tensor = torch.zeros(batch_size, label_size).scatter_(1, y, 1)
    return multihot_tensor


print(get_multi_hot_label([[0, 1], [2]], label_size=3))
# tensor([[1., 1., 0.],
#         [0., 0., 1.]])

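Indirect implementation via one-hot after padding

# Summing counts a repeated label twice (row [2, 2] would give 2), so clamp to 1.
multi_hot = F.one_hot(tensor, num_classes=4).sum(dim=-2).clamp(max=1)
print(multi_hot)
# tensor([[1, 0, 1, 0],
# [0, 0, 1, 0]])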

Generating multi-hot directly without padding

import torch
def get_multi_hot_label(batchs, label_size):
    multihot_list = [[1 if i in batch else 0 for i in range(label_size)] for batch in batchs]
    return torch.Tensor(multihot_list)

print(get_multi_hot_label([[0, 1], [2]], label_size=3))
# tensor([[1., 1., 0.],
#         [0., 0., 1.]])

torch.sigmoid

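Only one precision issue is noted here:

torch.sigmoid(torch.Tensor([10])) prints as tensor([1.0000]) (the true value is about 0.99995, rounded by the default 4-digit print precision), while torch.sigmoid(torch.Tensor([-89])) sits at the other extreme and is effectively 0 in float32.

Inputs outside roughly [-89, 10] therefore show up as the extreme values 0 or 1.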

[TORCH.SIGMOID]
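
To see what is display rounding and what is real loss of precision, one can read the values out as Python floats and compare against float64. This is a sketch assuming the default float32 dtype. The exact float32 result at -89 may depend on the backend, so only a loose bound is stated.

```python
import torch

x = torch.tensor([10.0, -89.0])

p32 = torch.sigmoid(x)           # float32
p64 = torch.sigmoid(x.double())  # float64 for comparison

hi32 = p32[0].item()  # about 0.99995: below 1, it only prints as 1.0000
lo32 = p32[1].item()  # at or near 0 in float32
lo64 = p64[1].item()  # tiny but strictly positive in float64

print(hi32, lo32, lo64)
```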

torch.topk

torch.topk(input, k, dim=None, largest=True, sorted=True, *, out=None)

Returns the k largest (or, with largest=False, the k smallest) values along a given dim of a tensor, together with their indices.

If dim is not given, the last dimension of the input is chosen.

Example

x = torch.arange(0, 12, 2).reshape(2, 3)
print(x)
print(torch.topk(x, 2, -1))

tensor([[ 0, 2, 4],
[ 6, 8, 10]])
torch.return_types.topk(
values=tensor([[ 4, 2],
[10, 8]]),
indices=tensor([[2, 1],
[2, 1]]))
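
For the "k smallest" case, pass largest=False. The returned indices can be fed to torch.gather to recover the same values from the input, as this sketch shows:

```python
import torch

x = torch.arange(0, 12, 2).reshape(2, 3)

# Two smallest entries per row, returned in ascending order.
values, indices = torch.topk(x, 2, dim=-1, largest=False)
print(values)   # tensor([[0, 2], [6, 8]])
print(indices)  # tensor([[0, 1], [0, 1]])

# The indices address the same dimension, so gather reproduces the values.
picked = torch.gather(x, -1, indices)
```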

torch.log

torch.log(input, *, out=None)

Returns a new tensor with the natural logarithm of the elements of input.

y_i=log_e(x_i)

torch.exp

torch.exp(input, *, out=None)

Returns a new tensor with the exponential of the elements of the input tensor input.

y_i=e^(x_i)
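
Since torch.exp and torch.log are inverses on positive inputs, a round trip should reproduce the input up to floating-point error. The sketch below also shows the edge cases of torch.log at 0 and at negative values.

```python
import torch

x = torch.tensor([0.5, 1.0, 2.0])

# exp(log(x)) == x for positive x, up to rounding.
roundtrip = torch.exp(torch.log(x))
ok = torch.allclose(roundtrip, x)

# Edge cases: log(0) is -inf and log of a negative number is nan.
log_of_zero = torch.log(torch.tensor(0.0))
log_of_neg = torch.log(torch.tensor(-1.0))
print(ok, log_of_zero.item(), log_of_neg.item())  # True -inf nan
```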

Resource usage

Setting the number of PyTorch threads to 1

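import os
import torch

def SetUpPytorch():
    # TORCH_THREAD_NUM defaults to 1; -1 keeps PyTorch's own defaults.
    torch_thread_num = int(os.getenv("TORCH_THREAD_NUM", "1"))
    if torch_thread_num != -1:
        torch.set_num_threads(torch_thread_num)  # intra-op threads
        torch.set_num_interop_threads(torch_thread_num)  # inter-op threads

Note: PyTorch uses thread pools internally, so you may still see more than one thread when inspecting the process.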
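
To confirm that the setting took effect, the current value can be read back with torch.get_num_threads. A minimal sketch covering only the intra-op setting:

```python
import torch

# Limit intra-op parallelism to a single thread and read the value back.
torch.set_num_threads(1)
n = torch.get_num_threads()
print(n)  # 1
```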

Profiling the resource usage of PyTorch code

torch.profiler.profile(*, activities=None, schedule=None, on_trace_ready=None, record_shapes=False, profile_memory=False, with_stack=False, with_flops=False, with_modules=False, experimental_config=None, use_cuda=None)

with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ]
) as p:
    code_to_profile()
print(p.key_averages().table(
    sort_by="self_cuda_time_total", row_limit=-1))
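
On a machine without a GPU, the same pattern can be run with CPU activity only. The sketch below adds the CUDA activity only when a device is available and profiles a small matrix product as a stand-in workload. The workload and the row limit are arbitrary choices for illustration.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Always profile the CPU; add CUDA only if a device is present.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

a = torch.randn(64, 64)
with profile(activities=activities) as prof:
    b = a @ a  # stand-in for the code to profile

# Sort by CPU self time, since a CUDA column may not exist here.
table = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5)
print(table)
```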

Older API:

torch.autograd.profiler.profile(enabled=True, use_cuda=False, record_shapes=False)

x = torch.randn((1, 1), requires_grad=True)
with torch.autograd.profiler.profile() as prof:
    for _ in range(100):  # any normal python code, really!
        y = x ** 2
        y.backward()
# NOTE: some columns were removed for brevity
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

[https://www.cnblogs.com/jiangkejie/p/13256094.html]
