Common PyTorch Operations and Methods

Article link: https://blog.csdn.net/so_that/article/details/97026988

Basics

Common data types and conversions
torch.cat, torch.stack, torch.chunk
torch.sum, torch.mean, torch.max
squeeze, unsqueeze
permute (reordering), transpose
Tensor values: tensor.numpy, tensor.data, tensor.detach
torch.gt, torch.lt, torch.eq, torch.ne
torch.masked_select, torch.masked_fill
cuda and device
requires_grad
pack_padded_sequence and pad_packed_sequence
torch.gather
torch.mm, torch.bmm, torch.matmul

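Common data types and conversions

Commonly used tensor types include torch.IntTensor, torch.FloatTensor, and torch.LongTensor.

torch.Tensor is an alias for the default tensor type, torch.FloatTensor.

Let's create one. The approach below builds a tensor by converting a NumPy array; the resulting dtype follows the array, so the int32 shown in the output is platform-dependent:

import numpy as np
import torch

a = np.array([2, 2])
tensor = torch.from_numpy(a)
print(tensor)
output:
tensor([2, 2], dtype=torch.int32)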

Next, convert its data type (float is shown here; the other types work the same way):

tensor = tensor.float()
print(tensor, type(tensor))
output:
tensor([2., 2.]) <class 'torch.Tensor'>
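
As a small complement, the sketch below shows two details that are easy to miss: torch.from_numpy shares memory with the source array, and the conversion methods return a new tensor instead of changing the original. The explicit np.int32 dtype is an assumption made here so the result does not depend on the platform.

```python
import numpy as np
import torch

arr = np.array([2, 2], dtype=np.int32)  # explicit dtype (assumption, for reproducibility)
t = torch.from_numpy(arr)               # shares memory with arr

arr[0] = 5                              # the change is visible through t
print(t)                                # tensor([5, 2], dtype=torch.int32)

f = t.float()                           # new float32 tensor; t itself is unchanged
l = t.long()                            # new int64 tensor
d = t.to(torch.float64)                 # .to() also accepts a dtype
print(f.dtype, l.dtype, d.dtype, t.dtype)
```

If the shared memory is not wanted, torch.tensor(arr) makes a copy instead.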

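Before going through the operations below, one note: unless stated otherwise, every dim argument follows the same convention, where 0 is the first dimension, 1 the second, 2 the third, and so on.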

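torch.cat(), torch.stack(), torch.chunk()

We create two random tensors and combine them. Note the difference between torch.cat() and torch.stack(): cat joins the tensors along an existing dimension, while stack joins them along a new one. Compare the two examples below: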

torch.cat()

a=torch.rand(2,3)
b=torch.randn(2,3)
print(a)
print(b)
c=torch.cat((a,b),0)
print(c)
print(c.size())
output:
tensor([[0.5070, 0.2374, 0.2489],
        [0.7007, 0.2080, 0.4985]])
tensor([[ 0.1958, -0.0674, -0.7950],
        [-1.1569, -0.8597,  0.8683]])
tensor([[ 0.5070,  0.2374,  0.2489],
        [ 0.7007,  0.2080,  0.4985],
        [ 0.1958, -0.0674, -0.7950],
        [-1.1569, -0.8597,  0.8683]])
torch.Size([4, 3])

torch.stack()

d=torch.stack((a,b), dim=0)
print(d)
print(d.size())
output (from a separate run, so the values differ from a and b above):
tensor([[[ 0.7555,  0.1871,  0.2619],
         [ 0.5023,  0.4412,  0.7843]],

        [[-1.0808,  0.0563, -0.2942],
         [ 0.2736,  0.7614,  1.7735]]])
torch.Size([2, 2, 3])
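
To make the shape difference concrete, here is a minimal sketch with fixed inputs (zeros and ones are my choice, only the shapes matter). It also checks that stack behaves like unsqueezing each input and then concatenating.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

print(torch.cat((a, b), dim=0).shape)    # torch.Size([4, 3])
print(torch.cat((a, b), dim=1).shape)    # torch.Size([2, 6])
print(torch.stack((a, b), dim=0).shape)  # torch.Size([2, 2, 3])
print(torch.stack((a, b), dim=2).shape)  # torch.Size([2, 3, 2])

# stack is equivalent to adding a new dimension to each input, then cat
via_cat = torch.cat((a.unsqueeze(0), b.unsqueeze(0)), dim=0)
same = torch.equal(torch.stack((a, b), dim=0), via_cat)
print(same)                              # True
```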

torch.chunk()

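The first argument is the tensor, the second is the number of chunks, and the third is the dimension to split along. It is roughly the inverse of cat().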

import torch
a = torch.rand(3, 3)
print(a)
'''
tensor([[0.4530, 0.5209, 0.9342],
        [0.2975, 0.8168, 0.6980],
        [0.2103, 0.6733, 0.0945]])
'''
c = torch.chunk(a, 3, 0)
print(c)
'''
(tensor([[0.4530, 0.5209, 0.9342]]), tensor([[0.2975, 0.8168, 0.6980]]), tensor([[0.2103, 0.6733, 0.0945]]))
'''
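
One case the example above does not cover is a size that does not divide evenly. The sketch below, using a 1-D tensor of my own choosing, shows that the last chunk is simply smaller and that cat puts the pieces back together.

```python
import torch

x = torch.arange(10)
parts = torch.chunk(x, 3, dim=0)       # 10 is not divisible by 3
sizes = [p.numel() for p in parts]
print(sizes)                           # [4, 4, 2]

restored = torch.cat(parts, dim=0)     # cat undoes chunk
print(torch.equal(restored, x))        # True
```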

torch.sum(), torch.mean(), torch.max()

a=torch.rand(2,3)
print(a)
print(torch.sum(a,dim=0))
c=torch.mean(a,0)
print(c)
output:
tensor([[0.1001, 0.9999, 0.9990],
        [0.2909, 0.8446, 0.6064]])
tensor([0.3910, 1.8445, 1.6054])
tensor([0.1955, 0.9223, 0.8027])

a=torch.randn(2,2)
print(a)
max_value,maxindex=torch.max(a,1)
print(max_value)
print(maxindex)
output:
tensor([[-0.4345, -0.1618],
        [-0.8059, -0.0876]])
tensor([-0.1618, -0.0876])
tensor([1, 1])
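
Because the random values above are hard to check by hand, here is a small sketch with fixed numbers (chosen by me). It also shows keepdim, which keeps the reduced dimension with size 1, and the fact that torch.max without dim returns a single value rather than a (values, indices) pair.

```python
import torch

a = torch.tensor([[1., 5., 3.],
                  [4., 2., 6.]])

col_sum = torch.sum(a, dim=0)                   # reduce over rows -> shape (3,)
row_mean = torch.mean(a, dim=1, keepdim=True)   # reduced dim kept -> shape (2, 1)
values, indices = torch.max(a, dim=1)           # per-row maximum and its column index
overall = torch.max(a)                          # no dim: one 0-d tensor

print(col_sum)            # tensor([5., 7., 9.])
print(row_mean.shape)     # torch.Size([2, 1])
print(values, indices)    # tensor([5., 6.]) tensor([1, 2])
print(overall.item())     # 6.0
```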

squeeze, unsqueeze

unsqueeze inserts a dimension of size 1 at the given position; squeeze removes dimensions of size 1.

a=torch.rand(2,3)
print(a)
a=a.unsqueeze(1)
print(a.size())
print(a.squeeze().size())
output:
tensor([[0.8247, 0.2986, 0.0938],
        [0.4890, 0.7971, 0.5439]])
torch.Size([2, 1, 3])
torch.Size([2, 3])
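
squeeze() without arguments removes every size-1 dimension, which can be surprising when, say, the batch size happens to be 1. A minimal sketch of passing an explicit dim instead (the shape is just an illustration):

```python
import torch

x = torch.zeros(1, 3, 1, 4)

all_removed = x.squeeze()     # every size-1 dim is dropped
first_only = x.squeeze(0)     # only dim 0 is dropped
untouched = x.squeeze(1)      # dim 1 has size 3, so nothing happens
appended = x.unsqueeze(-1)    # negative dims count from the end

print(all_removed.shape)      # torch.Size([3, 4])
print(first_only.shape)       # torch.Size([3, 1, 4])
print(untouched.shape)        # torch.Size([1, 3, 1, 4])
print(appended.shape)         # torch.Size([1, 3, 1, 4, 1])
```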

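Reordering dimensions

permute

permute changes the order of a tensor's dimensions. For example, an LSTM with batch_first=True expects input of shape (batch, seq_len, dim), while with batch_first=False it expects (seq_len, batch, dim); permute can convert between the two layouts.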

a = torch.rand(2, 3, 4)
print(a.shape)
a = a.permute(2, 1, 0)
print(a.shape)
output:
torch.Size([2, 3, 4])
torch.Size([4, 3, 2])

transpose

a=torch.randn(2,4,2)
c=a.transpose(1,2)
print(c.shape)

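transpose takes exactly two dimensions and swaps them once, whereas permute takes a complete ordering and can therefore express several swaps in one call. In practice, use transpose when a single swap is enough, since it is the quicker option, and permute when several swaps are needed. Both return a view of the original data rather than a copy.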
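
A short sketch of how the two relate, with an arange tensor picked for illustration. It also shows a common follow-up issue: the result is generally not contiguous, so view() needs a contiguous() call first.

```python
import torch

a = torch.arange(24).reshape(2, 3, 4)

t = a.transpose(1, 2)          # swap dims 1 and 2
p = a.permute(0, 2, 1)         # the same reordering written as a full permutation
print(torch.equal(t, p))       # True
print(t.shape)                 # torch.Size([2, 4, 3])

# Only the strides change, so the memory layout is no longer contiguous
print(t.is_contiguous())       # False
flat = t.contiguous().view(2, -1)   # view() requires contiguous memory
print(flat.shape)              # torch.Size([2, 12])
```

reshape() can be used in place of contiguous().view(); it copies only when it has to.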

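Getting values out of a tensor

To read values out of a tensor, use tensor.item() when it holds a single element and tensor.tolist() otherwise.
Neither method cares whether the data currently lives on the GPU or the CPU, whereas numpy() does.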

a = torch.randn(1,requires_grad=True,device='cuda')
print(a.item())
b=torch.randn(2,3,requires_grad=True,device='cuda')
print(b.tolist())
output:
1.2667691707611084
[[-0.7672975063323975, 0.1947876662015915, -0.5312148332595825], [-1.6979146003723145, -0.21547940373420715, -1.6267175674438477]]

tensor.numpy()

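numpy() only works on CPU tensors, and when requires_grad=True you also need to call detach() first; calling numpy() directly on a tensor that requires grad raises an error.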

x  = torch.rand([3,3], device='cuda')
x_ = x.cpu().numpy()

x= torch.rand([3,3], requires_grad=True, device='cuda')
x = x.cpu().detach().numpy()
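
The snippets above need a GPU. The following CPU-only sketch shows the same rule: numpy() refuses a tensor that requires grad, and detach() fixes it. Calling cpu() on a CPU tensor is a no-op, so the last line works on either device.

```python
import torch

x = torch.ones(3, requires_grad=True)

try:
    x.numpy()                  # not allowed while x requires grad
    raised = False
except RuntimeError:
    raised = True
print(raised)                  # True

# detach from the graph, move to CPU if needed, then convert
arr = x.detach().cpu().numpy()
print(arr)                     # [1. 1. 1.]
```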

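tensor.data and tensor.detach()
Modifying a tensor obtained through tensor.data in place can silently produce wrong gradients, whereas the same mistake made through tensor.detach() raises an error. The example below, taken from the reference, illustrates this, starting with tensor.detach():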

a = torch.tensor([7., 0, 0], requires_grad=True)
b = a + 2
print(b)
loss = torch.mean(b * b)
b_ = b.detach()
b_.zero_()
print(b)
# modifying b_ also changes b
loss.backward()
output:
tensor([9., 2., 2.], grad_fn=<AddBackward0>)
tensor([0., 0., 0.], grad_fn=<AddBackward0>)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:

tensor.data

a = torch.tensor([7., 0, 0], requires_grad=True)
b = a + 2
print(b)
# tensor([9., 2., 2.], grad_fn=<AddBackward0>)
loss = torch.mean(b * b)
b_ = b.data
b_.zero_()
print(b)
# tensor([0., 0., 0.], grad_fn=<AddBackward0>)
loss.backward()
print(a.grad)
# tensor([0., 0., 0.])
# The correct result (with b_.zero_() above commented out) would be:
# tensor([6.0000, 1.3333, 1.3333])
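
If the goal is to get a scratch copy of b that can be modified freely, one option (my suggestion, not from the reference) is detach().clone(), which gives the copy its own storage. A minimal sketch on the same numbers:

```python
import torch

a = torch.tensor([7., 0., 0.], requires_grad=True)
b = a + 2
loss = torch.mean(b * b)

b_copy = b.detach().clone()   # separate storage, no autograd history
b_copy.zero_()                # does not touch b

loss.backward()
print(b)        # tensor([9., 2., 2.], grad_fn=<AddBackward0>)
print(a.grad)   # tensor([6.0000, 1.3333, 1.3333])
```

The gradient is 2 * b / 3, which matches the correct result quoted above.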

torch.gt(), torch.lt(), torch.eq(), torch.ne()

The following functions are commonly used to compute masks.

torch.gt(): greater than (True where greater, False otherwise)

a=torch.randn(2,3)
print(a)
print(a.gt(0))
output:
tensor([[-0.5801, -1.4859, -0.7225],
        [ 0.2278, -1.3240, -1.7059]])
tensor([[False, False, False],
        [ True, False, False]])

torch.lt(): less than (True where less, False otherwise)

a=torch.randn(2,3)
print(a)
print(a.lt(0))
output:
tensor([[ 1.8523,  1.0978, -1.6345],
        [ 0.0851,  0.8104,  0.2026]])
tensor([[False, False,  True],
        [False, False, False]])

torch.eq(): equal (True where equal, False otherwise)

x = torch.arange(5)
print(x)
mask = torch.eq(x,3)   
print(mask)
print(x[mask])
output:
tensor([0, 1, 2, 3, 4])
tensor([False, False, False,  True, False])
tensor([3])

torch.ne(): not equal (False where equal, True otherwise)

x = torch.Tensor([1,2,0,3,0])
mask = torch.ne(x,0)
print(mask)
print(x[mask])
output:
tensor([ True,  True, False,  True, False])
tensor([1., 2., 3.])
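
The comparison functions also have operator forms, and the resulting boolean masks can be combined. A brief sketch follows; treating 0 as the padding id is an assumption made for the example.

```python
import torch

x = torch.tensor([3, 0, 5, 0, 2])

# The operators are shorthand for the functions
print(torch.equal(x > 0, torch.gt(x, 0)))    # True
print(torch.equal(x != 0, torch.ne(x, 0)))   # True

# Boolean masks combine with & (and), | (or), ~ (not)
mask = (x > 0) & (x < 5)
print(mask)        # tensor([ True, False, False, False,  True])
print(x[mask])     # tensor([3, 2])

# Count non-padding entries, assuming 0 marks padding
n_real = (x != 0).sum().item()
print(n_real)      # 3
```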

torch.masked_select(), torch.masked_fill()

torch.masked_select: returns a 1-D tensor containing the elements of x at the positions where mask is True.

x = torch.randn(2, 4)
print(x)
mask = x.ge(0.5)
print(mask)
print(torch.masked_select(x, mask))
output:
tensor([[-0.0840,  2.1119,  2.5315, -0.8200],
        [-1.5103,  0.7379, -0.2429,  0.0112]])
tensor([[False,  True,  True, False],
        [False,  True, False, False]])
tensor([2.1119, 2.5315, 0.7379])

torch.masked_fill: replaces the positions of a where mask is True with value.

a = torch.randn(1, 4)
print(a)
mask = torch.tensor([True, True, False, False])
b = a.masked_fill(mask=mask, value=float("inf"))
print(b)
output:
tensor([[0.0173, 0.7796, 0.7994, 1.2002]])
tensor([[   inf,    inf, 0.7994, 1.2002]])
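
A typical use of masked_fill is hiding padded positions before a softmax, for instance in attention. The sketch below assumes a mask where True marks padding; the scores are made up for illustration.

```python
import torch

scores = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
pad_mask = torch.tensor([[False, False, True, True]])   # True = padding (assumption)

masked = scores.masked_fill(pad_mask, float("-inf"))    # returns a new tensor
weights = torch.softmax(masked, dim=-1)

print(masked)    # tensor([[1., 2., -inf, -inf]])
print(weights)   # the padded positions receive weight 0
```

masked_fill leaves scores unchanged; the in-place variant is masked_fill_.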

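cuda and device

There are two main usages: specify the device with to(), or move the data with cuda(). With the first, you detect the device once (for example in a config) and then reuse it everywhere; with the second, you have to check availability every time.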

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
a = torch.rand([2,3]).to(device)

if torch.cuda.is_available():
    a = torch.rand([2,3]).cuda()

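tensor.to() and model.to()

A brief example shows the difference between the two. According to the official documentation, tensor.to() is not an in-place operation, while model.to() is. (We can test for an in-place operation by checking whether the returned object is the same object as the original; if it is, the operation was in place.) When the tensor is already on the target device, however, tensor.to() returns the same tensor. Let's look at tensor.to() first.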

a = torch.rand(10)
b = a.to(torch.device("cuda"))
print(b is a)
c = b.to(torch.device("cuda"))
print(c is b)
output:
False
True

For model.to():

model = torch.nn.Sequential(torch.nn.Linear(10, 10))
model_new = model.to(torch.device("cuda"))
print(model_new is model)
output:
True
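
The same behaviour can be observed without a GPU. In this sketch a dtype change stands in for a device change, which is my substitution so that the example runs anywhere.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(10)
same_obj = a.to(a.device) is a        # nothing to change: the same tensor comes back
new_obj = a.to(torch.float64) is a    # something changes: a new tensor is created
print(same_obj, new_obj)              # True False

model = torch.nn.Linear(10, 10)
returned = model.to(device)           # parameters are converted inside the module
print(returned is model)              # True

# Practical consequence: reassign tensors, but modules can be moved in place
a = a.to(device)
model.to(device)
```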

requires_grad

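This section is about the gradients of model parameters. In a network it is common to freeze some parameters and update the rest, which can be done as follows.

Note that a tensor we create ourselves defaults to requires_grad=False, while the parameters of a model default to requires_grad=True.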

    def __init__(self, emb_weights, vocab_size, dim, config):
        super(lstmnet, self).__init__()

        self.config = config

        # embedding and LSTM layers
        self.embedding = nn.Embedding.from_pretrained(embeddings=emb_weights, freeze=config.emb_freeze)

        self.lstm = nn.LSTM(input_size=config.input_size,
                            hidden_size=config.hidden_size,
                            num_layers=config.num_layers,
                            batch_first=config.batch_first,
                            bidirectional=config.bidir)
        for p in self.parameters():
            p.requires_grad=False

        # dropout layer
        self.dropout = nn.Dropout(config.dropout)

        if config.bidir:
            self.l1 = nn.Linear(config.hidden_size * 2, config.num_classes)

        else:
            self.l1 = nn.Linear(config.hidden_size, config.num_classes)

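In the code above, the loop over self.parameters() sets requires_grad=False for every parameter registered before the loop (the embedding and the LSTM); the layers defined after it keep requires_grad=True. When building the optimizer we then filter the parameters: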

    if config.emb_freeze:
        model_parameters = filter(lambda p: p.requires_grad, model.parameters())
    else:
        model_parameters = model.parameters()

    optimizer = torch.optim.Adam(model_parameters, lr=config.lr, weight_decay=config.weight_decay)

How do we know it worked? Let's test it with the following code:

    for p in model.parameters():
        print(p.shape, p.requires_grad)
    print("-------------------------------")

    if config.emb_freeze:
        model_parameters = filter(lambda p: p.requires_grad, model.parameters())
    else:
        model_parameters = model.parameters()
    for p in model_parameters:
        print(p.shape)
    exit()
output:
torch.Size([64, 50]) False
torch.Size([400, 50]) False
torch.Size([400, 100]) False
torch.Size([400]) False
torch.Size([400]) False
torch.Size([400, 50]) False
torch.Size([400, 100]) False
torch.Size([400]) False
torch.Size([400]) False
torch.Size([2, 200]) True
torch.Size([2]) True
-------------------------------
torch.Size([2, 200])
torch.Size([2])

Clearly it works.
For a step-by-step look at how PyTorch updates gradients, see the article "PyTorch 中的 tensor 及使用" (Tensors in PyTorch and how to use them).
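
The code in this section depends on a config object and pretrained embeddings that are not shown, so here is a self-contained sketch of the same idea. The two-layer model is hypothetical: the first layer plays the frozen part and the last layer the trainable part.

```python
import torch
import torch.nn as nn

# Hypothetical model: model[0] is frozen, model[2] stays trainable
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
for p in model[0].parameters():
    p.requires_grad = False

# A list can be reused, unlike the one-shot iterator returned by filter()
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

before = model[0].weight.clone()
loss = model(torch.randn(16, 4)).sum()
loss.backward()
optimizer.step()

print(model[0].weight.grad)                   # None: no gradient was computed
print(torch.equal(model[0].weight, before))   # True: frozen weights did not move
print(len(trainable))                         # 2 (weight and bias of the last layer)
```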

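pack_padded_sequence and pad_packed_sequence

pack_padded_sequence packs a padded batch of sequences, and pad_packed_sequence unpacks it again. If we fed the LSTM one sequence at a time none of this would be needed, but that is very inefficient, so we use batches, and then sequences of different lengths show up. Roughly speaking, packing tells the LSTM how many time steps to run for each sequence in the batch. First the code: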

import torch.nn as nn
import torch
t=[]
l=[[1,3,4],[2,3,5],[5,4,3],[4,0,1]]
l1=[[1,3,4],[2,3,5],[5,4,3],[0,0,0]]
l2=[[1,3,4],[2,3,5],[0,0,0],[0,0,0]]
t.append(l)
t.append(l1)
t.append(l2)
yy=torch.FloatTensor(t)
print(yy)
k=[4,3,2]
x_packed = nn.utils.rnn.pack_padded_sequence(input=yy, lengths=k, batch_first=True)
print(x_packed)
lstm=nn.LSTM(input_size=3,hidden_size=10,num_layers=1,batch_first=True)
out,_=lstm(x_packed)
print(out)
c,_=torch.nn.utils.rnn.pad_packed_sequence(out,batch_first=True)
print(c.size())
output:
tensor([[[1., 3., 4.],
         [2., 3., 5.],
         [5., 4., 3.],
         [4., 0., 1.]],

        [[1., 3., 4.],
         [2., 3., 5.],
         [5., 4., 3.],
         [0., 0., 0.]],

        [[1., 3., 4.],
         [2., 3., 5.],
         [0., 0., 0.],
         [0., 0., 0.]]])
PackedSequence(data=tensor([[1., 3., 4.],
        [1., 3., 4.],
        [1., 3., 4.],
        [2., 3., 5.],
        [2., 3., 5.],
        [2., 3., 5.],
        [5., 4., 3.],
        [5., 4., 3.],
        [4., 0., 1.]]), batch_sizes=tensor([3, 3, 2, 1]), sorted_indices=None, unsorted_indices=None)
-------
PackedSequence(data=tensor([[ 0.0076,  0.0838, -0.1635,  0.1357, -0.1519,  0.0391, -0.0421, -0.0023,
          0.0880, -0.0821],
        [ 0.0076,  0.0838, -0.1635,  0.1357, -0.1519,  0.0391, -0.0421, -0.0023,
          0.0880, -0.0821],
        [ 0.0076,  0.0838, -0.1635,  0.1357, -0.1519,  0.0391, -0.0421, -0.0023,
          0.0880, -0.0821],
        [ 0.0089,  0.0943, -0.2057,  0.2117, -0.3167,  0.0308,  0.0038,  0.0939,
          0.2045, -0.1033],
        [ 0.0089,  0.0943, -0.2057,  0.2117, -0.3167,  0.0308,  0.0038,  0.0939,
          0.2045, -0.1033],
        [ 0.0089,  0.0943, -0.2057,  0.2117, -0.3167,  0.0308,  0.0038,  0.0939,
          0.2045, -0.1033],
        [ 0.0143,  0.0322,  0.0555,  0.0909, -0.5819,  0.1404,  0.1093,  0.1695,
          0.4254, -0.1481],
        [ 0.0143,  0.0322,  0.0555,  0.0909, -0.5819,  0.1404,  0.1093,  0.1695,
          0.4254, -0.1481],
        [ 0.0508,  0.0634,  0.0852,  0.2010, -0.5225,  0.1044,  0.3157,  0.0988,
          0.3785, -0.2072]], grad_fn=<CatBackward>), batch_sizes=tensor([3, 3, 2, 1]), sorted_indices=None, unsorted_indices=None)
torch.Size([3, 4, 10])

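The code above builds a 3 * 4 * 3 tensor in which the all-zero rows are padding. Packing records, for each time step, how many sequences in the batch still contain real data; that is where batch_sizes = [3, 3, 2, 1] comes from. The padding is dropped, which effectively tells the LSTM how many time steps to run for each sequence.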
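
The example above passes lengths already sorted in decreasing order. For unsorted batches, pack_padded_sequence accepts enforce_sorted=False. The sketch below uses random data and sizes of my own choosing; it also shows that pad_packed_sequence returns the lengths and that outputs at padded steps are zero.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch = torch.zeros(3, 4, 3)           # (batch, max_len, features); zeros are padding
lengths = torch.tensor([2, 4, 3])      # not sorted
for i, n in enumerate(lengths.tolist()):
    batch[i, :n] = torch.randn(n, 3)   # fill only the real time steps

packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
lstm = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)                       # torch.Size([3, 4, 5])
print(out_lengths)                     # tensor([2, 4, 3])
print(out[0, 2:].abs().sum().item())   # 0.0: padded steps produce zeros

# h_n holds the output at each sequence's last real step
last_ok = torch.allclose(h_n[0, 0], out[0, 1])
print(last_ok)                         # True
```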

torch.gather

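At first glance I did not understand this one either, until I read the official documentation. You only need to apply the formula strictly (note that the output has the same size as index).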

For a 3-D tensor the output is specified by::
out[i][j][k] = input[index[i][j][k]][j][k] # if dim == 0
out[i][j][k] = input[i][index[i][j][k]][k] # if dim == 1
out[i][j][k] = input[i][j][index[i][j][k]] # if dim == 2

import torch
b = torch.Tensor([[1,2,3],[4,5,6]])
print (b)
index_1 = torch.LongTensor([[0,1],[2,0]])
index_2 = torch.LongTensor([[0,1,1],[0,0,0]])
print (torch.gather(b, dim=1, index=index_1))
print (torch.gather(b, dim=0, index=index_2))
output:
tensor([[1., 2., 3.],
        [4., 5., 6.]])
tensor([[1., 2.],
        [6., 4.]])
tensor([[1., 5., 6.],
        [1., 2., 3.]])

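Applying the formula strictly gives the results above.
I also recommend the blog post "pytorch实现seq2seq时如何对loss进行mask" (how to mask the loss when implementing seq2seq in PyTorch); it explains this well, and reading it should make the use of this operator clearer.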
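
One common pattern, sketched here with made-up scores, is picking the score of the target class for each sample. The index needs the same number of dimensions as the input, hence the unsqueeze.

```python
import torch

# Hypothetical scores for 3 samples over 4 classes
scores = torch.tensor([[0.1, 0.2, 0.3, 0.4],
                       [0.5, 0.6, 0.7, 0.8],
                       [0.9, 1.0, 1.1, 1.2]])
target = torch.tensor([2, 0, 3])       # one class index per sample

# out[i][0] = scores[i][target[i]], following the dim == 1 formula
picked = scores.gather(dim=1, index=target.unsqueeze(1)).squeeze(1)
print(picked)                          # tensor([0.3000, 0.5000, 1.2000])

# The same selection written with advanced indexing
same = torch.equal(picked, scores[torch.arange(3), target])
print(same)                            # True
```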

torch.mm, torch.bmm, torch.matmul

torch.mm
As the example below shows, this is plain matrix multiplication.

l=[[1,2],[3,3]]
l1=[[2,2],[2,2],[1,1]]
a=torch.tensor(l1)
b=torch.tensor(l)
print(a)
print(b)
c=torch.mm(a,b)
print(c)
output:
tensor([[2, 2],
        [2, 2],
        [1, 1]])
tensor([[1, 2],
        [3, 3]])
tensor([[ 8, 10],
        [ 8, 10],
        [ 4,  5]])

torch.bmm
Similar to mm, except that it multiplies a batch of matrices, i.e. batch-mm.

a = torch.tensor([[[2., 3.], [1., 2.]], [[3., 4.], [0., 5.]]])
b = torch.tensor([[[3.], [1.]], [[2.], [4.]]])
print(a)
print(b)
out = torch.bmm(a, b)
print(out)
output:
tensor([[[2., 3.],
         [1., 2.]],

        [[3., 4.],
         [0., 5.]]])
tensor([[[3.],
         [1.]],

        [[2.],
         [4.]]])
tensor([[[ 9.],
         [ 5.]],

        [[22.],
         [20.]]])

torch.matmul

import torch
l=[[1,9],[2,2]]
l1=[1,3]
a=torch.tensor(l1)
b=torch.tensor(l)
print(a)
print(b)
c=torch.matmul(a,b)
print(c)
output:
tensor([1, 3])
tensor([[1, 9],
        [2, 2]])
tensor([ 7, 15])

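Here the first argument is 1-D and the second is 2-D: 1 * 1 + 3 * 2 = 7 and 1 * 9 + 3 * 2 = 15.

The other way round, with a 2-D first argument and a 1-D second argument: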

import torch
l=[[1,9],[2,2]]
l1=[1,3]
a=torch.tensor(l)
b=torch.tensor(l1)
print(a)
print(b)
c=torch.matmul(a,b)
print(c)
output:
tensor([[1, 9],
        [2, 2]])
tensor([1, 3])
tensor([28,  8])
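
To round off the comparison, here is a sketch of how matmul generalizes the other two by broadcasting over batch dimensions. Shapes and random inputs are illustrative only.

```python
import torch

a = torch.randn(2, 3, 4)
b = torch.randn(2, 4, 5)
w = torch.randn(4, 5)
v = torch.randn(4)

print(torch.bmm(a, b).shape)      # torch.Size([2, 3, 5])  both inputs must be 3-D
print(torch.matmul(a, b).shape)   # torch.Size([2, 3, 5])  same result as bmm here
print(torch.matmul(a, w).shape)   # torch.Size([2, 3, 5])  w is broadcast over the batch
print((a @ w).shape)              # torch.Size([2, 3, 5])  @ is the operator form
print(torch.matmul(v, v).shape)   # torch.Size([])         1-D with 1-D is a dot product

close = torch.allclose(torch.bmm(a, b), a @ b, atol=1e-6)
print(close)                      # True
```

torch.mm and torch.bmm do not broadcast, so they are a useful choice when a shape mismatch should fail loudly.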