PyTorch Knowledge Notes

℡听风ヾ

Last modified 2023-03-09 11:16:29

Tags: pytorch, deep learning, artificial intelligence

First published 2022-11-10 15:29:41

Article link: https://blog.csdn.net/weixin_44038243/article/details/127787463

1.torch.device()

torch.device() specifies whether training runs on the GPU or on the CPU.

Usage:

import torch
# "cuda:0" selects a GPU by index; 0 is the first GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
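
Creating the device object does not move anything by itself. A minimal sketch of how it is typically used, assuming a small nn.Linear model chosen here only for illustration: both the model and its input must be placed on the same device.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# The model and its inputs must live on the same device
model = nn.Linear(3, 2).to(device)      # illustrative model, not from the original post
x = torch.randn(4, 3, device=device)    # create the input directly on the device
y = model(x)
print(y.device.type, y.shape)
```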

2.nn.ModuleList()

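Purpose: nn.ModuleList() holds any subclasses of nn.Module in a list. It supports the same operations as a built-in Python list, such as extend and append. The difference is that modules added to an nn.ModuleList are registered with the enclosing network, so the parameters of every nn.Module inside it become parameters of the network. The test below builds two fully connected layers; the output shows that both the weights and the biases belong to the network.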

import torch.nn as nn

class test1_net(nn.Module):
    def __init__(self):
        super(test1_net, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(5,5) for i in range(2)])
    def forward(self, x):
        for m in self.linears:
            x = m(x)
        return x

net1 = test1_net()
print(net1)
'''
test1_net(
  (linears): ModuleList(
    (0): Linear(in_features=5, out_features=5, bias=True)
    (1): Linear(in_features=5, out_features=5, bias=True)
  )
)
'''

for param in net1.parameters():
    print(type(param.data), param.size())
'''
<class 'torch.Tensor'> torch.Size([5, 5])
<class 'torch.Tensor'> torch.Size([5])
<class 'torch.Tensor'> torch.Size([5, 5])
<class 'torch.Tensor'> torch.Size([5])
'''

For more detail, see: https://blog.csdn.net/byron123456sfsfsfa/article/details/89930990
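
To see why the registration matters, the sketch below compares a plain Python list with nn.ModuleList. The class names PlainListNet and ModuleListNet are made up for this illustration.

```python
import torch.nn as nn

class PlainListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list: the layers are NOT registered with the module
        self.linears = [nn.Linear(5, 5) for _ in range(2)]

class ModuleListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: the layers and their parameters are registered
        self.linears = nn.ModuleList([nn.Linear(5, 5) for _ in range(2)])

plain_count = len(list(PlainListNet().parameters()))
registered_count = len(list(ModuleListNet().parameters()))
print(plain_count, registered_count)  # 0 4
```

With the plain list, an optimizer built from parameters() would receive nothing to train, and calls such as .to(device) would not reach the layers.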

3.nn.Linear()

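How it works: a fully connected layer maps the feature space computed by the preceding layers (convolution, pooling, and so on) to the sample label space. Put simply, it combines the feature representation into output values, which reduces the influence of feature position on the classification result and improves the robustness of the whole network. Robustness here means that the network keeps certain performance characteristics under small perturbations of its parameters. nn.Linear is a linear transformation, whose prototype is the linear function from school mathematics: y = kx + b

In deep learning the variables are multi-dimensional tensors, multiplication is matrix multiplication, and addition is matrix addition, so the computation nn.Linear() actually performs is: output = input @ weight.T + bias
Example: create a Linear object with input feature dimension 3 and output dimension 4. The parameters are randomly initialized and can be inspected through weight and bias. weight has 4 rows and 3 columns; bias is one-dimensional with 4 elements, so it broadcasts across the batch dimension.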

import torch.nn as nn
linear1 = nn.Linear(3, 4, bias=True)
print(linear1.weight)
'''
Parameter containing:
tensor([[ 0.0052, -0.0102, -0.5492],
        [ 0.0983,  0.1518,  0.2995],
        [ 0.5446,  0.2422, -0.1529],
        [ 0.0202, -0.4722, -0.1320]], requires_grad=True)
'''
print(linear1.bias)
'''
Parameter containing:
tensor([-0.4863,  0.1122,  0.1602,  0.3065], requires_grad=True)
'''
# Forward pass on a random batch of 2 samples
import torch
x = torch.randn((2,3))
y1 = linear1(x)
print(y1.shape)
print(y1)
'''
torch.Size([2, 4])
tensor([[ 0.1484,  0.0809, -0.0806, -0.5291],
        [ 1.6042, -0.7190, -0.8004,  1.0664]], grad_fn=<AddmmBackward0>)
'''

For more detail, see: "Pytorch nn.Linear的基本用法", iioSnail's blog on CSDN
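
As a quick check of the formula, the sketch below recomputes the layer's output by hand. Because weight is stored with shape (out_features, in_features), it is transposed before the product with a batch of row vectors.

```python
import torch
import torch.nn as nn

linear = nn.Linear(3, 4)
x = torch.randn(2, 3)

# weight has shape (4, 3), so transpose it before multiplying with x of shape (2, 3)
manual = x @ linear.weight.T + linear.bias
matches = torch.allclose(linear(x), manual, atol=1e-6)
print(matches)
```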

4.nn.Parameter()

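Purpose: Parameter means a trainable parameter. Training a neural network means training a model to learn a function that captures what we want, for example classifying objects correctly. The function's input is the model's input, such as an image, and the model's output is a prediction. A paper I read, Meshed-Memory Transformer for Image Captioning, explains it like this: when processing an image, we need a model of prior knowledge about the relationships between regions. For example, given one region encoding a person and another encoding a basketball, it is hard to infer the concept of a player or a game without any prior knowledge. Likewise, given regions encoding eggs and toast, prior knowledge about their relationship makes it easy to infer that the picture shows breakfast. In that paper this prior knowledge is stored in learnable tensors created with nn.Parameter. Wrapping a tensor in nn.Parameter and assigning it as a module attribute registers it in the module's parameters(), so the optimizer updates it during training.

Example from the paper: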

self.m_k = nn.Parameter(torch.FloatTensor(1, m, h * d_k)) # (1, 40, 8 * 64)
self.m_v = nn.Parameter(torch.FloatTensor(1, m, h * d_v)) # (1, 40, 8 * 64)

For more detail, see: "Pytorch中nn.Parameter（）参数的使用", 每天都想要出去玩鸭~'s blog on CSDN
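
The two lines above only make sense inside a module. Below is a minimal, self-contained sketch of the idea, not the paper's implementation: the class name MemorySlots, the initialization, and the forward logic are assumptions made for illustration. It shows that the tensor is registered as a parameter and can be appended to the keys as learnable memory.

```python
import torch
import torch.nn as nn

class MemorySlots(nn.Module):  # hypothetical module for illustration
    def __init__(self, m=40, h=8, d_k=64):
        super().__init__()
        # Learnable memory keys, shape (1, m, h * d_k)
        self.m_k = nn.Parameter(torch.empty(1, m, h * d_k))
        # Assumed initialization; the original snippet leaves the tensor uninitialized
        nn.init.normal_(self.m_k, mean=0.0, std=1.0 / d_k)

    def forward(self, keys):
        # keys: (batch, seq_len, h * d_k); broadcast the memory over the batch
        memory = self.m_k.expand(keys.shape[0], -1, -1)
        return torch.cat([keys, memory], dim=1)

layer = MemorySlots()
out = layer(torch.randn(2, 5, 512))
names = [name for name, _ in layer.named_parameters()]
print(out.shape)  # torch.Size([2, 45, 512])
print(names)      # ['m_k']
```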

5.nn.init.xavier_uniform_()

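Purpose: initialization. It fills the input tensor with values drawn from a uniform distribution, which keeps parameters from becoming too large or too small. The range is scaled so that the variance stays roughly the same from layer to layer, which keeps the computation well behaved.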

import torch.nn as nn
linear1 = nn.Linear(2, 3, bias=True)
print(linear1.weight)
xu = nn.init.xavier_uniform_(linear1.weight)
print(linear1)
print(xu)
'''Parameter containing:
tensor([[-0.1114, -0.3342],
        [ 0.0629,  0.3060],
        [-0.4445, -0.4200]], requires_grad=True)
Linear(in_features=2, out_features=3, bias=True)
Parameter containing:
tensor([[-0.7165, -0.0940],
        [ 0.0299,  0.4402],
        [ 0.4945, -0.3356]], requires_grad=True)'''
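
For reference, with the default gain of 1 the values are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). A small sketch that checks this bound for the same 2-input, 3-output layer:

```python
import math
import torch
import torch.nn as nn

linear = nn.Linear(2, 3)
nn.init.xavier_uniform_(linear.weight)  # default gain=1.0

fan_in, fan_out = 2, 3
bound = math.sqrt(6.0 / (fan_in + fan_out))
max_abs = linear.weight.abs().max().item()
print(bound)             # about 1.0954
print(max_abs <= bound)  # True: every weight lies inside [-bound, bound]
```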

6.nn.init.normal_()、nn.init.constant_()

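Purpose: initialization. nn.init.normal_ fills a tensor with random values drawn from a normal distribution. nn.init.constant_ fills a tensor with the constant val. Both are used the same way as above.

Some usages of initialization: "Pytorch nn.init 参数初始化方法", Vic_Hao's blog on CSDN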
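
A short sketch of both functions on one layer. The values mean=0.0, std=0.02, and val=0.1 are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

linear = nn.Linear(2, 3)
nn.init.normal_(linear.weight, mean=0.0, std=0.02)  # weights ~ N(0, 0.02^2)
nn.init.constant_(linear.bias, 0.1)                 # every bias element set to 0.1
print(linear.weight)
print(linear.bias)
```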

7.nn.Dropout()

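Purpose: to prevent or reduce overfitting; it is usually applied to fully connected layers. Dropout randomly discards a portion of the neurons at each training step. That is, each neuron's activation is set to zero with probability p, so the neuron takes no part in that forward computation and its weights are not updated in that step. Its weights are kept, however (they are only temporarily not updated), because the neuron may be active again for the next input. During training nn.Dropout also scales the surviving activations by 1/(1-p), which is why the kept values in the output below are larger than the inputs.

Usage:

import torch
import torch.nn as nn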
x = torch.randn(3, 4)
dropout = nn.Dropout(p=0.2) 
x_drop = dropout(x)
print(x)
print(x_drop)
'''tensor([[-1.5766, -0.5238,  0.5143, -0.0836],
        [-3.1830, -0.7626,  0.5806, -0.0129],
        [-0.7500, -0.3818,  1.5403,  0.4114]])
tensor([[-1.9707, -0.6548,  0.6429, -0.0000],
        [-3.9787, -0.0000,  0.7257, -0.0162],
        [-0.0000, -0.4773,  1.9254,  0.5142]])'''

For more detail, see: "pytorch中nn.Dropout的使用技巧", 木盏's blog on CSDN
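
Dropout is only active in training mode. The sketch below, using an all-ones input chosen for readability, shows that in train() mode the surviving values are scaled to 1/(1-p) = 1.25, while in eval() mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.2)
x = torch.ones(1000)

dropout.train()          # training mode: random zeroing plus 1/(1-p) scaling
y_train = dropout(x)

dropout.eval()           # evaluation mode: identity
y_eval = dropout(x)

print(y_train.unique())        # the distinct values that occur: 0 and 1.25
print(torch.equal(y_eval, x))  # True
```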

8.nn.LayerNorm()

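How it works: LayerNorm is a normalization method. Unlike BatchNorm, it normalizes each sample separately over its feature dimensions, whereas BatchNorm (BN) computes its statistics for each channel across all samples in the batch.

Note: the input here has shape (1, 3, 4, 4). To apply LayerNorm (LN) over the channels we only need to pass one argument, norm = nn.LayerNorm(3). With a single argument, however, normalization is applied over the last dimension, so the input has to be permuted to (1, 4, 4, 3) first. Example: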

import torch
import torch.nn as nn
x = torch.randn([1, 3, 4, 4])
x = x.permute(0,2,3,1)
ln = nn.LayerNorm(3)
print(ln(x).permute(0,3,1,2))
'''tensor([[[[-0.1889, -0.5626,  1.0711,  0.9577],
          [ 1.2075,  1.0102, -0.2517, -1.0269],
          [-1.1925, -0.9310,  1.4074,  0.9917],
          [ 0.1890,  0.2050, -0.8575,  0.2097]],

         [[ 1.3082, -0.8423, -1.3351,  0.4223],
          [-1.2413, -1.3622, -1.0794, -0.3287],
          [-0.0621, -0.4564, -0.8236, -1.3690],
          [ 1.1192,  1.1093,  1.4026, -1.3160]],

         [[-1.1193,  1.4049,  0.2640, -1.3800],
          [ 0.0339,  0.3520,  1.3310,  1.3555],
          [ 1.2546,  1.3874, -0.5839,  0.3772],
          [-1.3083, -1.3143, -0.5451,  1.1063]]]], grad_fn=<PermuteBackward0>)'''

For more detail, see: "nn.LayerNorm的实现及原理", harry_tea's blog on CSDN
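
The normalization itself can be reproduced by hand, which makes the "last dimension" behaviour concrete. This sketch relies on the default affine parameters of a fresh layer (weight 1, bias 0) and on the biased variance that LayerNorm uses.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 4, 3)
ln = nn.LayerNorm(3)  # normalizes over the last dimension of size 3

mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)  # biased variance
manual = (x - mean) / torch.sqrt(var + ln.eps)

matches = torch.allclose(ln(x), manual, atol=1e-5)
print(matches)
```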

9.nn.Embedding()

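How it works: nn.Embedding() is mainly used in NLP to encode language. When processing text we usually assign each word an index, for example a: 1, and then encode the indexed words. One common scheme worth mentioning is one-hot encoding, but we do not use it here: the resulting vectors are too sparse and can hardly capture semantic relationships between words. We therefore use embeddings instead.

Note: the argument 10 is the vocabulary size, i.e. the total number of words that can be encoded; only indices in the range 0 to 9 are valid. Each row of the output is the dense vector for one word index, and the two occurrences of index 2 have identical encodings. Code: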

import torch
import torch.nn as nn
embedding = nn.Embedding(10,4)
x = torch.LongTensor([[1,2,3],
                     [4,2,6]])
print(embedding(x))
'''tensor([[[-0.9006,  0.7501,  0.3422, -1.6497],
         [-0.3167,  0.7401, -1.4557,  2.2191],
         [ 0.2161,  0.2541, -0.3091,  1.0904]],

        [[-0.4814,  0.6302, -0.8751,  0.2819],
         [-0.3167,  0.7401, -1.4557,  2.2191],
         [-1.2459, -0.3328, -0.9408,  1.8342]]], grad_fn=<EmbeddingBackward0>)'''

For more detail, see: "通俗讲解pytorch中nn.Embedding原理及使用", taoqick's blog on CSDN
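
The relation to one-hot encoding can be made explicit: looking up an index returns one row of the weight matrix, which is the same result as multiplying the one-hot vector by that matrix. A minimal sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embedding = nn.Embedding(10, 4)
idx = torch.tensor([1, 2, 2])

looked_up = embedding(idx)                      # row lookup, shape (3, 4)
one_hot = F.one_hot(idx, num_classes=10).float()
same_as_matmul = torch.allclose(looked_up, one_hot @ embedding.weight)
same_rows = torch.equal(looked_up[1], looked_up[2])  # index 2 appears twice

print(same_as_matmul, same_rows)  # True True
```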

10. nn.Embedding.from_pretrained()

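How it works: loads pretrained word vectors. In a concrete NLP task we usually produce word vectors with the Embedding layer described above and pass them to downstream processing such as classification. Using pretrained word vectors instead often gives better performance.

Example: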

import torch
import torch.nn as nn
weight = torch.FloatTensor([[1,2.3,3],
                     [4.1,5.3,6.2]])
embedding = nn.Embedding.from_pretrained(weight)
input = torch.LongTensor([0])
print(embedding(input))
# tensor([[1.0000, 2.3000, 3.0000]])
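
One detail worth knowing: from_pretrained() freezes the weights by default (freeze=True), so they are not updated during training. Passing freeze=False allows fine-tuning, as this short sketch shows.

```python
import torch
import torch.nn as nn

weight = torch.tensor([[1.0, 2.3, 3.0],
                       [4.1, 5.3, 6.2]])

frozen = nn.Embedding.from_pretrained(weight)                   # freeze=True by default
trainable = nn.Embedding.from_pretrained(weight, freeze=False)  # can be fine-tuned

print(frozen.weight.requires_grad)     # False
print(trainable.weight.requires_grad)  # True
```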

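11.torch basics summary

(1) torch.arange()

How it works: returns a one-dimensional Tensor of evenly spaced values.

Usage: the three arguments are start, end, and step; the interval is half-open, [start, end).

import torch
print(torch.arange(1,5,2))
# tensor([1, 3])
print(torch.arange(1,5,0.5))
# tensor([1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000, 4.0000, 4.5000])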

For more detail, see: "Torch.arange函数详解", _湘江夜话_'s blog on CSDN

(2) torch.sin(), torch.cos()

How it works: computes the element-wise sine and cosine of a tensor.

Usage:

import torch
input = torch.arange(10)
print(input)
# tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
input = input.view(1,-1)
print(input)
# tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
sin = torch.sin(input)
print(sin)
# tensor([[ 0.0000,  0.8415,  0.9093,  0.1411, -0.7568, -0.9589, -0.2794,  0.6570,
#           0.9894,  0.4121]])
cos = torch.cos(input)
print(cos)
# tensor([[ 1.0000,  0.5403, -0.4161, -0.9900, -0.6536,  0.2837,  0.9602,  0.7539,
#          -0.1455, -0.9111]])

(3) torch.zeros()

How it works: returns a tensor filled with zeros.

Usage:

import torch
zeros = torch.zeros([3,4])
print(zeros)
# tensor([[0., 0., 0., 0.],
#         [0., 0., 0., 0.],
#         [0., 0., 0., 0.]])

(4) torch.sum()

How it works: sums the input tensor along one or more dimensions.

Usage:

import torch
'''
1. torch.sum(input, dtype=None)
2. torch.sum(input, list: dim, bool: keepdim=False, dtype=None) → Tensor

input: the input tensor
dim: the dimension(s) to sum over; may be a list
keepdim: after summing, the reduced dimension has size 1 and is dropped by default; set keepdim=True to keep it
#If keepdim is True, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. 
'''
x = torch.ones(2,2,3)  # named x rather than sum to avoid shadowing the built-in
s1 = torch.sum(x,dim=0)
s2 = torch.sum(x,dim=1)
s3 = torch.sum(x,dim=-1)
print(x)
print(s1)
print(s2)
print(s3)
'''
tensor([[[1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.]]])
tensor([[2., 2., 2.],
        [2., 2., 2.]])
tensor([[2., 2., 2.],
        [2., 2., 2.]])
tensor([[3., 3.],
        [3., 3.]])'''
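
The effect of keepdim and of passing a list of dimensions is easiest to see from the resulting shapes. A small sketch on the same all-ones tensor:

```python
import torch

x = torch.ones(2, 2, 3)

dropped = torch.sum(x, dim=1)               # the summed dimension is removed
kept = torch.sum(x, dim=1, keepdim=True)    # the summed dimension stays with size 1
multi = torch.sum(x, dim=[0, 2])            # sum over two dimensions at once

print(dropped.shape)  # torch.Size([2, 3])
print(kept.shape)     # torch.Size([2, 1, 3])
print(multi)          # tensor([6., 6.])
```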

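(5) torch.cat()

How it works: concatenates tensors along an existing dimension; the tensors must have the same size in every other dimension.

Usage: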

import torch
cat1 = torch.ones(2,3)
cat2 = torch.zeros(3,3)
cat3 = torch.randint(0,10,[2,4])
c1 = torch.cat((cat1,cat2),0) # dim=0: concatenate along rows (stack vertically)
c2 = torch.cat([cat1,cat3],1) # dim=1: concatenate along columns (side by side)
print(cat1)
print(cat2)
print(cat3)
print(c1)
print(c2)
'''tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
tensor([[2, 5, 8, 9],
        [0, 5, 4, 3]])
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
tensor([[1., 1., 1., 2., 5., 8., 9.],
        [1., 1., 1., 0., 5., 4., 3.]])'''

(6) torch.matmul()

How it works: computes the matrix product of two tensors.

Usage:

import torch
m1 = torch.ones(size=(2,3),dtype=torch.int64)
m2 = torch.randint(0,10,[3,4])
m = torch.matmul(m1,m2)
print(m1)
print(m2)
print(m)
'''
tensor([[1, 1, 1],
        [1, 1, 1]])
tensor([[7, 9, 3, 8],
        [6, 3, 8, 6],
        [6, 5, 3, 8]])
tensor([[19, 17, 14, 22],
        [19, 17, 14, 22]])
'''

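(7) torch.softmax()

How it works: makes the values non-negative and normalizes them, so that every value lies between 0 and 1 and the values along the chosen dimension sum to 1.

Usage:

import torch
s1 = torch.randn([2,3])
s = torch.softmax(s1,0) # dim=0: each column sums to 1, matching the output below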
print(s1)
print(s)
'''
tensor([[ 1.0598, -0.5518, -0.6113],
        [ 1.1284,  1.5232,  1.0603]])
tensor([[0.4828, 0.1115, 0.1582],
        [0.5172, 0.8884, 0.8418]])
'''
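
The dim argument decides which slices are normalized. A brief sketch comparing dim=-1 (each row sums to 1) with dim=0 (each column sums to 1):

```python
import torch

s1 = torch.randn(2, 3)
rows = torch.softmax(s1, dim=-1)  # normalize across each row
cols = torch.softmax(s1, dim=0)   # normalize down each column

print(rows.sum(dim=-1))  # two values, each approximately 1
print(cols.sum(dim=0))   # three values, each approximately 1
```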

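(8) expand()

How it works: expands dimensions of size 1 to a larger size and returns a new tensor that is a view of the same data, without copying it.

Usage: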

import torch
e1 = torch.rand([1,2,3])
ex = e1.expand(5,2,3)
print(e1)
print(ex)
'''
tensor([[[0.5190, 0.0167, 0.1989],
         [0.9708, 0.1636, 0.2861]]])
tensor([[[0.5190, 0.0167, 0.1989],
         [0.9708, 0.1636, 0.2861]],

        [[0.5190, 0.0167, 0.1989],
         [0.9708, 0.1636, 0.2861]],

        [[0.5190, 0.0167, 0.1989],
         [0.9708, 0.1636, 0.2861]],

        [[0.5190, 0.0167, 0.1989],
         [0.9708, 0.1636, 0.2861]],

        [[0.5190, 0.0167, 0.1989],
         [0.9708, 0.1636, 0.2861]]])
'''
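
That expand() does not copy can be checked through the data pointer and the strides: the expanded dimension has stride 0, so all five "copies" read the same memory.

```python
import torch

e1 = torch.rand(1, 2, 3)
ex = e1.expand(5, 2, 3)

shares_memory = ex.data_ptr() == e1.data_ptr()
print(shares_memory)  # True: no data was copied
print(ex.stride())    # (0, 3, 1): stride 0 along the expanded dimension
```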

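(9) view()

How it works: returns the tensor with the specified shape; the underlying data is unchanged and is shared with the original tensor.

Usage: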

import torch
v1 = torch.rand([10,20,30])
v = v1.view(10,4,5,30)
print(v1.shape)
print(v.shape)
'''
torch.Size([10, 20, 30])
torch.Size([10, 4, 5, 30])'''
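
Two details are worth a short sketch: one dimension may be given as -1 and is then inferred, and because the view shares data with the original, writing through the view changes the original as well.

```python
import torch

v1 = torch.arange(6)
v = v1.view(2, -1)   # -1 is inferred from the other sizes: shape (2, 3)
v[0, 0] = 100        # write through the view

print(v.shape)       # torch.Size([2, 3])
print(v1[0].item())  # 100: the original tensor sees the change
```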

(10) permute()

How it works: swaps several dimensions at once, also described as reordering the dimensions.

Usage:

import torch
v1 = torch.rand([10,20,30])
v = v1.view(10,4,5,30)
p = v.permute(0,2,1,3)
print(v1.shape)
print(v.shape)
print(p.shape)
'''
torch.Size([10, 20, 30])
torch.Size([10, 4, 5, 30])
torch.Size([10, 5, 4, 30])
'''

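(11) contiguous()

How it works: returns a tensor whose elements are laid out contiguously in memory in row-major order. If the tensor is not contiguous, for example after transpose() or permute(), contiguous() makes a copy, which breaks the dependency between the two variables much like a deep copy: the result has the layout of a freshly created tensor and no longer shares memory with the original. If the tensor is already contiguous, it is returned as is and no copy is made.

Usage: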

import torch
x = torch.randint(1,10,[2,3])
y = torch.transpose(x, 0, 1).contiguous()
print('Before modification:')
print(x)
print(y)
y[0,0] = 20
print('After modification:')
print(x)
print(y)
'''
Before modification:
tensor([[1, 1, 2],
        [6, 7, 9]])
tensor([[1, 6],
        [1, 7],
        [2, 9]])
After modification:
tensor([[1, 1, 2],
        [6, 7, 9]])
tensor([[20,  6],
        [ 1,  7],
        [ 2,  9]])
'''
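
The most common reason to call contiguous() is that view() refuses to work on a non-contiguous tensor. A minimal sketch of that situation, which also shows that an already contiguous tensor is returned without copying:

```python
import torch

x = torch.arange(6).view(2, 3)
t = x.transpose(0, 1)          # shape (3, 2), same memory, different strides
print(t.is_contiguous())       # False

try:
    t.view(6)                  # view() needs a compatible memory layout
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)             # True

flat = t.contiguous().view(6)  # copy into contiguous memory, then view
print(flat)                    # tensor([0, 3, 1, 4, 2, 5])

no_copy = x.contiguous().data_ptr() == x.data_ptr()
print(no_copy)                 # True: x was already contiguous
```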

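(12) masked_fill()

How it works: mainly used in the attention mechanism of the Transformer. In sequence tasks it masks out the information from time steps after the current one, so here the mask is a mask over time. Note: the mask must be broadcastable to the shape of the tensor being masked, otherwise nothing can be masked out. Trying a few shapes makes this easy to understand.

A mask of dtype torch.uint8 triggers the following warning:

UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen\native\TensorAdvancedIndexing.cpp:1273.)
y = x.masked_fill(mask,-1e9)

To resolve it, convert the mask with mask.bool().

Usage:

import torch
x = torch.randn([2,3,3,3])
# Boolean mask; a uint8 mask (torch.ByteTensor) triggers the warning above
mask = torch.tensor([[[[1,1,0]]],[[[1,0,1]]]], dtype=torch.bool)
print(mask.shape)
y = x.masked_fill(mask,-1e9)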
print(x)
print(y)
'''
torch.Size([2, 1, 1, 3])
tensor([[[[ 1.3619, -0.4189, -0.1962],
          [-0.1894,  0.7659, -0.0566],
          [-0.0452, -0.1162,  2.3200]],

         [[ 0.4708, -0.1103, -0.3525],
          [ 1.3103, -1.0162, -0.5674],
          [-0.2876,  0.8353,  0.3060]],

         [[ 0.9054, -0.3159, -0.0910],
          [-0.4321, -1.0281,  1.1053],
          [ 1.3831, -0.6604,  0.0584]]],


        [[[-0.0300, -0.5480,  1.1722],
          [ 1.3205, -0.4150, -0.0274],
          [-1.3144, -1.1835, -0.5252]],

         [[-0.8591, -0.7441, -0.1815],
          [-1.4285, -1.1140, -1.6022],
          [ 0.2987,  1.0611,  0.6514]],

         [[-0.9407,  0.9054,  0.1344],
          [ 1.0765,  0.4661,  0.3070],
          [ 0.6302,  0.7232,  0.9769]]]])
tensor([[[[-1.0000e+09, -1.0000e+09, -1.9618e-01],
          [-1.0000e+09, -1.0000e+09, -5.6642e-02],
          [-1.0000e+09, -1.0000e+09,  2.3200e+00]],

         [[-1.0000e+09, -1.0000e+09, -3.5254e-01],
          [-1.0000e+09, -1.0000e+09, -5.6742e-01],
          [-1.0000e+09, -1.0000e+09,  3.0595e-01]],

         [[-1.0000e+09, -1.0000e+09, -9.1025e-02],
          [-1.0000e+09, -1.0000e+09,  1.1053e+00],
          [-1.0000e+09, -1.0000e+09,  5.8403e-02]]],


        [[[-1.0000e+09, -5.4799e-01, -1.0000e+09],
          [-1.0000e+09, -4.1496e-01, -1.0000e+09],
          [-1.0000e+09, -1.1835e+00, -1.0000e+09]],

         [[-1.0000e+09, -7.4409e-01, -1.0000e+09],
          [-1.0000e+09, -1.1140e+00, -1.0000e+09],
          [-1.0000e+09,  1.0611e+00, -1.0000e+09]],

         [[-1.0000e+09,  9.0543e-01, -1.0000e+09],
          [-1.0000e+09,  4.6610e-01, -1.0000e+09],
          [-1.0000e+09,  7.2317e-01, -1.0000e+09]]]])
'''
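
To connect this with attention, here is a minimal sketch of a causal (look-ahead) mask. The all-zero scores tensor stands in for real attention scores and is an assumption made so that the result is easy to read.

```python
import torch

seq_len = 4
scores = torch.zeros(seq_len, seq_len)  # placeholder attention scores

# True above the main diagonal marks future positions that must be hidden
future = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

masked = scores.masked_fill(future, float("-inf"))
weights = torch.softmax(masked, dim=-1)  # masked positions get weight 0
print(weights)
```

Each row attends only to itself and to earlier positions, so row i spreads its weight evenly over the first i+1 entries.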

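(13) unsqueeze(), squeeze()

How it works: unsqueeze() adds a dimension of size 1; its argument is the position at which the new dimension is inserted. squeeze() does the opposite and can only remove dimensions of size 1. Called without arguments, it removes every dimension of size 1; it also accepts an optional dim argument to remove a single one.

Usage:

import torch
x = torch.randn([2,3])
x_un = x.unsqueeze(1).unsqueeze(1)
print(x_un.shape)
x_sq = x_un.squeeze() # remove the two size-1 dimensions again
print(x_sq.shape)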
'''
torch.Size([2, 1, 1, 3])
torch.Size([2, 3])
'''
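
A short sketch of the optional dim argument of squeeze(): only the named dimension is removed, and only if its size is 1.

```python
import torch

x = torch.zeros(1, 3, 1)

print(x.squeeze().shape)      # torch.Size([3]): all size-1 dimensions removed
print(x.squeeze(0).shape)     # torch.Size([3, 1]): only dimension 0 removed
print(x.squeeze(1).shape)     # torch.Size([1, 3, 1]): dimension 1 has size 3, unchanged
print(x.unsqueeze(-1).shape)  # torch.Size([1, 3, 1, 1]): new last dimension
```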

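(14) torch.triu()

How it works: returns the upper triangular part of a matrix, with all other elements set to 0. The first argument is the matrix; the second is diagonal, which defaults to 0.

If diagonal is omitted or 0, the elements on and above the main diagonal are kept.
If diagonal is a positive number n, only the elements on and above the n-th diagonal above the main diagonal are kept, so the main diagonal and the n-1 diagonals directly above it are zeroed.
If diagonal is a negative number -n, the elements on and above the main diagonal are kept together with the n diagonals directly below it.

Usage:

import torch
t = torch.ones([3,3])
print(t)
# tensor([[1., 1., 1.],
#         [1., 1., 1.],
#         [1., 1., 1.]])
print(torch.triu(t,1))
# tensor([[0., 1., 1.],
#         [0., 0., 1.],
#         [0., 0., 0.]])
print(torch.triu(t,-1))
# tensor([[1., 1., 1.],
#         [1., 1., 1.],
#         [0., 1., 1.]])
print(torch.triu(t,0))
# tensor([[1., 1., 1.],
#         [0., 1., 1.],
#         [0., 0., 1.]])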

(15) torch.mean()

How it works: computes the mean of a Tensor.

Usage:

import torch
x = torch.randint(1,10,[2,2,3],dtype=torch.float64)
print(x)
mean = torch.mean(x) # no dim given: mean over all elements
print(mean)
mean = torch.mean(x,dim=1) # mean along the given dimension
print(mean)
'''
tensor([[[9., 2., 7.],
         [7., 6., 9.]],

        [[6., 9., 5.],
         [8., 8., 7.]]], dtype=torch.float64)
tensor(6.9167, dtype=torch.float64)
tensor([[8.0000, 4.0000, 8.0000],
        [7.0000, 8.5000, 6.0000]], dtype=torch.float64)
'''

(16) torch.from_numpy()

How it works: converts a NumPy array into a tensor. The tensor shares memory with the array.

Usage:

import numpy as np
import torch
x = np.random.randint(1,10,[3,4])
print(type(x))
y = torch.from_numpy(x)
print(type(y))
'''
<class 'numpy.ndarray'>
<class 'torch.Tensor'>
'''
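
Because the tensor returned by from_numpy() shares memory with the array, a change on either side is visible on the other. A minimal sketch:

```python
import numpy as np
import torch

a = np.zeros(3)
t = torch.from_numpy(a)

a[0] = 5.0   # modify the array ...
print(t)     # tensor([5., 0., 0.], dtype=torch.float64)

t[1] = 7.0   # ... or modify the tensor
print(a)     # [5. 7. 0.]
```

If an independent copy is needed, torch.tensor(a) copies the data instead.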

To be continued!