Using torch.nn

qq_38196982

Last modified 2023-09-12 19:33:08

Tags: deep learning, python, machine learning

First published 2023-09-07 15:12:42

Article link: https://blog.csdn.net/qq_38196982/article/details/132719048

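1. nn.Embedding

nn.Embedding is PyTorch's embedding layer. It maps discrete integer indices to continuous real-valued vectors.

The constructor signature is:

torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None,
                   max_norm=None, norm_type=2.0, scale_grad_by_freq=False,
                   sparse=False, _weight=None, _freeze=False, device=None, dtype=None)

torch.nn.Embedding has two required parameters:

num_embeddings: the size of the vocabulary, i.e. the number of distinct indices (words) the layer can look up.
embedding_dim: the dimension of each embedding vector.

Internally, the layer works as follows:

When an nn.Embedding instance is created, it initializes a weight matrix of shape (num_embeddings, embedding_dim) with random values; these weights are normally learned during training.
When integer indices are passed to the layer, it looks up the corresponding rows of the weight matrix and returns them as the embedding vectors.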
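The lookup described above can be checked directly. The following is a minimal sketch, with sizes chosen only for illustration, showing that calling the layer gives the same result as indexing rows of its weight matrix:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed only so that reruns use the same weights

# Illustrative sizes: 5 indices in the vocabulary, 3-dimensional vectors
embedding_layer = nn.Embedding(num_embeddings=5, embedding_dim=3)

input_ids = torch.tensor([1, 2, 3, 0, 4])

# The forward pass of the layer ...
looked_up = embedding_layer(input_ids)
# ... matches plain row indexing into the weight matrix
indexed = embedding_layer.weight[input_ids]

print(looked_up.shape)                  # torch.Size([5, 3])
print(torch.equal(looked_up, indexed))  # True
```

The output shape is the input shape with embedding_dim appended, so a batch of index sequences of shape (batch, seq_len) would give (batch, seq_len, embedding_dim).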

Example 1:

import torch
import torch.nn as nn

# Assume a vocabulary of 5 tokens and 3-dimensional embeddings
vocab_size = 5
embedding_dim = 3

# Create the embedding layer
embedding_layer = nn.Embedding(vocab_size, embedding_dim)
print(f'Embedding weight matrix:\n{embedding_layer.weight}')
# Input sequence of integer token ids
input_ids = torch.LongTensor([1, 2, 3, 0, 4])
print(f'Input tensor:\n{input_ids}')
# Look up the embedding vectors
embeds = embedding_layer(input_ids)
print(f'Result tensor:\n{embeds}')

The printed values differ from run to run because the weight matrix is randomly initialized. The result tensor has shape (5, 3), and its rows are rows 1, 2, 3, 0 and 4 of the weight matrix, in that order.

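2. nn.LayerNorm

LayerNorm(normalized_shape, eps=1e-5, elementwise_affine=True, device=None, dtype=None)

The three main parameters of nn.LayerNorm are:

normalized_shape: the shape of the last D dimensions to normalize over. It can be an int, which must equal the size of the tensor's last dimension (not the size of a middle dimension): for torch.randn(3, 4) the int must be normalized_shape=4, which normalizes over the last dimension. It can also be a list, which must match the sizes of the last D dimensions: for the same tensor, normalized_shape=[4] or normalized_shape=[3, 4].
eps: a small constant added to the variance in the denominator to avoid division by zero. The default is 1e-5.
elementwise_affine: whether to apply an affine transformation. The affine transformation has two learnable parameters, γ and β: the normalized result is multiplied by the scale γ and then shifted by the bias β. γ is initialized to 1 and β to 0, so the layer can learn to rescale and shift the normalized values instead of being restricted to zero mean and unit variance.

nn.LayerNorm serves two main purposes:

Normalization: for each sample, nn.LayerNorm normalizes the values over the last D dimensions to mean 0 and variance 1. This reduces scale differences between features and makes training more stable.
Regularization: layer normalization is sometimes regarded as having a mild regularizing effect. Unlike batch normalization, its statistics are computed per sample and do not depend on the other samples in the batch, so it behaves the same way in training and evaluation.

In deep neural networks, layer normalization is usually applied to the outputs of individual layers rather than only to the input of the whole network. This helps mitigate exploding and vanishing gradients and tends to improve training speed and stability.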
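To make the normalization step concrete, here is a minimal sketch that reproduces nn.LayerNorm by hand for a freshly created layer (so γ = 1 and β = 0). The input shape is an arbitrary choice for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3)  # illustrative shape: (batch_size, input_dim)

layer_norm = nn.LayerNorm(3)  # freshly created: gamma is all ones, beta all zeros

# Manual version: per-row mean and biased variance over the last dimension
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + layer_norm.eps)

print(torch.allclose(layer_norm(x), manual, atol=1e-5))  # True
```

Each row is normalized using only its own statistics, which is why the result does not depend on the batch size.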

Example 1:

import torch
import torch.nn as nn

# Assume input data of shape (batch_size, input_dim)
batch_size = 4
input_dim = 3

# Create the layer normalization layer
layer_norm = nn.LayerNorm(input_dim)

# Input data tensor
input_data = torch.randn(batch_size, input_dim)

# Apply layer normalization to the input data
output_data = layer_norm(input_data)

print(output_data)

Output:

tensor([[-0.2996,  1.3467, -1.0471],
        [ 0.5380, -1.4016,  0.8636],
        [ 0.2907, -1.3439,  1.0532],
        [ 0.8922, -1.3964,  0.5042]], grad_fn=<NativeLayerNormBackward0>)

Example 2:
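One possible second example, offered as a sketch, passes normalized_shape as a list so that the last two dimensions are normalized together. The 3-D input shape here is an assumption made for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 3, 4)  # assumed shape: (batch, 3, 4)

# Normalize over the last two dimensions together
layer_norm = nn.LayerNorm([3, 4])
out = layer_norm(x)

print(out.shape)                # torch.Size([2, 3, 4])
print(layer_norm.weight.shape)  # torch.Size([3, 4]), one gamma per normalized element
# Each sample has a mean close to 0 over its 3 * 4 values
print(out.mean(dim=(1, 2)))
```

The list must match the trailing dimensions of the input, so [3, 4] works for this tensor while [2, 3] would raise an error.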

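3. nn.Softmax

nn.Softmax takes one parameter:

dim: the dimension along which softmax is applied.

Note that softmax is computed along the given dimension rather than element by element: every output value lies between 0 and 1, and the values along dim sum to 1.

The softmax function turns each element of the input vector into a probability, and these probabilities can represent a model's predicted probabilities over several classes. In multi-class classification, softmax is typically used to convert the model's raw scores into class probabilities, and the class with the highest probability is taken as the prediction.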
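As a small sketch of what the layer computes, the standard definition softmax(x)_i = exp(x_i) / sum_j exp(x_j) can be written out by hand and compared with nn.Softmax, followed by picking the most probable class:

```python
import torch
import torch.nn as nn

logits = torch.tensor([2.0, 1.0, 0.1])

# softmax(x)_i = exp(x_i) / sum_j exp(x_j)
manual = torch.exp(logits) / torch.exp(logits).sum()
probs = nn.Softmax(dim=0)(logits)

print(torch.allclose(manual, probs))  # True
print(probs.argmax().item())          # 0, the index of the largest logit
```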

Example 1:

import torch
import torch.nn as nn

# A vector of unnormalized scores (logits)
logits = torch.tensor([2.0, 1.0, 0.1])
print(logits)
# Create the softmax layer
softmax_layer = nn.Softmax(dim=0)

# Compute softmax
probabilities = softmax_layer(logits)

print(probabilities)

Output:

tensor([2.0000, 1.0000, 0.1000])
tensor([0.6590, 0.2424, 0.0986])

Example 2:

import torch
import torch.nn as nn

# A 1 x 3 matrix of unnormalized scores (logits)
logits = torch.tensor([[2.0, 1.0, 0.1]])
print(logits)
print('--------------------------')
# Softmax over dim 0: each column holds a single value, so every output is 1
softmax_layer = nn.Softmax(0)
# Compute softmax
probabilities = softmax_layer(logits)
print(probabilities)
print('--------------------------')
# Softmax over dim 1: normalizes across the three values in the row
softmax_layer = nn.Softmax(1)
# Compute softmax
probabilities = softmax_layer(logits)
print(probabilities)

Output:

tensor([[2.0000, 1.0000, 0.1000]])
--------------------------
tensor([[1., 1., 1.]])
--------------------------
tensor([[0.6590, 0.2424, 0.0986]])

Example 3:

import torch
import torch.nn as nn

# A 2 x 3 matrix of unnormalized scores (logits)
logits = torch.randn(2,3)
print(logits)
# Softmax over dim 0: each column sums to 1
softmax_layer = nn.Softmax(dim=0)
# Compute softmax
probabilities = softmax_layer(logits)
print(probabilities)
print('-------------------------------------')
# Softmax over dim 1: each row sums to 1
softmax_layer = nn.Softmax(dim=1)
# Compute softmax
probabilities = softmax_layer(logits)
print(probabilities)

Output:

tensor([[-0.4286, -0.1605, -0.4445],
        [ 0.0444,  0.2785, -0.3434]])
tensor([[0.3839, 0.3920, 0.4748],
        [0.6161, 0.6080, 0.5252]])
-------------------------------------
tensor([[0.3038, 0.3972, 0.2990],
        [0.3399, 0.4295, 0.2306]])

Example 4:

import torch
import torch.nn as nn

# A 2 x 2 x 3 tensor of unnormalized scores (logits)
logits = torch.randn(2,2,3)
print(logits)
# Softmax over dim 0
softmax_layer = nn.Softmax(dim=0)
# Compute softmax
probabilities = softmax_layer(logits)
print(probabilities)
print('-------------------------------------')
# Softmax over dim 1
softmax_layer = nn.Softmax(dim=1)
# Compute softmax
probabilities = softmax_layer(logits)
print(probabilities)
print('-------------------------------------')
# Softmax over dim 2
softmax_layer = nn.Softmax(dim=2)
# Compute softmax
probabilities = softmax_layer(logits)
print(probabilities)

Output:

tensor([[[-0.3569,  1.3882,  1.4940],
         [ 0.3452,  0.5435,  0.7890]],

        [[ 0.5194, -2.1059,  1.7671],
         [ 0.6018, -0.0840,  1.6758]]])
tensor([[[0.2939, 0.9705, 0.4321],
         [0.4362, 0.6519, 0.2918]],  # values at the same position in the two matrices sum to 1, e.g. 0.2939 + 0.7061

        [[0.7061, 0.0295, 0.5679],
         [0.5638, 0.3481, 0.7082]]])
-------------------------------------
tensor([[[0.3313, 0.6995, 0.6693],  # each column of each matrix sums to 1
         [0.6687, 0.3005, 0.3307]],

        [[0.4794, 0.1169, 0.5228],
         [0.5206, 0.8831, 0.4772]]])
-------------------------------------
tensor([[[0.0764, 0.4374, 0.4862],
         [0.2647, 0.3227, 0.4126]],  # each row of each matrix sums to 1

        [[0.2195, 0.0159, 0.7646],
         [0.2257, 0.1137, 0.6606]]])

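For a 3-D tensor, it helps to picture it in XYZ space: dim=0, 1 and 2 apply softmax along the X, Y and Z directions respectively.

For a 2-D tensor, picture the XY plane: dim=0 and dim=1 apply softmax along the X and Y directions.

For a 1-D tensor there is only the X direction, and dim=0 applies softmax along it.

In every case, the values that sum to 1 are the ones obtained by varying the index along dim while keeping all other indices fixed.

Softmax(dim=-1) applies softmax along the last dimension of the tensor, which is the usual choice when converting scores to a probability distribution in multi-class classification. The line below applies softmax along the last dimension of the scores tensor and stores the result in a new tensor named attn: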

attn = nn.Softmax(dim=-1)(scores)
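The line above can be tried out with a stand-in scores tensor. This is only a sketch: the shape (2, 3, 3) and its interpretation as (batch, query positions, key positions) are assumptions for illustration, not something the original specifies.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical attention scores: (batch, query positions, key positions)
scores = torch.randn(2, 3, 3)

attn = nn.Softmax(dim=-1)(scores)

# dim=-1 and dim=2 refer to the same axis of a 3-D tensor
print(torch.allclose(attn, nn.Softmax(dim=2)(scores)))     # True
# Summing along the softmax dimension gives 1 everywhere
print(torch.allclose(attn.sum(dim=-1), torch.ones(2, 3)))  # True
```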

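4. nn.Linear

nn.Linear is PyTorch's linear (fully connected) layer. It implements a linear transformation: the input tensor is multiplied by a weight matrix and a bias vector is added. Linear layers are commonly used in the hidden and output layers of neural networks.

Its parameters are:

in_features: int, the size of each input sample
out_features: int, the size of each output sample
bias: whether to include a bias term; defaults to True. If set to False, the layer does not learn a bias

By default, an nn.Linear layer creates a learnable bias vector, which is updated during training along with the weights to better fit the training data.

nn.Linear acts on the last dimension of its input. The most common input is a 2-D tensor of shape (batch_size, input_features), where batch_size is the batch size and input_features is the number of input features, but any shape (*, input_features) is accepted.

A 3-D tensor therefore does not need to be reshaped first. If x has shape (batch_size, seq_length, input_features), it can be passed to the layer directly and the output has shape (batch_size, seq_length, output_features). Reshaping x to (batch_size * seq_length, input_features), applying the layer, and reshaping back gives the same result.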
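As a quick check, sketched with arbitrary sizes, a 3-D tensor can be passed to nn.Linear directly, and the result matches the reshape-to-2-D route because the layer acts on the last dimension:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_length, input_features, output_features = 2, 5, 4, 3
linear_layer = nn.Linear(input_features, output_features)

x = torch.randn(batch_size, seq_length, input_features)

# Route 1: pass the 3-D tensor directly
direct = linear_layer(x)

# Route 2: flatten to 2-D, apply the layer, restore the original leading dims
flat = x.reshape(batch_size * seq_length, input_features)
via_reshape = linear_layer(flat).reshape(batch_size, seq_length, output_features)

print(direct.shape)                                    # torch.Size([2, 5, 3])
print(torch.allclose(direct, via_reshape, atol=1e-6))  # True
```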

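For an input tensor x, weight matrix W, bias vector b and output y, the linear transformation is:

y = x W^T + b

Here x has shape (batch_size, input_features), W has shape (output_features, input_features), and b has shape (output_features,).

The steps are:

Multiply x by the transpose of the weight matrix to get the intermediate result x W^T, where ^T denotes the transpose.
Add the bias vector b to x W^T; b is broadcast across the batch dimension.
The result is the output y, of shape (batch_size, output_features).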
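The steps above can be sketched by reading the weight and bias off a layer and doing the matrix multiplication by hand. The sizes are arbitrary and only for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear_layer = nn.Linear(4, 3)  # input_features=4, output_features=3
x = torch.randn(2, 4)           # (batch_size, input_features)

W, b = linear_layer.weight, linear_layer.bias
print(W.shape, b.shape)  # torch.Size([3, 4]) torch.Size([3])

# y = x W^T + b, with b broadcast over the batch dimension
manual = x @ W.T + b

print(torch.allclose(linear_layer(x), manual, atol=1e-6))  # True
```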

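This linear transformation lets a neural network learn a mapping from inputs to outputs: by adjusting the values of the weight matrix W and the bias vector b, the network gradually improves its performance on a given task. It is one of the most common operations in deep learning, and most layers of a neural network contain a linear transformation.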

Basic usage of nn.Linear, with an example:

import torch
import torch.nn as nn

input_features = 4
output_features = 3
batch_size = 2
# Create a linear layer with input_features inputs and output_features outputs
linear_layer = nn.Linear(input_features, output_features)
print(linear_layer)
# Define the input tensor of shape (batch_size, input_features)
input_tensor = torch.randn(batch_size, input_features)
print(input_tensor)
# Pass the input tensor through the linear layer
output_tensor = linear_layer(input_tensor)
print(output_tensor)

Output:

Linear(in_features=4, out_features=3, bias=True)
tensor([[ 0.0384, -1.1405,  0.4450, -0.0286],
        [-2.3155, -1.7393, -0.3294, -0.5488]])
tensor([[-0.2443, -0.0767, -0.6685],
        [ 0.2916,  0.1578, -1.8904]], grad_fn=<AddmmBackward0>)

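5. nn.Parameter

nn.Parameter is a class in PyTorch, a subclass of torch.Tensor, created as nn.Parameter(data, requires_grad=True). It marks a tensor as a model parameter. In a neural network, model parameters are the tensors that get optimized during training. Wrapping a tensor in nn.Parameter sets requires_grad to True by default so that gradients are computed for it, and assigning it as an attribute of an nn.Module registers it automatically, so it shows up in model.parameters().

The point of wrapping a tensor as a model parameter is to tell PyTorch that this tensor is something the model needs to learn and optimize.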

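Example 1:

import torch
import torch.nn as nn

# Create an ordinary tensor and wrap it as a parameter
data = torch.randn(3, 4)
param = nn.Parameter(data)

# Assigning an nn.Parameter as a module attribute registers it with the module
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.param = nn.Parameter(torch.randn(5, 5))

model = MyModel()
# parameters() returns a generator; use list(model.parameters()) to see the tensors
print(model.parameters())

# Use the standalone parameter in a computation
output = param * 2
print(output)

# The gradient of the parameter is computed automatically during backpropagation
loss = output.sum()
loss.backward()

# Read the gradient of the parameter
print(param.grad)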

Output:

<generator object Module.parameters at 0x7b63406d5cb0>
tensor([[ 0.7516, -0.0416, -1.5190,  4.5334],
        [-4.1685,  0.6480,  2.8705, -2.3701],
        [ 3.0461,  1.4910, -1.3530, -0.7487]], grad_fn=<MulBackward0>)
tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])
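To see what registration means in practice, the following sketch compares an nn.Parameter attribute with a plain tensor attribute. The class name TinyModel and the attribute names scale and offset are hypothetical, chosen only for this illustration:

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(3))  # registered as a parameter
        self.offset = torch.zeros(3)              # plain tensor: not registered

model = TinyModel()
names = [name for name, _ in model.named_parameters()]

print(names)                      # ['scale']
print(model.scale.requires_grad)  # True
```

Only registered parameters are returned by model.parameters(), which is what an optimizer is normally given, so the plain tensor offset would not be updated during training.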

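6. nn.Sequential

The Sequential class chains several layers together. Given input data, a Sequential instance passes it to the first layer, then feeds the first layer's output to the second layer as input, and so on. The output size of each module must match the input size of the next.

# nn is short for neural network
from torch import nn

net = nn.Sequential(nn.Linear(2, 1))

# Initialize the model parameters
net[0].weight.data.normal_(0, 0.01)  # access the values through .data; normal_ fills the weights in place with mean 0 and std 0.01
net[0].bias.data.fill_(0)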

An OrderedDict can also be passed to nn.Sequential to give each layer a name.
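A minimal sketch of the named variant follows. The layer names ("hidden", "act", "output") and the layer sizes are arbitrary choices for illustration:

```python
from collections import OrderedDict

import torch
from torch import nn

# Layer names are arbitrary, chosen for illustration
net = nn.Sequential(OrderedDict([
    ("hidden", nn.Linear(2, 4)),
    ("act", nn.Tanh()),
    ("output", nn.Linear(4, 1)),
]))

# Named layers can be reached by attribute as well as by index
print(net.hidden is net[0])          # True
print(net(torch.randn(3, 2)).shape)  # torch.Size([3, 1])
```

Note that the output size of each layer (4, then 1) matches the input size of the next, as required above.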

7. nn.ModuleList
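nn.ModuleList holds submodules in a list-like container so that they are registered with the parent module, which a plain Python list would not do. As a rough sketch, with a hypothetical class name and arbitrary sizes:

```python
import torch
from torch import nn

class Stack(nn.Module):  # hypothetical module name
    def __init__(self, width=4, depth=3):
        super().__init__()
        # Submodules in an nn.ModuleList are registered with the parent module
        self.layers = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])

    def forward(self, x):
        # Unlike nn.Sequential, the forward logic is written by hand
        for layer in self.layers:
            x = torch.tanh(layer(x))
        return x

model = Stack()
print(len(list(model.parameters())))   # 6: a weight and a bias for each of 3 layers
print(model(torch.randn(2, 4)).shape)  # torch.Size([2, 4])
```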

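Activation functions

1. nn.Tanh

nn.Tanh is the PyTorch class for the hyperbolic tangent (tanh) activation function. Tanh is a commonly used nonlinear activation function, typically used in the hidden layers of neural networks. Its mathematical expression is:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

For each element x of the input tensor, nn.Tanh evaluates this formula and returns the result. Every element is mapped to a value between -1 and 1, and because the operation is element-wise, the shape of the tensor is unchanged. This is how element-wise activation functions typically work: they introduce nonlinearity so that the network can capture complex patterns and features.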
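As a small check, the standard definition tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)) can be evaluated by hand and compared with nn.Tanh. The input values below are arbitrary:

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 0.5, 3.0])

# tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), applied element by element
manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))
output = nn.Tanh()(x)

print(torch.allclose(output, manual, atol=1e-6))  # True
print(output.shape == x.shape)                    # True, the shape is unchanged
print(bool((output.abs() < 1).all()))             # True, all values lie in (-1, 1)
```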

Example 1:

import torch
import torch.nn as nn

# Create an input tensor
x = torch.randn(5, 5)  # random 5x5 tensor

# Create a Tanh layer and apply it to the input tensor
tanh_layer = nn.Tanh()
output = tanh_layer(x)

# Print the result
print(output)

Output:

tensor([[-0.8425,  0.8059,  0.9092,  0.9890, -0.7845],
        [-0.3596, -0.4170,  0.1090,  0.3055,  0.4469],
        [-0.7517,  0.4888, -0.7224, -0.3297,  0.2034],
        [-0.2453,  0.8468,  0.6851, -0.5300, -0.8716],
        [-0.6683, -0.8798,  0.4777,  0.0264, -0.5431]])

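In the example above, we first create an input tensor x, then create an nn.Tanh() layer and apply it to x, and finally print the activated output.

nn.Tanh maps each element of the input tensor to a value between -1 and 1. Because its output is zero-centered, it is a common choice for hidden layers. It does saturate for inputs of large magnitude, where its gradient approaches 0, so it does not by itself remove the vanishing-gradient problem.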