Introducing RNN Character-Level Text Generation With PyTorch

Today, we’ll continue our journey through the fascinating world of natural language processing (NLP) by introducing the operation and use of recurrent neural networks to generate text from a small initial text. This type of problem is known as language modeling and is used when we want to predict the next word or character in an input sequence of words or characters.

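For example, here is a minimal sketch of how character-level training pairs can be built for this kind of task; the sample text, the sequence length, and the variable names are purely illustrative:

```python
# A minimal, illustrative sketch of building character-level training pairs:
# each input is a short character sequence and the target is the next character.
text = "hello world"
chars = sorted(set(text))                       # vocabulary of unique characters
char2idx = {c: i for i, c in enumerate(chars)}  # map each character to an integer id

seq_len = 4
pairs = []
for i in range(len(text) - seq_len):
    context = text[i:i + seq_len]   # e.g. "hell"
    target = text[i + seq_len]      # the next character, e.g. "o"
    pairs.append(([char2idx[c] for c in context], char2idx[target]))

print(pairs[0])  # indices of "hell" followed by the index of "o"
```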

But in language-modeling problems, the presence of words isn’t the only thing that matters; their order, i.e., where they appear in the text sequence, matters too. In other words, the context that surrounds each word becomes a fundamental piece for predicting the next one.

And in this scenario, traditional NLP methods, based on word frequencies and probabilities, aren’t very effective because they rest on the premise that words are independent of each other.

This is where RNNs can become a fundamental tool: their ability to remember the different parts of a series of inputs means they can take the previous parts of a sentence into account to interpret context.

Brief Description of RNN

In summary, in a vanilla neural network, the output of a layer is a function or transformation of its input applying some learnable weights.

In contrast, in an RNN, not only the input is taken into account but also the context or previous state of the network itself. As we progress in the forward pass through the network, it builds a representation of its state that aims to collect information obtained in previous steps, which is called the hidden state.

[Figure: RNN architecture, from the Stanford CS230 Deep Learning course]

Here, for each timestep t, we have an activation a<t> and an output y<t>. And we have one set of weights to transform the input to a hidden-layer representation, a second set of weights to bring information from the previous hidden state into the next timestep, and a third one to control how much information from the current state is transmitted to the output.

[Figure: RNN operations, from the Stanford CS-230 Deep Learning course]
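
To make these operations concrete, here is a minimal sketch of a single RNN timestep in PyTorch; the three weight matrices correspond to the three sets of weights described above, the W_ax / W_aa / W_ya names follow the CS-230 notation, and the sizes are purely illustrative:

```python
import torch

# Illustrative sizes, not taken from the article
input_size, hidden_size, output_size = 10, 16, 10

# Three sets of weights: input -> hidden, hidden -> hidden, hidden -> output
W_ax = torch.randn(hidden_size, input_size)
W_aa = torch.randn(hidden_size, hidden_size)
W_ya = torch.randn(output_size, hidden_size)
b_a = torch.zeros(hidden_size)
b_y = torch.zeros(output_size)

def rnn_step(x_t, a_prev):
    """One timestep: combine the current input x<t> with the previous hidden state a<t-1>."""
    a_t = torch.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)  # new hidden state a<t>
    y_t = W_ya @ a_t + b_y                              # output y<t> (unnormalized)
    return a_t, y_t
```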

Therefore, each element of the sequence that passes through the network contributes to the current state, and the latter to the output. Both the input and the previous hidden state incorporate new information to update the value of the hidden state, for an arbitrarily long sequence of observations. RNNs can remember previous entries, but this capacity is restricted in time or steps; this was one of the first challenges to solve with these networks.

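Continuing the illustrative sketch above, the hidden state is simply carried forward through a loop, one timestep at a time, for a sequence of any length:

```python
# Unrolling the sketched RNN step over an arbitrary-length sequence of dummy inputs
sequence = [torch.randn(input_size) for _ in range(20)]  # x<1> ... x<20>

a_t = torch.zeros(hidden_size)      # initial hidden state a<0>
outputs = []
for x_t in sequence:
    a_t, y_t = rnn_step(x_t, a_t)   # a<t> depends on both a<t-1> and x<t>
    outputs.append(y_t)             # one output per timestep
```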

“The longer the input series is, the more the network “forgets”. Irrelevant data is accumulated over time and it blocks out the relevant data needed for the network to make accurate predictions about the pattern of the text. This is referred to as the vanishing gradient problem.” — Wikipedia

You can dive deeper into that problem at this link. This is a common problem with very deep neural networks. In the field of NLP and RNNs, some advanced architectures have been developed to solve it, like LSTMs and GRUs.

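In PyTorch, these architectures are available as drop-in recurrent layers that share the interface of a vanilla RNN; a minimal sketch with illustrative sizes:

```python
import torch.nn as nn

# The three recurrent layers take the same (input_size, hidden_size) arguments,
# so an LSTM or a GRU can replace a vanilla RNN with minimal code changes.
rnn = nn.RNN(input_size=10, hidden_size=16)
lstm = nn.LSTM(input_size=10, hidden_size=16)
gru = nn.GRU(input_size=10, hidden_size=16)
```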

Long Short-Term Memory (LSTM)

LSTM networks seek to preserve relevant information from much earlier steps, for which they contain multiple gates that control how much information to keep or delete from the input and the previous states (a code sketch of these gate computations follows the list below):

[Figure: LSTM cell, from a paper by Savvas Varsamopoulos]

W is the recurrent connection between the previous hidden layer and the current hidden layer. U is the weight matrix that connects the inputs to the hidden layer, and C is a candidate hidden state that’s computed based on the current input and the previous hidden state. C is the internal memory of the unit.

  • Forget gate: How much information from the past should be considered now?
  • Input gate + cell gate: Should we add information to the state from the input, and how much?
  • Output gate: How much information from the current cell state should we pass to the output?
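
Here is a minimal sketch of these gate computations for a single timestep, using the W (recurrent) and U (input) weight naming from the figure above; the sizes and names are illustrative assumptions, not the article’s implementation:

```python
import torch

hidden_size, input_size = 16, 10
gates = ["f", "i", "c", "o"]  # forget, input, candidate (cell), output
W = {g: torch.randn(hidden_size, hidden_size) for g in gates}  # recurrent weights
U = {g: torch.randn(hidden_size, input_size) for g in gates}   # input weights
b = {g: torch.zeros(hidden_size) for g in gates}

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM timestep: update the cell memory c and the hidden state h."""
    f_t = torch.sigmoid(W["f"] @ h_prev + U["f"] @ x_t + b["f"])  # forget gate
    i_t = torch.sigmoid(W["i"] @ h_prev + U["i"] @ x_t + b["i"])  # input gate
    c_hat = torch.tanh(W["c"] @ h_prev + U["c"] @ x_t + b["c"])   # candidate memory
    o_t = torch.sigmoid(W["o"] @ h_prev + U["o"] @ x_t + b["o"])  # output gate
    c_t = f_t * c_prev + i_t * c_hat   # keep part of the old memory, add new content
    h_t = o_t * torch.tanh(c_t)        # expose part of the current cell state
    return h_t, c_t

h_t, c_t = lstm_step(torch.randn(input_size), torch.zeros(hidden_size), torch.zeros(hidden_size))
```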

“In a similar way, an LSTM works as follows:

• It keeps track not just of short term memory, but also of long term memory

• In every step of the sequence, the long and short term memory in the step get merged

• From this, we get a new long term memory, short term memory, and prediction”

Peter Foy, “An Introduction to Recurrent Neural Networks”

Building an RNN vs. a CNN in PyTorch

There are a few differences in the code when building an RNN (recurrent neural network) versus a CNN (convolutional neural network) with PyTorch.

First, the layers you take from the `torch.nn` module differ: for an RNN you define a recurrent layer with the `RNN` class, while for a CNN you define convolutional layers with the `Conv2d` class and add them to the model.

Second, the forward pass differs: the RNN feeds the whole input sequence through the recurrent layer and, in this example, keeps only the output of the last timestep, while the CNN applies an activation from `torch.nn.functional` (such as `torch.nn.functional.relu`) after each convolution and flattens the result before the fully connected layer.

Finally, the loss function and the optimizer can be defined in the same way for both.

Here is a simple example showing how to build an RNN model and a CNN model with PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# RNN model: recurrent layer followed by a fully connected output layer
class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNModel, self).__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, input):
        # input shape: (seq_len, batch, input_size)
        output, hidden = self.rnn(input)
        output = self.fc(output[-1])  # use the last timestep's output
        return output

# CNN model: two convolutional layers followed by a fully connected layer
class CNNModel(nn.Module):
    def __init__(self, input_channels, output_size):
        super(CNNModel, self).__init__()
        self.conv1 = nn.Conv2d(input_channels, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.fc = nn.Linear(32 * 28 * 28, output_size)  # assumes 28x28 inputs

    def forward(self, input):
        # input shape: (batch, channels, 28, 28)
        output = F.relu(self.conv1(input))
        output = F.relu(self.conv2(output))
        output = output.view(output.size(0), -1)  # flatten for the linear layer
        output = self.fc(output)
        return output

# Example hyperparameters (illustrative values)
input_size, hidden_size, output_size = 10, 16, 5
input_channels, learning_rate = 1, 0.01

# Create the RNN model
rnn_model = RNNModel(input_size, hidden_size, output_size)

# Create the CNN model
cnn_model = CNNModel(input_channels, output_size)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(rnn_model.parameters(), lr=learning_rate)  # or cnn_model.parameters()
```

Note that this is only a simple example; in practice, you may need to adapt it to the specific requirements of your problem.
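
As a quick, hypothetical shape check of the two models defined above (the batch size of 4 and sequence length of 20 are arbitrary choices):

```python
# Dummy inputs with the shapes each model expects
seq = torch.randn(20, 4, input_size)            # (seq_len, batch, input_size) for nn.RNN
imgs = torch.randn(4, input_channels, 28, 28)   # (batch, channels, H, W) for the CNN

print(rnn_model(seq).shape)   # torch.Size([4, 5])
print(cnn_model(imgs).shape)  # torch.Size([4, 5])
```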
