Introducing RNN Character-Level Text Generation With PyTorch

Today, we’ll continue our journey through the fascinating world of natural language processing (NLP) by introducing the operation and use of recurrent neural networks to generate text from a small initial text. This type of problem is known as language modeling and is used when we want to predict the next word or character in an input sequence of words or characters.

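For example, here is a minimal sketch of how character-level training pairs can be built for this kind of task; the sample text, the sequence length, and the variable names are purely illustrative:

```python
# A minimal, illustrative sketch of building character-level training pairs:
# each input is a short character sequence and the target is the next character.
text = "hello world"
chars = sorted(set(text))                       # vocabulary of unique characters
char2idx = {c: i for i, c in enumerate(chars)}  # map each character to an integer id

seq_len = 4
pairs = []
for i in range(len(text) - seq_len):
    context = text[i:i + seq_len]   # e.g. "hell"
    target = text[i + seq_len]      # the next character, e.g. "o"
    pairs.append(([char2idx[c] for c in context], char2idx[target]))

print(pairs[0])  # indices of "hell" followed by the index of "o"
```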

But in language-modeling problems, the presence of words isn’t the only thing that matters; their order, i.e., where they appear in the text sequence, matters too. In other words, the context that surrounds each word becomes a fundamental piece for predicting the next one.

And in this scenario, traditional NLP methods, based on word frequencies and probabilities, aren’t very effective because they rest on the premise that words are independent of each other.

This is where RNNs can become a fundamental tool: their ability to remember the different parts of a series of inputs means they can take the previous parts of a sentence into account to interpret context.

Brief Description of RNN

In summary, in a vanilla neural network, the output of a layer is a function or transformation of its input applying some learnable weights.

In contrast, in an RNN, not only the input is taken into account but also the context or previous state of the network itself. As we progress in the forward pass through the network, it builds a representation of its state that aims to collect information obtained in previous steps, which is called the hidden state.

[Figure: RNN architecture, from the Stanford CS230 Deep Learning course]

Here, for each timestep t, we have an activation a<t> and an output y<t>. And we have one set of weights to transform the input to a hidden-layer representation, a second set of weights to bring information from the previous hidden state into the next timestep, and a third one to control how much information from the current state is transmitted to the output.

[Figure: RNN operations, from the Stanford CS-230 Deep Learning course]
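
To make these operations concrete, here is a minimal sketch of a single RNN timestep in PyTorch; the three weight matrices correspond to the three sets of weights described above, the W_ax / W_aa / W_ya names follow the CS-230 notation, and the sizes are purely illustrative:

```python
import torch

# Illustrative sizes, not taken from the article
input_size, hidden_size, output_size = 10, 16, 10

# Three sets of weights: input -> hidden, hidden -> hidden, hidden -> output
W_ax = torch.randn(hidden_size, input_size)
W_aa = torch.randn(hidden_size, hidden_size)
W_ya = torch.randn(output_size, hidden_size)
b_a = torch.zeros(hidden_size)
b_y = torch.zeros(output_size)

def rnn_step(x_t, a_prev):
    """One timestep: combine the current input x<t> with the previous hidden state a<t-1>."""
    a_t = torch.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)  # new hidden state a<t>
    y_t = W_ya @ a_t + b_y                              # output y<t> (unnormalized)
    return a_t, y_t
```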

Therefore, each element of the sequence that passes through the network contributes to the current state, and the latter to the output. Both the input and the previous hidden state incorporate new information to update the value of the hidden state, for an arbitrarily long sequence of observations. RNNs can remember previous entries, but this capacity is restricted in time or steps; this was one of the first challenges to solve with these networks.

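Continuing the illustrative sketch above, the hidden state is simply carried forward through a loop, one timestep at a time, for a sequence of any length:

```python
# Unrolling the sketched RNN step over an arbitrary-length sequence of dummy inputs
sequence = [torch.randn(input_size) for _ in range(20)]  # x<1> ... x<20>

a_t = torch.zeros(hidden_size)      # initial hidden state a<0>
outputs = []
for x_t in sequence:
    a_t, y_t = rnn_step(x_t, a_t)   # a<t> depends on both a<t-1> and x<t>
    outputs.append(y_t)             # one output per timestep
```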

“The longer the input series is, the more the network “forgets”. Irrelevant data is accumulated over time and it blocks out the relevant data needed for the network to make accurate predictions about the pattern of the text. This is referred to as the vanishing gradient problem.” — Wikipedia

You can dive deeper into that problem at this link. This is a common problem with very deep neural networks. In the field of NLP and RNNs, some advanced architectures have been developed to solve it, like LSTMs and GRUs.

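In PyTorch, these architectures are available as drop-in recurrent layers that share the interface of a vanilla RNN; a minimal sketch with illustrative sizes:

```python
import torch.nn as nn

# The three recurrent layers take the same (input_size, hidden_size) arguments,
# so an LSTM or a GRU can replace a vanilla RNN with minimal code changes.
rnn = nn.RNN(input_size=10, hidden_size=16)
lstm = nn.LSTM(input_size=10, hidden_size=16)
gru = nn.GRU(input_size=10, hidden_size=16)
```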

Long Short-Term Memory (LSTM)

LSTM networks seek to preserve relevant information from much earlier steps, for which they contain multiple gates that control how much information to keep or delete from the input and the previous states (a code sketch of these gate computations follows the list below):

[Figure: LSTM cell, from a paper by Savvas Varsamopoulos]

W is the recurrent connection between the previous hidden layer and the current hidden layer. U is the weight matrix that connects the inputs to the hidden layer, and C is a candidate hidden state that’s computed based on the current input and the previous hidden state. C is the internal memory of the unit.

  • Forget gate: How much information from the past should be considered now?
  • Input gate + cell gate: Should we add information to the state from the input, and how much?
  • Output gate: How much information from the current cell state should we pass to the output?
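
Here is a minimal sketch of these gate computations for a single timestep, using the W (recurrent) and U (input) weight naming from the figure above; the sizes and names are illustrative assumptions, not the article’s implementation:

```python
import torch

hidden_size, input_size = 16, 10
gates = ["f", "i", "c", "o"]  # forget, input, candidate (cell), output
W = {g: torch.randn(hidden_size, hidden_size) for g in gates}  # recurrent weights
U = {g: torch.randn(hidden_size, input_size) for g in gates}   # input weights
b = {g: torch.zeros(hidden_size) for g in gates}

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM timestep: update the cell memory c and the hidden state h."""
    f_t = torch.sigmoid(W["f"] @ h_prev + U["f"] @ x_t + b["f"])  # forget gate
    i_t = torch.sigmoid(W["i"] @ h_prev + U["i"] @ x_t + b["i"])  # input gate
    c_hat = torch.tanh(W["c"] @ h_prev + U["c"] @ x_t + b["c"])   # candidate memory
    o_t = torch.sigmoid(W["o"] @ h_prev + U["o"] @ x_t + b["o"])  # output gate
    c_t = f_t * c_prev + i_t * c_hat   # keep part of the old memory, add new content
    h_t = o_t * torch.tanh(c_t)        # expose part of the current cell state
    return h_t, c_t

h_t, c_t = lstm_step(torch.randn(input_size), torch.zeros(hidden_size), torch.zeros(hidden_size))
```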

“In a similar way, an LSTM works as follows:

• It keeps track not just of short term memory, but also of long term memory

• In every step of the sequence, the long and short term memory in the step get merged

• From this, we get a new long term memory, short term memory, and prediction”

Peter Foy, “An Introduction to Recurrent Neural Networks”

Building an RNN vs. a CNN in PyTorch

There are a few differences in the code when building an RNN (recurrent neural network) versus a CNN (convolutional neural network) with PyTorch.

First, the layers you take from the `torch.nn` module differ: for an RNN you define a recurrent layer with the `RNN` class, while for a CNN you define convolutional layers with the `Conv2d` class and add them to the model.

Second, the forward pass differs: the RNN feeds the whole input sequence through the recurrent layer and, in this example, keeps only the output of the last timestep, while the CNN applies an activation from `torch.nn.functional` (such as `torch.nn.functional.relu`) after each convolution and flattens the result before the fully connected layer.

Finally, the loss function and the optimizer can be defined in the same way for both.

Here is a simple example showing how to build an RNN model and a CNN model with PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# RNN model: recurrent layer followed by a fully connected output layer
class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNModel, self).__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, input):
        # input shape: (seq_len, batch, input_size)
        output, hidden = self.rnn(input)
        output = self.fc(output[-1])  # use the last timestep's output
        return output

# CNN model: two convolutional layers followed by a fully connected layer
class CNNModel(nn.Module):
    def __init__(self, input_channels, output_size):
        super(CNNModel, self).__init__()
        self.conv1 = nn.Conv2d(input_channels, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.fc = nn.Linear(32 * 28 * 28, output_size)  # assumes 28x28 inputs

    def forward(self, input):
        # input shape: (batch, channels, 28, 28)
        output = F.relu(self.conv1(input))
        output = F.relu(self.conv2(output))
        output = output.view(output.size(0), -1)  # flatten for the linear layer
        output = self.fc(output)
        return output

# Example hyperparameters (illustrative values)
input_size, hidden_size, output_size = 10, 16, 5
input_channels, learning_rate = 1, 0.01

# Create the RNN model
rnn_model = RNNModel(input_size, hidden_size, output_size)

# Create the CNN model
cnn_model = CNNModel(input_channels, output_size)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(rnn_model.parameters(), lr=learning_rate)  # or cnn_model.parameters()
```

Note that this is only a simple example; in practice, you may need to adapt it to the specific requirements of your problem.
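
As a quick, hypothetical shape check of the two models defined above (the batch size of 4 and sequence length of 20 are arbitrary choices):

```python
# Dummy inputs with the shapes each model expects
seq = torch.randn(20, 4, input_size)            # (seq_len, batch, input_size) for nn.RNN
imgs = torch.randn(4, input_channels, 28, 28)   # (batch, channels, H, W) for the CNN

print(rnn_model(seq).shape)   # torch.Size([4, 5])
print(cnn_model(imgs).shape)  # torch.Size([4, 5])
```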
