A Simple Explanation of Recurrent Neural Networks (RNN)


A recurrent neural network (RNN) is a type of neural network designed specifically to deal with sequential data. What makes an RNN so powerful is that it does not take into account just the current input but also the previous inputs, which allows it to memorize what happened earlier in the sequence. To get a better intuition about RNNs, let's take the example of text classification. For this task we could use a classic machine learning algorithm like naive Bayes, but the problem with that algorithm is that it treats a sentence as a set of independent words, looking only at the frequency of each word and ignoring how the words are composed and ordered in the sentence, even though order makes a huge difference to the meaning. Unlike those classic algorithms, an RNN works well on sequence data because it takes word i as input and combines it with the output computed for word i-1; the same thing is then applied for word i+1, and this is why it is called a recurrent neural network: the network applies the same operations to each word i of the sentence.

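To see concretely why ignoring word order is a problem for a bag-of-words model like naive Bayes, here is a small Python check (a hypothetical illustration, not code from the original article): two sentences with opposite meanings end up with exactly the same word counts, while as ordered sequences they stay distinct.

```python
from collections import Counter

# Two sentences built from the same words but with opposite meanings.
s1 = "the dog bit the man"
s2 = "the man bit the dog"

# A bag-of-words representation keeps only word frequencies, so both
# sentences look identical to a frequency-based classifier.
print(Counter(s1.split()) == Counter(s2.split()))   # True

# Read as ordered sequences (the way an RNN consumes them), they differ.
print(s1.split() == s2.split())                     # False
```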

You might be thinking: enough blah blah, show us how they work. That's exactly what I'll do in the next part:


How RNN works:

To understand how an RNN works under the hood, let's take the example of an NLP application, named entity recognition (NER), a technique used to detect names in a sentence:


[Figure: example training sentences with each word labeled 1 if it is a name, 0 otherwise]

In the examples above, for each training instance (sentence) we map each word to an output: if the word is a name (John, Ellen, ...) we map it to 1; otherwise we map it to 0. So, to train an RNN on sentences to recognize the names within them, the RNN architecture would look something like this:


[Figure: unrolled RNN architecture, one time step per word of the sentence]
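To make the word-to-label mapping concrete, here is a minimal sketch of what one training pair could look like (the sentence and labels are illustrative and not taken from the original figures):

```python
# One training example: the input sequence of words and the target sequence
# of labels, where 1 marks a name and 0 marks any other word.
words  = ["john", "met", "ellen", "at", "school"]
labels = [1,      0,     1,       0,   0]

# The RNN reads the words one at a time and is trained to output the
# matching label at every time step.
for t, (word, label) in enumerate(zip(words, labels), start=1):
    print(f"step {t}: x<{t}> = {word!r} -> y<{t}> = {label}")
```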

Forward propagation:

For this training example we have 5 words, which means 5 time steps, so for each step t we calculate a and the prediction ŷ using the shared weights Wa, Wx, Wy, ba, by:


[Figure: forward-pass computations written out for each of the 5 time steps]

In general, the equations are:


[Figure: general forward-propagation equations]
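As a sketch, the standard forward equations with the weight names used above are shown below; the choice of tanh for the hidden activation and a sigmoid for the binary name/not-name output is an assumption based on the usual formulation:

```latex
a^{\langle t \rangle} = \tanh\left(W_a\, a^{\langle t-1 \rangle} + W_x\, x^{\langle t \rangle} + b_a\right),
\qquad
\hat{y}^{\langle t \rangle} = \sigma\left(W_y\, a^{\langle t \rangle} + b_y\right),
\qquad
a^{\langle 0 \rangle} = \mathbf{0}
```

The same Wa, Wx, Wy, ba, by are reused at every time step, which is what makes the weights "shared".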

Then, we calculate the cost function at each time step t, which captures how far the predicted output ŷ is from the real output y:


[Figure: per-step cost function]
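A common choice for this per-step cost in a binary labeling task like this one is the cross-entropy; assuming that convention:

```latex
\mathcal{L}^{\langle t \rangle}\left(\hat{y}^{\langle t \rangle},\, y^{\langle t \rangle}\right)
  = -\, y^{\langle t \rangle} \log \hat{y}^{\langle t \rangle}
    \;-\; \left(1 - y^{\langle t \rangle}\right) \log\left(1 - \hat{y}^{\langle t \rangle}\right)
```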

Now, we sum the cost over every word (time step) of the sentence to obtain the overall loss function:


[Figure: overall loss as the sum of the per-step costs]
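Written out with the same notation, the overall loss is simply the sum of the per-step costs over the T_x time steps of the sentence (here T_x = 5):

```latex
\mathcal{L}(\hat{y}, y) = \sum_{t=1}^{T_x} \mathcal{L}^{\langle t \rangle}\left(\hat{y}^{\langle t \rangle},\, y^{\langle t \rangle}\right)
```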

Back propagation:

Back propagation is like going back in time: we compute the derivatives of the loss function with respect to the parameters Wa, Wx, Wy, ba, by, using the chain rule to simplify the calculus. After getting the derivatives, we update the parameters using gradient descent:


[Figure: gradient-descent update of the parameters]
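Sketched in the usual form, the update for each parameter θ in {Wa, Wx, Wy, ba, by}, with learning rate α, would be:

```latex
\theta \;\leftarrow\; \theta \;-\; \alpha\, \frac{\partial \mathcal{L}}{\partial \theta},
\qquad
\theta \in \{W_a,\, W_x,\, W_y,\, b_a,\, b_y\}
```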

After multiple iterations over several training examples, the loss function is minimized and the predicted output converges to the real output. We can then use the optimized weights to detect names in future sentences.

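Putting the pieces together, here is a minimal numpy sketch of the forward pass and loss for a single 5-word training example. The vocabulary, one-hot encoding, layer sizes, and random initialization are illustrative assumptions, and the backward pass is omitted (it would compute the gradients above via back propagation through time):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative training example: 5 words, label 1 for names, 0 otherwise.
words  = ["john", "met", "ellen", "at", "school"]
labels = np.array([1, 0, 1, 0, 0], dtype=float)

# Tiny vocabulary and one-hot encoding (assumptions for this sketch).
vocab = {w: i for i, w in enumerate(sorted(set(words)))}

def one_hot(word):
    x = np.zeros((len(vocab), 1))
    x[vocab[word]] = 1.0
    return x

n_x, n_a = len(vocab), 8                  # input size, hidden-state size

# Shared weights, reused at every time step (Wa, Wx, Wy, ba, by).
Wa = rng.normal(scale=0.1, size=(n_a, n_a))
Wx = rng.normal(scale=0.1, size=(n_a, n_x))
Wy = rng.normal(scale=0.1, size=(1, n_a))
ba = np.zeros((n_a, 1))
by = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward propagation:
#   a<t>     = tanh(Wa a<t-1> + Wx x<t> + ba)
#   y_hat<t> = sigmoid(Wy a<t> + by)
a = np.zeros((n_a, 1))                    # a<0>
loss = 0.0
for t, (word, y) in enumerate(zip(words, labels), start=1):
    x = one_hot(word)
    a = np.tanh(Wa @ a + Wx @ x + ba)
    y_hat = sigmoid(Wy @ a + by).item()
    # Per-step cross-entropy cost, accumulated into the overall loss.
    loss += -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    print(f"step {t}: word={word!r}  y_hat={y_hat:.3f}  y={int(y)}")

print(f"total loss: {loss:.3f}")
```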

For more articles check:


Translated from: https://medium.com/swlh/simple-explanation-of-recurrent-neural-network-rnn-1285749cc363
