Recurrent Neural Networks
FAU Lecture Notes on Deep Learning
These are the lecture notes for FAU’s YouTube Lecture “Deep Learning”. This is a full transcript of the lecture video & matching slides. We hope you enjoy this as much as the videos. Of course, this transcript was created largely automatically with deep learning techniques and only minor manual modifications were performed. If you spot mistakes, please let us know!
Navigation
Previous Lecture / Watch this Video / Top Level / Next Lecture
Welcome back to deep learning! Today I want to show you an alternative solution to the vanishing gradient problem in recurrent neural networks.
You have already noticed that long temporal contexts are a problem. Therefore, we will talk about long short-term memory units (LSTMs). They were introduced by Hochreiter and Schmidhuber and published in 1997.
They were designed to solve the vanishing gradient problem for long-term dependencies. The main idea is to introduce gates that control writing to and accessing the memory held in additional states.
So, let’s have a look into the LSTM unit. You see here, one main feature is that we now have essentially two things that could be considered as a hidden state: We have the cell state C and we have the hidden state h. Again, we have some input x. Then we have quite a few activation functions. We then combine them and in the end, we produce some output y. This unit is much more complex than what you’ve seen previously in the simple RNNs.
Okay, so what are the main features of the LSTM? Given some input x, it produces a hidden state h. It also has a cell state C, which we will look into in a little more detail in the next couple of slides, to produce the output y. Now, we have several gates and the gates are essentially used to control the flow of information. There’s a forget gate and this is used to forget old information in the cell state. Then, we have the input gate and this essentially decides which new input enters the cell state. From this, we then compute the updated cell state and the updated hidden state.
So let’s look into the workflow. We have the cell state after each time point t and the cell state undergoes only linear changes. So there is no activation function. You see there is only one multiplication and one addition on the path of the cell state. So, the cell state can flow through the unit. The cell state can be constant for multiple time steps. Now, we want to operate on the cell state. We do that with several gates and the first one is going to be the forget gate. The key idea here is that we want to forget information from the cell state. In another step, we then want to think about how to actually put new information into the cell state that is going to be used to memorize things.
So, the forget gate f controls how much of the previous cell state is forgotten. You can see it is computed by a sigmoid function, so it lies somewhere between 0 and 1. It is essentially computed as a matrix multiplication with the concatenation of the hidden state and x, plus some bias. This is then multiplied with the cell state. So, we decide which parts of the state vector to forget and which ones to keep.
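As a rough illustration of this step, here is a minimal NumPy sketch. The library choice, the dimensions, and the parameter names W_f and b_f are assumptions for illustration and are not taken from the lecture slides:

```python
import numpy as np

# Minimal sketch of the forget gate (assumed sizes: hidden/cell state 4, input 3).
rng = np.random.default_rng(0)
h_prev = rng.standard_normal(4)          # previous hidden state
x_t    = rng.standard_normal(3)          # current input
C_prev = rng.standard_normal(4)          # previous cell state

W_f = rng.standard_normal((4, 4 + 3))    # forget-gate weights (illustrative)
b_f = np.zeros(4)                        # forget-gate bias (illustrative)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forget gate: sigmoid of a matrix multiplication with [h, x] plus bias, so each entry lies in (0, 1).
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

# Point-wise multiplication decides how much of each cell-state entry is kept.
C_after_forget = f_t * C_prev
```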
Now, we also need to put in new information. For the new information, we have to somehow decide what information to put into the cell state. So here, we need two activation functions: one that we call I, which is also produced by a sigmoid activation function. Again, this is a matrix multiplication with the hidden state concatenated with the input, plus some bias, and the sigmoid function as the non-linearity. Remember, this value is going to be between 0 and 1, so you could argue that it is kind of selecting something. Then, we have some C tilde, which is a kind of update state that is produced by the hyperbolic tangent. This takes as input some weight matrix W subscript c that is multiplied with the concatenation of the hidden and input vector, plus some bias. So essentially, we have this gate I that is then multiplied with the intermediate cell state C tilde. We could say that the hyperbolic tangent is producing some new cell state and then we select via I which of these entries should be added to the current cell state. So, we multiply the newly produced C tilde with I and add it to the cell state C.
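Again as a hedged sketch, the input gate and the candidate state could be computed like this (the sizes and the parameter names W_i, b_i, W_c, b_c are made up for illustration):

```python
import numpy as np

# Sketch of the input gate I and the candidate cell state C~ (illustrative sizes).
rng = np.random.default_rng(0)
h_prev = rng.standard_normal(4)
x_t    = rng.standard_normal(3)
hx     = np.concatenate([h_prev, x_t])

W_i, b_i = rng.standard_normal((4, 7)), np.zeros(4)   # input-gate parameters
W_c, b_c = rng.standard_normal((4, 7)), np.zeros(4)   # candidate-state parameters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

i_t       = sigmoid(W_i @ hx + b_i)    # selects which entries get written, values in (0, 1)
C_tilde_t = np.tanh(W_c @ hx + b_c)    # proposed new content, values in (-1, 1)

# Only the selected part of the proposal is later added to the cell state.
cell_update = i_t * C_tilde_t
```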
Now, as we have just seen, we update the complete cell state using a point-wise multiplication of the previous state with the forget gate. Then, we add the elements of the candidate cell state that have been selected by I, again with a point-wise multiplication. So, you see, the update of the cell state is completely linear, using only multiplications and additions.
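The update itself is just one point-wise multiplication and one addition; a minimal sketch with stand-in gate values (the sizes and the random stand-ins are assumptions):

```python
import numpy as np

# Sketch of the linear cell-state update, assuming f_t, i_t, C_tilde_t were computed as above.
rng = np.random.default_rng(0)
C_prev    = rng.standard_normal(4)
f_t       = rng.uniform(0, 1, 4)              # forget gate output (stand-in values)
i_t       = rng.uniform(0, 1, 4)              # input gate output (stand-in values)
C_tilde_t = np.tanh(rng.standard_normal(4))   # candidate state (stand-in values)

# Purely linear: no activation function on the cell-state path.
C_t = f_t * C_prev + i_t * C_tilde_t
```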
Now, we still have to produce the hidden state and the output. As we have seen in the Elman cell, the output of our network only depends on the hidden state. So, we first update the hidden state by another non-linearity that is then multiplied with a transformation of the cell state. This gives us the new hidden state, and from the new hidden state, we produce the output with another non-linearity.
So, you see these are the update equations. We produce some o, which is essentially a proposal for the new hidden state, by a sigmoid function. Then, we multiply it with the hyperbolic tangent that is generated from the cell state in order to select which elements are actually passed on. This gives us the new hidden state. The new hidden state we can then pass through another non-linearity in order to produce the output. You can see here, by the way, that for the update of the hidden state and the production of the new output, we omitted the transformation matrices that are of course required. You could interpret each of these non-linearities in the network essentially as a universal function approximator. So, we still need the linear part, of course, inside here to reduce vanishing gradients.
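Putting the pieces together, a single LSTM time step might look like the following sketch. The weight names, the parameter-dictionary layout, and the choice of a hyperbolic tangent for the output non-linearity are assumptions; as noted above, the slides omit the exact transformation matrices for the hidden-state update and the output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM time step as sketched in the text (parameter names are illustrative)."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(p["W_f"] @ hx + p["b_f"])          # forget gate
    i_t = sigmoid(p["W_i"] @ hx + p["b_i"])          # input gate
    C_tilde = np.tanh(p["W_c"] @ hx + p["b_c"])      # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde               # linear cell-state update
    o_t = sigmoid(p["W_o"] @ hx + p["b_o"])          # proposal for the new hidden state
    h_t = o_t * np.tanh(C_t)                         # new hidden state
    y_t = np.tanh(p["W_y"] @ h_t + p["b_y"])         # output (one possible non-linearity)
    return y_t, h_t, C_t

# Toy usage with assumed sizes: input 3, hidden/cell state 4, output 2.
rng = np.random.default_rng(0)
p = {k: rng.standard_normal(s) for k, s in {
    "W_f": (4, 7), "b_f": 4, "W_i": (4, 7), "b_i": 4,
    "W_c": (4, 7), "b_c": 4, "W_o": (4, 7), "b_o": 4,
    "W_y": (2, 4), "b_y": 2}.items()}
y, h, C = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), p)
```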
If you want to train all of this, you can go back and use a very similar recipe to the one we have already seen for the Elman cell. So, you use backpropagation through time in order to update all of the different weight matrices.
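As a hedged illustration of such a training setup, here is a small PyTorch sketch; the framework, the toy regression task, and all sizes are assumptions and not part of the lecture. Calling backward() on a loss computed over the unrolled sequence performs backpropagation through time:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=4, batch_first=True)  # LSTM layer
readout = nn.Linear(4, 1)                                      # maps hidden states to outputs
optimizer = torch.optim.Adam(
    list(lstm.parameters()) + list(readout.parameters()), lr=1e-2)

x = torch.randn(8, 10, 3)        # 8 toy sequences, 10 time steps, 3 input features
target = torch.randn(8, 10, 1)   # per-step regression targets (random toy data)

for step in range(100):
    optimizer.zero_grad()
    h_seq, _ = lstm(x)                                     # hidden states for all time steps
    loss = nn.functional.mse_loss(readout(h_seq), target)
    loss.backward()                                        # backpropagation through time
    optimizer.step()
```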
Okay. This already brings us to the end of this video. So you’ve seen the long short-term memory cell, the different parts, the different gates, and, of course, this is a very important part of this lecture. So, if you’re preparing for the exam, then I would definitely recommend having a look at how to sketch such a long short-term memory unit. You can see that the LSTM has a lot of advantages. In particular, we can alleviate the problem with the vanishing gradients by the linear transformations in the cell state. By the way, it’s also noteworthy to point out that we somehow include in our long short-term memory cell some ideas that we know from computer design. We essentially learn how to manipulate memory cells. We could argue that in the hidden state, we now have a kind of program, a kind of finite state machine, that then operates on some memory and learns which information to store, which information to delete, and which information to load. So, it is very interesting how these network designs gradually seem to be approaching computer architectures. Of course, there’s much more to say about this. In the next video, we will look into the gated recurrent neural networks, which are a kind of simplification of the LSTM cell. You will see that with a slightly slimmer design, we can still get many of the benefits of the LSTM, but with far fewer parameters. Ok, so I hope you enjoyed this video and see you next time when we talk about gated recurrent neural networks. Bye-bye!
If you liked this post, you can find more essays here, more educational material on Machine Learning here, or have a look at our Deep Learning Lecture. I would also appreciate a follow on YouTube, Twitter, Facebook, or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced.
RNN Folk Music
FolkRNN.org / MachineFolkSession.com / The Glass Herry Comment 14128
Links
Character RNNs / CNNs for Machine Translation / Composing Music with RNNs
Translated from: https://towardsdatascience.com/recurrent-neural-networks-part-3-1032d4a67757