学习日记2

沟壑星空qq_42946961

已于 2022-07-16 08:07:46 修改

阅读量228

点赞数

文章标签：学习大数据

于 2022-07-07 18:07:53 首次发布

本文链接：https://blog.csdn.net/qq_42946961/article/details/125661595

版权

LSTM

学习大纲

https://blog.csdn.net/phdongou/article/details/113515466

https://www.jianshu.com/p/2f17a3c62cde

keras中的units参数的意思-->这层的隐藏神经单元个数

一个cell共有多少个参数

假设units = 64
根据上图，我们可以计算，假设a向量是128维向量，x向量是28维向量，那么二者concat以后就是156维向量，为了能相乘，那么Wf就应该是(64,156),同理其余三个框，也应该是同样的shape。于是，在第一层就有参数64x156x4 + 64x4个。

若是把cell外面的参数也算进去，那么假设有10个类，那么对于最终的shape为(64,1)的输出at,还要有一个shape为(10,64)的W跟一个shape为(10,1)的b。

Understanding LSTM Networks

Posted on August 27, 2015

Recurrent Neural Networks

人的思考是持续性的，跟过去有关

Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

Recurrent Neural Networks have loops.

In the above diagram, a chunk of neural network, AA, looks at some input Xt and outputs a value ht. A loop allows information to be passed from one step of the network to the next.

上图的循环比较让人困惑. However, if you think a bit more, it turns out that they aren’t all that different than a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop:

rnn展开

This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They’re the natural architecture of neural network to use for such data.

And they certainly are used! In the last few years, there have been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning… I’ll leave discussion of the amazing feats one can achieve with RNNs to Andrej Karpathy’s excellent blog post, The Unreasonable Effectiveness of Recurrent Neural Networks. But they really are pretty amazing.

Essential to these successes is the use of “LSTMs,” a very special kind of recurrent neural network which works, for many tasks, much much better than the standard version. Almost all exciting results based on recurrent neural networks are achieved with them. It’s these LSTMs that this essay will explore.

The Problem of Long-Term Dependencies

One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task, such as using previous video frames might inform the understanding of the present frame. If RNNs could do this, they’d be extremely useful. But can they? It depends.

Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don’t need any further context – it’s pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information.

But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.” Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large.

Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.

In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don’t seem to be able to learn them. The problem was explored in depth by Hochreiter (1991) [German] and Bengio, et al. (1994), who found some pretty fundamental reasons why it might be difficult.

Thankfully, LSTMs don’t have this problem!

LSTM Networks

Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter &