Tutorial: http://deeplearning.net/tutorial/lstm.html
A blog post with many helpful illustrations, good to read alongside the tutorial: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
An introduction written by a UCSD PhD student: http://blog.terminal.com/demistifying-long-short-term-memory-lstm-recurrent-neural-networks/
Lecture 7 of Geoffrey Hinton's Coursera course Neural Networks for Machine Learning covers RNNs and LSTMs: https://class.coursera.org/neuralnets-2012-001/lecture
A blog post analyzing the source code for this section: http://www.cnblogs.com/neopenx/p/4806006.html
Large Movie Review Dataset: movie reviews collected from IMDB with a crawler and split into two classes according to their ratings. See the tutorial for how to download and preprocess the dataset (the imdb.py script provided with this section downloads the preprocessed dataset from the web automatically). A small usage sketch follows below.
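A minimal sketch of inspecting the preprocessed data through the tutorial's imdb.py. It assumes imdb.py is on the Python path and exposes load_data / prepare_data with roughly the interface described in the tutorial; the exact signatures may differ in the version you download.

```python
# Sketch: peek at the preprocessed IMDB data via the tutorial's imdb.py.
# Assumption: imdb.load_data / imdb.prepare_data exist as in the tutorial code.
import imdb

# Each split is (list of reviews, list of labels); a review is a list of word
# indices, a label is 0 (negative) or 1 (positive).
train, valid, test = imdb.load_data(n_words=10000, valid_portion=0.1)

train_x, train_y = train
print(len(train_x), "training reviews")
print(train_x[0][:10], "-> first 10 word indices of review 0")
print(train_y[0], "-> its sentiment label (0 = negative, 1 = positive)")

# prepare_data pads a minibatch of variable-length reviews into a
# (maxlen, n_samples) matrix plus a mask marking which entries are real words.
x, mask, y = imdb.prepare_data(train_x[:16], train_y[:16])
print(x.shape, mask.shape)
```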
Model
In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that the magnitude of the weights in the transition matrix can have a strong impact on the learning process.
A traditional RNN is trained with backpropagation through time (BPTT), i.e. backpropagation across timesteps. As a result, the gradient gets multiplied by the recurrent weight matrix (the purple weights in the figure) over and over, once per timestep spanned, so that weight matrix has a very strong influence on the learning process.
If the weights in this matrix are small (or, more formally, if the leading eigenvalue of the weight matrix is smaller than 1.0), it can lead to a situation called vanishing gradients, where the gradient signal gets so small that learning either becomes very slow or stops working altogether. It also makes it harder to learn long-term dependencies in the data. Conversely, if the weights in this matrix are large (or, again, more formally, if the leading eigenvalue of the weight matrix is larger than 1.0), it can lead to exploding gradients, where the gradient signal grows so large that learning diverges.
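To make the eigenvalue argument concrete, here is a small numerical sketch (not part of the tutorial): it ignores the nonlinearity's derivative and simply multiplies a gradient vector by a random recurrent weight matrix rescaled to a chosen spectral radius, as happens repeatedly during BPTT.

```python
# Illustration: the gradient norm after T backward steps is governed by the
# leading eigenvalue (spectral radius) of the recurrent weight matrix.
import numpy as np

rng = np.random.RandomState(0)

def bptt_norms(spectral_radius, n_steps=50, dim=100):
    """Return the gradient-vector norm after each backward timestep."""
    W = rng.randn(dim, dim)
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    grad = rng.randn(dim)
    norms = []
    for _ in range(n_steps):
        grad = W.T @ grad          # one step of backprop through the recurrence
        norms.append(np.linalg.norm(grad))
    return norms

print("leading eigenvalue 0.9:", bptt_norms(0.9)[-1])   # shrinks toward 0 (vanishing)
print("leading eigenvalue 1.1:", bptt_norms(1.1)[-1])   # grows rapidly (exploding)
```

With 50 timesteps the norm scales roughly like 0.9^50 or 1.1^50, which is why even modest deviations of the leading eigenvalue from 1.0 make plain RNNs hard to train over long sequences.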
