I recently read some introductory material on RNNs and LSTMs, and this post is a summary of what I learned.
Contents:
1 RNN
1) RNN structure
2) The essence of RNN
3) The problem of long-term dependencies in RNNs
2 LSTM
1) LSTM structure
2) LSTM structure in detail
3) Variants on Long Short-Term Memory
3 Traditional NN vs. RNN & LSTM
4 Some simple applications of RNNs & LSTMs
1 RNN
1.1 RNN structure
An RNN here is a three-layer neural network. Compared with an ordinary neural network, the only difference is that the hidden layer also receives the previous step's hidden-layer output as an input.
The formulas make this concrete:
1) Forward propagation
2) Backpropagation through time (BPTT) (BPTT is usually executed after a span of t time steps):
For the full derivation, see:
http://www.mamicode.com/info-detail-1547845.html
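The forward-propagation formulas referenced above are the standard ones: h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h) and y_t = softmax(W_hy·h_t + b_y). A minimal NumPy sketch of one forward step (all weight names and dimensions are illustrative, not from a specific library):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, Why, bh, by):
    """One forward step of a vanilla RNN:
    the new hidden state mixes the current input with the previous hidden state."""
    h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)
    logits = Why @ h_t + by
    y_t = np.exp(logits - logits.max())
    y_t /= y_t.sum()                       # softmax over the output
    return h_t, y_t

# toy dimensions: 3-dim input, 4-dim hidden, 2-dim output
rng = np.random.default_rng(0)
Wxh, Whh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
Why = rng.normal(size=(2, 4))
bh, by = np.zeros(4), np.zeros(2)

h = np.zeros(4)
for x in rng.normal(size=(5, 3)):          # unroll over 5 time steps
    h, y = rnn_step(x, h, Wxh, Whh, Why, bh, by)
```

The same unrolled loop is what BPTT differentiates through, which is where the long-chain gradient products discussed below come from.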
1.2 The essence of RNN
At its core, an RNN is an inference machine: it finds the relationship between two time series. Given enough data, it can learn the probability distribution mapping x(t) to y(t), and thus perform inference and prediction. This is closely related to the HMM.
—HMM, Hidden Markov Model (a Bayesian network)
Links h_t and h_{t-1} through a transition matrix; each node has a concrete interpretation.
—RNN
Links h_t and h_{t-1} through connections between neurons; the neurons are just hubs through which information flows.
1.3 The problem of long-term dependencies in RNNs
1) Recent information suffices for the present task
As the figure shows, if the current input x3 depends only on the nearby recent inputs x0 and x1, the dependency is easily captured at h3.
2) More context is needed (long-term dependencies)
As the figure shows, if the current input x_{t+1} depends on much earlier information x0 and x1, then h_{t+1} may struggle to reflect x0 and x1, because of the vanishing-gradient problem.
For a detailed explanation, see the papers:
Hochreiter (1991) [German]
http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf
and
Bengio, et al. (1994)
http://www-dsi.ing.unifi.it/~paolo/ps/tnn-94-gradient.pdf
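The vanishing-gradient effect analyzed in those papers can be seen numerically: in BPTT the gradient is multiplied by the recurrent Jacobian at every step, so when its largest singular value is below 1 the gradient norm decays exponentially with the number of steps. A small illustrative sketch (the 0.9 scaling and 50 steps are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 4))
W *= 0.9 / np.linalg.norm(W, 2)   # force the largest singular value to 0.9

grad = np.ones(4)                  # gradient arriving at the last time step
norms = []
for _ in range(50):                # propagate back through 50 steps
    grad = W.T @ grad              # ignoring tanh' (<= 1, which only shrinks it further)
    norms.append(np.linalg.norm(grad))
```

After 50 steps the gradient norm is bounded by 0.9^50 of its starting value, i.e. effectively zero, so early inputs like x0 and x1 receive almost no learning signal.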
2 LSTM (Long Short-Term Memory) (Neural Computation, 1997)
This is the RNN structure:
This is the LSTM structure:
As you can see, the only difference between an LSTM and an RNN is that the hidden layer contains some extra units.
2.1 LSTM structure
2.2 LSTM structure in detail:
Code: https://github.com/nicodjimenez/lstm
Reference: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
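The per-step computation walked through at the reference link above is: f_t = σ(...), i_t = σ(...), C̃_t = tanh(...), C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t, o_t = σ(...), h_t = o_t ⊙ tanh(C_t). A minimal NumPy sketch of one LSTM step, assuming the common trick of stacking the four gate weight matrices into one (names and dimensions are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x_t] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[0:H])            # forget gate: what to keep of the old cell state
    i = sigmoid(z[H:2*H])          # input gate: how much new information to admit
    g = np.tanh(z[2*H:3*H])        # candidate cell state
    o = sigmoid(z[3*H:4*H])        # output gate
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
H, X = 4, 3
W = rng.normal(size=(4 * H, H + X))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, X)):
    h, c = lstm_step(x, h, c, W, b)
```

The additive update c = f * c_prev + i * g is the key difference from the vanilla RNN: the cell state can carry information across many steps without being repeatedly squashed.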
2.3 Variants on Long Short-Term Memory
1) Recurrent nets that time and count (IEEE 2000)
Feeds the cell-state information into all three gates (the "peephole connections").
Implementation code:
http://christianherta.de/lehre/dataScience/machineLearning/neuralNetworks/LSTM.php
2) Combines the forget and input gates into a single "update gate."
The reason for the 1 − f_t term is that whatever is forgotten must be replenished from the current input.
3) Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (EMNLP 2014)
Introduces the Gated Recurrent Unit (GRU).
This is the now well-known GRU model; it essentially combines variants 1) and 2) above, with some further modifications.
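A minimal NumPy sketch of one GRU step, using one common convention for the update interpolation (biases omitted for brevity; weight names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step: two gates, no separate cell state."""
    xh = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ xh)                          # update gate
    r = sigmoid(Wr @ xh)                          # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))
    return (1.0 - z) * h_prev + z * h_cand        # interpolate old and new state

rng = np.random.default_rng(0)
H, X = 4, 3
Wz, Wr, Wh = (rng.normal(size=(H, H + X)) for _ in range(3))
h = np.zeros(H)
for x in rng.normal(size=(5, X)):
    h = gru_step(x, h, Wz, Wr, Wh)
```

Note how the single update gate z plays both roles: z weights the new candidate and 1 − z keeps the old state, which is exactly the coupled-gate idea from variant 2).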
In addition, LSTM: A Search Space Odyssey (Greff et al., 2015) compares many LSTM variants, and An Empirical Exploration of Recurrent Network Architectures (Jozefowicz et al., ICML 2015) benchmarks a wide range of RNN and LSTM architectures. The results are quite interesting: on some tasks, plain RNNs outperform LSTMs.
3 Traditional NN vs. RNN & LSTM
The leftmost diagram is the traditional NN setup: one input maps to one output, e.g. image classification.
The possible configurations are:
One to one: fixed-sized input to fixed-sized output (e.g. image classification)
One to many: sequence output (e.g. image captioning)
Many to one: sequence input (e.g. sentiment analysis)
Many to many:
1) Sequence input and sequence output (e.g. machine translation)
2) Synced sequence input and output (e.g. video classification, labeling each frame of the video)
4 Some simple applications of RNNs & LSTMs
4.1 Language models
1) Input: "hell"
Predict the next character: 'o'
2) Generate text
For example, given a seed text "in palo alto", generate the next 100 words.
Or train on n Tang-dynasty poems, then feed in a starting character and generate a poem.
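The "hell" → 'o' setup above is just next-character prediction: each character in the training text is the input, and the following character is the target. A sketch of how the data for the word "hello" is prepared (character indices only; the model itself is omitted):

```python
text = "hello"
vocab = sorted(set(text))                        # the character vocabulary
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

# each character predicts the next one
inputs = [char_to_ix[ch] for ch in text[:-1]]    # indices of h, e, l, l
targets = [char_to_ix[ch] for ch in text[1:]]    # indices of e, l, l, o
pairs = list(zip(text[:-1], text[1:]))
```

Seeded generation works the same way in reverse: feed the seed through the trained network one character at a time, then repeatedly sample from the output distribution and feed the sample back in as the next input.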
4.2 Machine Translation
Sequence to Sequence Learning with Neural Networks (Google, NIPS 2014)
● The challenge for traditional feed-forward neural networks is that source and target lengths vary.
How it works:
An encoder-decoder framework:
First, one LSTM encodes the input sequence into its final hidden vector h_t;
then another LSTM decodes, taking that vector as input
(in effect a language model that generates the output sequence one token at a time).
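The two-stage idea can be sketched with plain RNN cells standing in for the LSTMs (all weights are random and illustrative; a real system trains both networks jointly and projects the decoder state onto output words):

```python
import numpy as np

def rnn_step(x, h, Wx, Wh):
    return np.tanh(Wx @ x + Wh @ h)

rng = np.random.default_rng(0)
H, X = 4, 3
enc_Wx, enc_Wh = rng.normal(size=(H, X)), rng.normal(size=(H, H))
dec_Wx, dec_Wh = rng.normal(size=(H, X)), rng.normal(size=(H, H))

# encoder: compress the whole source sequence into one state vector
h = np.zeros(H)
for x in rng.normal(size=(6, X)):
    h = rnn_step(x, h, enc_Wx, enc_Wh)
context = h                                # everything the decoder will see

# decoder: generate step by step, starting from the context vector
h, x = context, np.zeros(X)                # x stands in for a <start> token
outputs = []
for _ in range(4):
    h = rnn_step(x, h, dec_Wx, dec_Wh)
    outputs.append(h.copy())               # a real model maps h to a word here
```

Notice that `context` is the only channel between the two networks, which is precisely the bottleneck discussed next.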
The drawback of this architecture:
● No matter how long the context is or how much information it carries, it is all compressed into a single vector of a few hundred dimensions. The larger the context, the more information the final state vector loses.
Solution: ● the attention mechanism
Attention-related papers:
1) Reasoning about Entailment with Neural Attention (Google DeepMind, ICLR 2016)
2) Neural Machine Translation by Jointly Learning to Align and Translate
3) A Neural Attention Model for Abstractive Sentence Summarization
4) Teaching Machines to Read and Comprehend
4.3 Image Captioning
1)Long-term Recurrent Convolutional Networks for Visual Recognition and Description [CVPR2015]
2) Show and Tell: A Neural Image Caption Generator (Google, CVPR 2015)
In short: run the image through a CNN, take a hidden-layer activation as a vector representation of the image, feed that vector into an LSTM, and let the language model generate a descriptive sentence.
Beyond these, some other applications:
●OCR
1) "Can we build language-independent OCR using LSTM networks?" (ACM 2013)
● Speech Recognition:
1) Towards End-to-End Speech Recognition with Recurrent Neural Networks (ICML 2014)
● Computer-composed Music:
1)Composing Music With Recurrent Neural Networks
● RNNs can even do MNIST classification (the takeaway: an image can be treated as a sequence, e.g. fed to the RNN one row of pixels at a time)
Source code:
https://github.com/tgjeon/TensorFlow-Tutorials-for-Time-Series/blob/master/mnist-rnn.ipynb