I recently read some introductory material on RNNs and LSTMs, and this post is a summary of what I learned.
Contents:
1 RNN
1) RNN structure
2) The essence of RNN
3) The problem of long-term dependencies in RNNs
2 LSTM
1) LSTM structure
2) LSTM structure in detail
3) Variants on Long Short-Term Memory
3 Traditional NN vs. RNN & LSTM
4 Some simple applications of RNNs & LSTMs
1 RNN
1.1 RNN structure
An RNN here is a three-layer neural network. Compared with an ordinary neural network, the only difference is that the hidden layer also receives the previous step's hidden-layer output as an input.
The formulas make this concrete:
1) Forward propagation
2) Backpropagation through time (BPTT) (BPTT is usually executed after a span of t time steps):
For the full derivation, see:
http://www.mamicode.com/info-detail-1547845.html
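The forward-propagation formulas referenced above are the standard ones: h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h) and y_t = softmax(W_hy·h_t + b_y). A minimal NumPy sketch of one forward step (all weight names and dimensions are illustrative, not from a specific library):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, Why, bh, by):
    """One forward step of a vanilla RNN:
    the new hidden state mixes the current input with the previous hidden state."""
    h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)
    logits = Why @ h_t + by
    y_t = np.exp(logits - logits.max())
    y_t /= y_t.sum()                       # softmax over the output
    return h_t, y_t

# toy dimensions: 3-dim input, 4-dim hidden, 2-dim output
rng = np.random.default_rng(0)
Wxh, Whh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
Why = rng.normal(size=(2, 4))
bh, by = np.zeros(4), np.zeros(2)

h = np.zeros(4)
for x in rng.normal(size=(5, 3)):          # unroll over 5 time steps
    h, y = rnn_step(x, h, Wxh, Whh, Why, bh, by)
```

The same unrolled loop is what BPTT differentiates through, which is where the long-chain gradient products discussed below come from.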
1.2 The essence of RNN
At its core, an RNN is an inference machine: it finds the relationship between two time series. Given enough data, it can learn the probability distribution mapping x(t) to y(t), and thus perform inference and prediction. This is closely related to the HMM.
—HMM, Hidden Markov Model (a Bayesian network)
Links h_t and h_{t-1} through a transition matrix; each node has a concrete interpretation.
—RNN
Links h_t and h_{t-1} through connections between neurons; the neurons are just hubs through which information flows.
1.3 The problem of long-term dependencies in RNNs
1) Recent information suffices for the present task
As the figure shows, if the current input x3 depends only on the nearby recent inputs x0 and x1, the dependency is easily captured at h3.
2) More context is needed (long-term dependencies)
As the figure shows, if the current input x_{t+1} depends on much earlier information x0 and x1, then h_{t+1} may struggle to reflect x0 and x1, because of the vanishing-gradient problem.
For a detailed explanation, see the papers:
Hochreiter (1991) [German]
http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf
and
Bengio, et al. (1994)
http://www-dsi.ing.unifi.it/~paolo/ps/tnn-94-gradient.pdf
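The vanishing-gradient effect analyzed in those papers can be seen numerically: in BPTT the gradient is multiplied by the recurrent Jacobian at every step, so when its largest singular value is below 1 the gradient norm decays exponentially with the number of steps. A small illustrative sketch (the 0.9 scaling and 50 steps are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 4))
W *= 0.9 / np.linalg.norm(W, 2)   # force the largest singular value to 0.9

grad = np.ones(4)                  # gradient arriving at the last time step
norms = []
for _ in range(50):                # propagate back through 50 steps
    grad = W.T @ grad              # ignoring tanh' (<= 1, which only shrinks it further)
    norms.append(np.linalg.norm(grad))
```

After 50 steps the gradient norm is bounded by 0.9^50 of its starting value, i.e. effectively zero, so early inputs like x0 and x1 receive almost no learning signal.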
2 LSTM (Long Short-Term Memory) (Neural Computation, 1997)
This is the RNN structure:
This is the LSTM structure:
As you can see, the only difference between an LSTM and an RNN is that the hidden layer contains some extra units.
2.1 LSTM structure
2.2 LSTM structure in detail:
Code: https://github.com/nicodjimenez/lstm
Reference: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
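The per-step computation walked through at the reference link above is: f_t = σ(...), i_t = σ(...), C̃_t = tanh(...), C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t, o_t = σ(...), h_t = o_t ⊙ tanh(C_t). A minimal NumPy sketch of one LSTM step, assuming the common trick of stacking the four gate weight matrices into one (names and dimensions are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x_t] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[0:H])            # forget gate: what to keep of the old cell state
    i = sigmoid(z[H:2*H])          # input gate: how much new information to admit
    g = np.tanh(z[2*H:3*H])        # candidate cell state
    o = sigmoid(z[3*H:4*H])        # output gate
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
H, X = 4, 3
W = rng.normal(size=(4 * H, H + X))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, X)):
    h, c = lstm_step(x, h, c, W, b)
```

The additive update c = f * c_prev + i * g is the key difference from the vanilla RNN: the cell state can carry information across many steps without being repeatedly squashed.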
2.3 Variants on Long Short-Term Memory
1) Recurrent nets that time and count (IEEE 2000)
Feeds the cell-state information into all three gates (the "peephole connections").
Implementation code:
http://christianherta.de/lehre/dataScience/machineLearning/neuralNetworks/LSTM.php
2) Combines the forget and input gates into a single "update gate."
The reason for the 1 − f_t term is that whatever is forgotten must be replenished from the current input.
3) Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (EMNLP 2014)
Introduces the Gated Recurrent Unit (GRU).
This is the now well-known GRU model; it essentially combines variants 1) and 2) above, with some further modifications.
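A minimal NumPy sketch of one GRU step, using one common convention for the update interpolation (biases omitted for brevity; weight names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step: two gates, no separate cell state."""
    xh = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ xh)                          # update gate
    r = sigmoid(Wr @ xh)                          # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))
    return (1.0 - z) * h_prev + z * h_cand        # interpolate old and new state

rng = np.random.default_rng(0)
H, X = 4, 3
Wz, Wr, Wh = (rng.normal(size=(H, H + X)) for _ in range(3))
h = np.zeros(H)
for x in rng.normal(size=(5, X)):
    h = gru_step(x, h, Wz, Wr, Wh)
```

Note how the single update gate z plays both roles: z weights the new candidate and 1 − z keeps the old state, which is exactly the coupled-gate idea from variant 2).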
In addition, LSTM: A Search Space Odyssey (Greff et al., 2015) compares many LSTM variants, and An Empirical Exploration of Recurrent Network Architectures (Jozefowicz et al., ICML 2015) benchmarks a wide range of RNN and LSTM architectures. The results are quite interesting: on some tasks, plain RNNs outperform LSTMs.
3 Traditional NN vs. RNN & LSTM
The leftmost diagram is the traditional NN setup: one input maps to one output, e.g. image classification.
The possible configurations are:
One to one: fixed-sized input to fixed-sized output (e.g. image classification)
One to many: sequence output (e.g. image captioning)
Many to one: sequence input (e.g. sentiment analysis)
Many to many:
1) Sequence input and sequence output (e.g. machine translation)
2) Synced sequence input and output (e.g. video classification, labeling each frame of the video)
4 Some simple applications of RNNs & LSTMs
4.1 Language models
1) Input: "hell"
Predict the next character: 'o'
2) Generate text
For example, given a seed text "in palo alto", generate the next 100 words.
Or train on n Tang-dynasty poems, then feed in a starting character and generate a poem.
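The "hell" → 'o' setup above is just next-character prediction: each character in the training text is the input, and the following character is the target. A sketch of how the data for the word "hello" is prepared (character indices only; the model itself is omitted):

```python
text = "hello"
vocab = sorted(set(text))                        # the character vocabulary
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

# each character predicts the next one
inputs = [char_to_ix[ch] for ch in text[:-1]]    # indices of h, e, l, l
targets = [char_to_ix[ch] for ch in text[1:]]    # indices of e, l, l, o
pairs = list(zip(text[:-1], text[1:]))
```

Seeded generation works the same way in reverse: feed the seed through the trained network one character at a time, then repeatedly sample from the output distribution and feed the sample back in as the next input.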
4.2 Machine Translation
Sequence to Sequence Learning with Neural Networks (Google, NIPS 2014)
● The challenge for traditional feed-forward neural networks is that source and target lengths vary.
How it works:
An encoder-decoder framework:
First, one LSTM encodes the input sequence into its final hidden vector h_t;
then another LSTM decodes, taking that vector as input
(in effect a language model that generates the output sequence one token at a time).
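The two-stage idea can be sketched with plain RNN cells standing in for the LSTMs (all weights are random and illustrative; a real system trains both networks jointly and projects the decoder state onto output words):

```python
import numpy as np

def rnn_step(x, h, Wx, Wh):
    return np.tanh(Wx @ x + Wh @ h)

rng = np.random.default_rng(0)
H, X = 4, 3
enc_Wx, enc_Wh = rng.normal(size=(H, X)), rng.normal(size=(H, H))
dec_Wx, dec_Wh = rng.normal(size=(H, X)), rng.normal(size=(H, H))

# encoder: compress the whole source sequence into one state vector
h = np.zeros(H)
for x in rng.normal(size=(6, X)):
    h = rnn_step(x, h, enc_Wx, enc_Wh)
context = h                                # everything the decoder will see

# decoder: generate step by step, starting from the context vector
h, x = context, np.zeros(X)                # x stands in for a <start> token
outputs = []
for _ in range(4):
    h = rnn_step(x, h, dec_Wx, dec_Wh)
    outputs.append(h.copy())               # a real model maps h to a word here
```

Notice that `context` is the only channel between the two networks, which is precisely the bottleneck discussed next.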
The drawback of this architecture:
● No matter how long the context is or how much information it carries, it is all compressed into a single vector of a few hundred dimensions. The larger the context, the more information the final state vector loses.
Solution: ● the attention mechanism
Attention-related papers:
1) Reasoning about Entailment with Neural Attention (Google DeepMind, ICLR 2016)
2) Neural Machine Translation by Jointly Learning to Align and Translate
3) A Neural Attention Model for Abstractive Sentence Summarization
4) Teaching Machines to Read and Comprehend
4.3 Image Captioning
1)Long-term Recurrent Convolutional Networks for Visual Recognition and Description [CVPR2015]
2) Show and Tell: A Neural Image Caption Generator (Google, CVPR 2015)
In short: run the image through a CNN, take a hidden-layer activation as a vector representation of the image, feed that vector into an LSTM, and let the language model generate a descriptive sentence.
Beyond these, some other applications:
●OCR
1) "Can we build language-independent OCR using LSTM networks?" (ACM 2013)
● Speech Recognition:
1) Towards End-to-End Speech Recognition with Recurrent Neural Networks (ICML 2014)
● Computer-composed Music:
1)Composing Music With Recurrent Neural Networks
● RNNs can even do MNIST classification (the takeaway: an image can be treated as a sequence, e.g. fed to the RNN one row of pixels at a time)
Source code:
https://github.com/tgjeon/TensorFlow-Tutorials-for-Time-Series/blob/master/mnist-rnn.ipynb