# Skills | Three Simplifications, One Diagram: One Trick to Understand LSTM/GRU Gating Mechanisms

## Introduction

RNNs are a key deep-learning technique for modeling sequential data, and they have driven important advances in natural language processing, speech recognition, video understanding, and other fields. However, the vanishing-gradient problem limits their practical use. LSTM and GRU are two widely used RNN variants whose gating mechanisms largely alleviate vanishing gradients, but their internal structure looks quite complex, making it hard for beginners to grasp the underlying principles. This article introduces a "three simplifications, one diagram" method for analyzing the internal structure of LSTM and GRU. The method is quite general and applies to the analysis of any gating mechanism.

## Prerequisites: RNN

RNNs (recurrent neural networks, not to be confused with recursive neural networks) provide a way to model sequential data. Unlike an n-gram model, which predicts the current word from only the previous n-1 words, an RNN can in principle condition on all previous words. At each time step, the hidden-layer output $h_t$ depends on the current input $x_t$ and the previous hidden state $h_{t-1}$:
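The equation itself does not survive in this copy (it was likely an image); the standard vanilla-RNN formulation the text refers to is:

```latex
h_t = f(W x_t + U h_{t-1} + b)
```

where $f$ is a nonlinearity such as $\tanh$, and $W$, $U$, $b$ are learned parameters.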

## LSTM

LSTM mitigates the vanishing-gradient problem through a carefully designed network structure (Hochreiter & Schmidhuber, 1997). Its mathematical formulation is as follows:
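The formulas were apparently lost as an image in this copy; the standard LSTM equations referred to are:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here $i_t$, $f_t$, $o_t$ are the input, forget, and output gates, $c_t$ is the cell state, $\sigma$ is the logistic sigmoid, and $\odot$ denotes element-wise multiplication.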

## GRU

GRU is another mainstream RNN variant (Cho et al., 2014). Like LSTM, it relies on its network structure to mitigate the vanishing-gradient problem; the two differ only in the details of that structure. GRU's mathematical formulation is as follows:
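As with LSTM, the formulas were lost as an image here; the standard GRU equations referred to are (sign conventions for the update gate vary across papers):

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

with update gate $z_t$, reset gate $r_t$, and candidate state $\tilde{h}_t$.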
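To make the gating concrete, here is a minimal one-dimensional GRU step in plain Python, following the standard equations. The weight names and the dictionary parameterization are illustrative, not from the article:

```python
import math

def sigmoid(x):
    """Logistic sigmoid, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU step for scalar input x and scalar hidden state h_prev.

    p is a dict of scalar parameters: wz/uz/bz for the update gate,
    wr/ur/br for the reset gate, wh/uh/bh for the candidate state.
    """
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])   # reset gate
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Interpolate between keeping the old state and writing the candidate.
    return (1 - z) * h_prev + z * h_tilde
```

Driving the update gate to 0 (very negative bias `bz`) makes the unit copy its previous state unchanged, which is exactly the behavior that lets gradients flow across many time steps; driving it to 1 overwrites the state with the candidate.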

## References

1. Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE Transactions on Neural Networks 5.2 (1994): 157-166.
2. Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
3. Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).
4. Gers, Felix. "Long short-term memory in recurrent neural networks." PhD dissertation, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland (2001).
5. Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
6. Graves, Alex. Supervised Sequence Labelling with Recurrent Neural Networks. Vol. 385. Heidelberg: Springer, 2012.
7. Greff, Klaus, et al. "LSTM: A search space odyssey." IEEE Transactions on Neural Networks and Learning Systems (2016).
8. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
9. He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016.
10. Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780.
11. Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. "An empirical exploration of recurrent network architectures." Proceedings of the 32nd International Conference on Machine Learning (ICML-15). 2015.
12. Li, Fei-Fei, Justin Johnson, and Serena Yeung. CS231n: Convolutional Neural Networks for Visual Recognition. Stanford. 2017.
13. Lipton, Zachary C., John Berkowitz, and Charles Elkan. "A critical review of recurrent neural networks for sequence learning." arXiv preprint arXiv:1506.00019 (2015).
14. Manning, Chris, and Richard Socher. CS224n: Natural Language Processing with Deep Learning. Stanford. 2017.
15. Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. "On the difficulty of training recurrent neural networks." International Conference on Machine Learning. 2013.
16. Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
17. Williams, D. R. G. H. R., and Geoffrey Hinton. "Learning representations by back-propagating errors." Nature 323.6088 (1986): 533-538.
18. Zhou, Guo-Bing, et al. "Minimal gated unit for recurrent neural networks." International Journal of Automation and Computing 13.3 (2016): 226-234.
