What is an RNN
The networks are called recurrent because they perform the same computation for every element of an input sequence, and the output for each element depends not only on the current input but also on all previous computations.
Why RNN
- Sequential information in the inputs matters for tasks such as:
- Video Analysis
- Speech Recognition
- Machine Translation

RNNs have proved to achieve excellent performance on such problems.
RNN Procedure
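The procedure can be sketched as a loop that reuses the same weights at every step. A minimal NumPy sketch of the recurrence h_t = σ(U x_t + V h_{t−1}); the dimensions (3-dim inputs, 4-dim hidden state) and random weights are toy values chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions (hypothetical): 3-dim inputs, 4-dim hidden state.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))   # input-to-hidden weights
V = rng.normal(size=(4, 4))   # hidden-to-hidden (recurrent) weights

def rnn_forward(xs):
    """Apply the same computation h_t = sigmoid(U x_t + V h_{t-1}) at every step."""
    h = np.zeros(4)           # h_0: initial hidden state
    states = []
    for x in xs:              # the SAME U and V are reused for all elements
        h = sigmoid(U @ x + V @ h)
        states.append(h)
    return states

xs = [rng.normal(size=3) for _ in range(5)]   # a length-5 input sequence
states = rnn_forward(xs)
print(len(states), states[-1].shape)
```

Note how the hidden state `h` carries information forward: each output depends on the current input and, through `V @ h`, on all previous inputs.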
Sigmoid Gradient
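The key fact here is that the sigmoid's derivative, σ'(x) = σ(x)(1 − σ(x)), is at most 0.25 (attained at x = 0). A short numeric check:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.linspace(-10, 10, 2001)   # grid includes x = 0
grads = sigmoid_grad(xs)
print(grads.max())   # 0.25, attained at x = 0
```

Because every factor in a backpropagated product is bounded by 0.25, long chains of these derivatives shrink rapidly, which is the root of the problem described next.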
The Vanishing Gradient Problem
Consider the recurrent network:
h_t = σ(U x_t + V h_{t−1})
then,
h_3 = σ(U x_3 + V σ(U x_2 + V σ(U x_1)))
∂E_3/∂U = (∂E_3/∂out_3) (∂out_3/∂h_3) (∂h_3/∂h_2) (∂h_2/∂h_1) (∂h_1/∂U)
LSTM Cell
Input Gate
g = tanh(b_g + x_t U_g + h_{t−1} V_g)
i = σ(b_i + x_t U_i + h_{t−1} V_i)
out_i = g ∘ i
Forget Gate
f = σ(b_f + x_t U_f + h_{t−1} V_f)
s_t = s_{t−1} ∘ f + g ∘ i
Output Gate
o = σ(b_o + x_t U_o + h_{t−1} V_o)
h_t = tanh(s_t) ∘ o
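The gate equations above can be sketched as a single NumPy step function. The sizes (3-dim input, 4-dim hidden/cell state) and random weights are toy assumptions for illustration; the equations use the row-vector convention x_t U_g, hence `x @ U`:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy sizes (hypothetical): 3-dim input x_t, 4-dim hidden/cell state.
n_in, n_h = 3, 4
rng = np.random.default_rng(2)
U = {k: rng.normal(size=(n_in, n_h)) for k in "gifo"}   # input weights
V = {k: rng.normal(size=(n_h, n_h)) for k in "gifo"}    # recurrent weights
b = {k: np.zeros(n_h) for k in "gifo"}                  # biases

def lstm_step(x, h_prev, s_prev):
    g = np.tanh(b["g"] + x @ U["g"] + h_prev @ V["g"])  # candidate input
    i = sigmoid(b["i"] + x @ U["i"] + h_prev @ V["i"])  # input gate
    f = sigmoid(b["f"] + x @ U["f"] + h_prev @ V["f"])  # forget gate
    o = sigmoid(b["o"] + x @ U["o"] + h_prev @ V["o"])  # output gate
    s = s_prev * f + g * i          # new cell state  s_t = s_{t-1}∘f + g∘i
    h = np.tanh(s) * o              # new hidden state h_t = tanh(s_t)∘o
    return h, s

h, s = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(5, n_in)):                    # a length-5 sequence
    h, s = lstm_step(x, h, s)
print(h.shape, s.shape)
```

The element-wise products (`∘` in the equations, `*` in NumPy) are what let the gates selectively pass, block, or forget information per dimension.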
Reducing The Problem
∂s_t/∂s_{t−1} = f

The gradient of the cell state with respect to its previous value is simply the forget-gate activation f, not a product of weight matrices and squashing-function derivatives. As long as the network keeps f close to 1, gradients can flow through the cell state across many time steps without vanishing.
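With ∂s_t/∂s_{t−1} = f, the gradient of s_T with respect to s_0 along the cell state is just the product of the forget-gate activations. A toy scalar comparison (the gate values 0.99 and 0.25 are illustrative assumptions) against the sigmoid-derivative bound from the RNN case:

```python
import numpy as np

# Gradient through the LSTM cell state over 50 steps is the product of
# the forget-gate activations (toy scalar values, chosen for illustration):
f_open  = np.full(50, 0.99)   # forget gate kept near 1
f_small = np.full(50, 0.25)   # compare: a sigmoid-derivative-sized factor

print(np.prod(f_open))    # about 0.6 -- the gradient survives 50 steps
print(np.prod(f_small))   # astronomically small -- the gradient vanishes
```

This is the sense in which the LSTM "reduces the problem": the cell-state path gives the gradient a route whose per-step factor the network can learn to hold near 1.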
References
- http://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
- Deep Learning with TensorFlow