What is an RNN
The networks are called recurrent because they perform the same computation for every element of an input sequence, and the output for each element depends not only on the current input but also on all previous computations.
Why RNN
- Sequential information in the inputs matters for tasks such as:
- Video Analysis
- Speech Recognition
- Machine Translation

RNNs have proved to achieve excellent performance on such problems.
RNN Procedure
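The procedure can be sketched as a loop that reuses the same weights at every step. A minimal NumPy sketch of the recurrence h_t = σ(U x_t + V h_{t−1}); the dimensions (3-dim inputs, 4-dim hidden state) and random weights are toy values chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions (hypothetical): 3-dim inputs, 4-dim hidden state.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))   # input-to-hidden weights
V = rng.normal(size=(4, 4))   # hidden-to-hidden (recurrent) weights

def rnn_forward(xs):
    """Apply the same computation h_t = sigmoid(U x_t + V h_{t-1}) at every step."""
    h = np.zeros(4)           # h_0: initial hidden state
    states = []
    for x in xs:              # the SAME U and V are reused for all elements
        h = sigmoid(U @ x + V @ h)
        states.append(h)
    return states

xs = [rng.normal(size=3) for _ in range(5)]   # a length-5 input sequence
states = rnn_forward(xs)
print(len(states), states[-1].shape)
```

Note how the hidden state `h` carries information forward: each output depends on the current input and, through `V @ h`, on all previous inputs.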
Sigmoid Gradient
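The key fact here is that the sigmoid's derivative, σ'(x) = σ(x)(1 − σ(x)), is at most 0.25 (attained at x = 0). A short numeric check:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.linspace(-10, 10, 2001)   # grid includes x = 0
grads = sigmoid_grad(xs)
print(grads.max())   # 0.25, attained at x = 0
```

Because every factor in a backpropagated product is bounded by 0.25, long chains of these derivatives shrink rapidly, which is the root of the problem described next.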
The Vanishing Gradient Problem
Consider the recurrent network:
h_t = σ(U x_t + V h_{t−1})
then,
h_3 = σ(U x_3 + V σ(U x_2 + V σ(U x_1)))
∂E_3/∂U = (∂E_3/∂out_3) (∂out_3/∂h_3) (∂h_3/∂h_2) (∂h_2/∂h_1) (∂h_1/∂U)
LSTM Cell
Input Gate
g = tanh(b_g + x_t U_g + h_{t−1} V_g)
i = σ(b_i + x_t U_i + h_{t−1} V_i)
out_i = g ∘ i
Forget Gate
f = σ(b_f + x_t U_f + h_{t−1} V_f)
s_t = s_{t−1} ∘ f + g ∘ i
Output Gate
o = σ(b_o + x_t U_o + h_{t−1} V_o)
h_t = tanh(s_t) ∘ o
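The gate equations above can be sketched as a single NumPy step function. The sizes (3-dim input, 4-dim hidden/cell state) and random weights are toy assumptions for illustration; the equations use the row-vector convention x_t U_g, hence `x @ U`:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy sizes (hypothetical): 3-dim input x_t, 4-dim hidden/cell state.
n_in, n_h = 3, 4
rng = np.random.default_rng(2)
U = {k: rng.normal(size=(n_in, n_h)) for k in "gifo"}   # input weights
V = {k: rng.normal(size=(n_h, n_h)) for k in "gifo"}    # recurrent weights
b = {k: np.zeros(n_h) for k in "gifo"}                  # biases

def lstm_step(x, h_prev, s_prev):
    g = np.tanh(b["g"] + x @ U["g"] + h_prev @ V["g"])  # candidate input
    i = sigmoid(b["i"] + x @ U["i"] + h_prev @ V["i"])  # input gate
    f = sigmoid(b["f"] + x @ U["f"] + h_prev @ V["f"])  # forget gate
    o = sigmoid(b["o"] + x @ U["o"] + h_prev @ V["o"])  # output gate
    s = s_prev * f + g * i          # new cell state  s_t = s_{t-1}∘f + g∘i
    h = np.tanh(s) * o              # new hidden state h_t = tanh(s_t)∘o
    return h, s

h, s = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(5, n_in)):                    # a length-5 sequence
    h, s = lstm_step(x, h, s)
print(h.shape, s.shape)
```

The element-wise products (`∘` in the equations, `*` in NumPy) are what let the gates selectively pass, block, or forget information per dimension.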
Reducing The Problem
∂s_t/∂s_{t−1} = f

The gradient of the cell state with respect to its previous value is simply the forget-gate activation f, not a product of weight matrices and squashing-function derivatives. As long as the network keeps f close to 1, gradients can flow through the cell state across many time steps without vanishing.
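With ∂s_t/∂s_{t−1} = f, the gradient of s_T with respect to s_0 along the cell state is just the product of the forget-gate activations. A toy scalar comparison (the gate values 0.99 and 0.25 are illustrative assumptions) against the sigmoid-derivative bound from the RNN case:

```python
import numpy as np

# Gradient through the LSTM cell state over 50 steps is the product of
# the forget-gate activations (toy scalar values, chosen for illustration):
f_open  = np.full(50, 0.99)   # forget gate kept near 1
f_small = np.full(50, 0.25)   # compare: a sigmoid-derivative-sized factor

print(np.prod(f_open))    # about 0.6 -- the gradient survives 50 steps
print(np.prod(f_small))   # astronomically small -- the gradient vanishes
```

This is the sense in which the LSTM "reduces the problem": the cell-state path gives the gradient a route whose per-step factor the network can learn to hold near 1.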
References
- http://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
- Deep Learning with TensorFlow