Karpathy, Andrej, Justin Johnson, and Li Fei-Fei. "Visualizing and Understanding Recurrent Networks." arXiv preprint arXiv:1506.02078 (2015). (Citations: 79.)
1 RNN
The RNN has the form
$$h_t^l = \tanh\, W^l \begin{pmatrix} h_t^{l-1} \\ h_{t-1}^l \end{pmatrix}$$
where $W^l$ varies between layers but is shared through time, and $h_t^{l-1}$ is the input from the layer below ($h_t^0 = x_t$ at the first layer).
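A minimal NumPy sketch of this recurrence (the paper gives no code; the function name, shapes, and toy data here are illustrative assumptions):

```python
import numpy as np

def rnn_step(W, h_below, h_prev):
    """One vanilla RNN step: h_t^l = tanh(W^l [h_t^{l-1}; h_{t-1}^l]).

    W       : (n, 2n) weight matrix for this layer, shared across time
    h_below : (n,) input from the layer below, h_t^{l-1} (equals x_t at layer 1)
    h_prev  : (n,) this layer's hidden state from the previous time step
    """
    return np.tanh(W @ np.concatenate([h_below, h_prev]))

# Usage: unroll over a toy 10-step input sequence at the first layer.
n = 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n, 2 * n))
h = np.zeros(n)
for x_t in rng.normal(size=(10, n)):
    h = rnn_step(W, x_t, h)
```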
It was observed that the back-propagation dynamics caused the gradients in an RNN to either vanish or explode.
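A one-line sketch of why this happens (standard backprop analysis, written here with the hypothetical single-layer notation $h_i = \tanh(W_h h_{i-1} + W_x x_i)$): the gradient between distant time steps is a product of per-step Jacobians, so its norm scales geometrically with the time lag,
$$\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \operatorname{diag}\bigl(1 - h_i^{\,2}\bigr)\, W_h, \qquad \left\lVert \frac{\partial h_t}{\partial h_k} \right\rVert \le \bigl(\sigma_{\max}(W_h)\bigr)^{t-k},$$
which vanishes when $\sigma_{\max}(W_h) < 1$ and can explode when it is greater than 1.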
2 LSTM
The exploding-gradient concern can be alleviated with a heuristic of clipping the gradients, and LSTMs were designed to mitigate the vanishing-gradient problem. In addition to the hidden state vector $h_t^l$, LSTMs also maintain a memory cell vector $c_t^l$. At each time step the LSTM can choose to read from, write to, or reset the cell using explicit gating mechanisms.
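In the paper's notation, the gated update stacks the three sigmoid gates and the candidate update $g$ into one matrix multiply:
$$\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix} W^l \begin{pmatrix} h_t^{l-1} \\ h_{t-1}^l \end{pmatrix}, \qquad c_t^l = f \odot c_{t-1}^l + i \odot g, \qquad h_t^l = o \odot \tanh\bigl(c_t^l\bigr).$$

A matching NumPy sketch of one LSTM step, plus the norm-clipping heuristic mentioned above (function names and the clipping threshold are illustrative assumptions, not from the paper):

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(W, h_below, h_prev, c_prev):
    """One LSTM step with the stacked-gate parameterization above.

    W       : (4n, 2n) weights producing the stacked pre-activations [i; f; o; g]
    h_below : (n,) input from the layer below, h_t^{l-1}
    h_prev  : (n,) previous hidden state h_{t-1}^l
    c_prev  : (n,) previous memory cell c_{t-1}^l
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_below, h_prev])            # (4n,) pre-activations
    i, f, o = sigm(z[:n]), sigm(z[n:2*n]), sigm(z[2*n:3*n])
    g = np.tanh(z[3*n:])
    c = f * c_prev + i * g      # forget (f) old cell content, write (i * g) new
    h = o * np.tanh(c)          # read a gated view of the cell
    return h, c

def clip_gradient(grad, threshold=5.0):
    """Norm-based clipping heuristic for the exploding-gradient case."""
    norm = np.linalg.norm(grad)
    return grad if norm <= threshold else grad * (threshold / norm)
```

The design point is that the additive update $c_t^l = f \odot c_{t-1}^l + i \odot g$ lets gradients flow backwards through the cell without being squashed by a nonlinearity at every step, which is what mitigates vanishing.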