LSTM
- With a gated RNN, the network learns which information should be remembered and which should be forgotten over a long duration (through the forget gate).
- Distinguish between the cell state and the hidden state: the former maintains long-term dependencies, while the latter is just the input to the forget, input, and gate gates, and the output of the output gate.
- The introduction of the cell state is the primary reason the vanishing or exploding gradient is mitigated in LSTM. Please see the tutorial here, and the minimal step sketch below.
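A minimal NumPy sketch of one LSTM step, with my own variable names (`W` stacks the weights of all four gates), making the split above explicit: the hidden state feeds the gates, while the cell state is updated additively, which is what lets gradients survive long spans:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. x: (D,), h_prev/c_prev: (H,), W: (4H, D+H), b: (4H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b   # input + hidden state feed all gates
    f = sigmoid(z[:H])          # forget gate: what to erase from the cell state
    i = sigmoid(z[H:2*H])       # input gate: how much new content to write
    g = np.tanh(z[2*H:3*H])     # gate gate: candidate cell content
    o = sigmoid(z[3*H:])        # output gate: how much of the cell to expose
    c = f * c_prev + i * g      # cell state: additive path, long-term memory
    h = o * np.tanh(c)          # hidden state: gated, short-term view of the cell
    return h, c
```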
Others
- Example of image captioning:
    - a combination of a CNN and an RNN
    - the CNN takes an image as input and outputs a feature vector
    - this feature vector is then fed into the RNN as something like a hidden state (but actually not!), via a conversion matrix $W_{ih}$; a rough sketch follows below
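A rough sketch of that hand-off, with hypothetical sizes; only the conversion matrix $W_{ih}$ corresponds to the note above, everything else is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
D_img, H = 4096, 512                            # hypothetical CNN feature / RNN hidden sizes

W_ih = rng.normal(scale=0.01, size=(H, D_img))  # conversion matrix W_ih
v = rng.normal(size=(D_img,))                   # stand-in for the CNN's feature vector

h0 = np.tanh(W_ih @ v)   # projected into the RNN's state space; conditions the RNN
                         # at the first step without literally being a hidden state
```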
- Gradient clipping:
    - solves two problems: sharp cliffs in parameter space and exploding gradients
    - The basic idea is to recall that the gradient specifies not the optimal step size, but only the optimal direction within an infinitesimal region.
    - The objective function for highly nonlinear deep neural networks or for recurrent neural networks often contains sharp nonlinearities in parameter space, resulting from the multiplication of several parameters.
    - Thus, limit the gradient size by a predefined threshold, as sketched below.
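A minimal sketch of one common variant, clipping by global norm (the threshold value here is arbitrary): it preserves the gradient's direction and only caps its size:

```python
import numpy as np

def clip_by_global_norm(grads, threshold):
    """Rescale a list of gradient arrays so their global L2 norm is at most
    `threshold`; the direction is preserved, only the step size is capped."""
    total_norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    if total_norm > threshold:
        grads = [g * (threshold / total_norm) for g in grads]
    return grads

grads = [np.array([3.0, 4.0]), np.array([12.0])]     # global norm = 13
clipped = clip_by_global_norm(grads, threshold=5.0)  # rescaled by 5/13
```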
- Exploding and vanishing gradients:
    - It is sufficient that the largest singular value $\lambda_1 < \frac{1}{\gamma}$ for the vanishing gradient to occur.
    - The necessary condition for the exploding gradient is that the largest singular value $\lambda_1 > \frac{1}{\gamma}$ (a numerical check is sketched below).
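A small numerical check of these conditions, assuming a vanilla RNN $h_t = \tanh(W h_{t-1} + U x_t)$; here $\gamma$ bounds the activation's derivative, so $\gamma = 1$ for tanh:

```python
import numpy as np

gamma = 1.0                                # |tanh'(x)| <= 1, so gamma = 1 for tanh
rng = np.random.default_rng(0)
W = rng.normal(scale=0.05, size=(64, 64))  # toy recurrent weight matrix

lambda_1 = np.linalg.svd(W, compute_uv=False)[0]  # largest singular value

if lambda_1 < 1.0 / gamma:
    print(f"lambda_1 = {lambda_1:.3f} < 1/gamma: gradients must vanish over long spans")
else:
    print(f"lambda_1 = {lambda_1:.3f} >= 1/gamma: explosion possible (necessary, not sufficient)")
```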
Generative Model:
- Training example of a GAN:
    - we sample a mini-batch of $m$ noise examples $\{z^{(1)}, \cdots, z^{(m)}\}$