Recurrent Neural Networks: Process Sequences
- one to one: Vanilla Neural Networks
- one to many: e.g. Image Captioning (image -> sequence of words)
- many to one: e.g. Sentiment Classification (sequence of words -> sentiment)
- many to many: e.g. Machine Translation (sequence of words -> sequence of words)
(Vanilla) Recurrent Neural Network
h_t = f(h_{t-1}, x_t)
h_t = tanh(W_h h_{t-1} + W_x x_t)
y = W_y h_t
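The recurrence above can be sketched in NumPy. The dimensions, random weights, and the `rnn_step` name are illustrative assumptions, not from the notes:

```python
import numpy as np

# Hypothetical sizes for illustration: D-dim inputs, H-dim hidden state.
D, H = 3, 4
rng = np.random.default_rng(0)
W_h = rng.standard_normal((H, H)) * 0.1   # hidden-to-hidden weights
W_x = rng.standard_normal((H, D)) * 0.1   # input-to-hidden weights
W_y = rng.standard_normal((H, H))         # hidden-to-output weights

def rnn_step(h_prev, x_t):
    """One recurrence step: h_t = tanh(W_h h_{t-1} + W_x x_t)."""
    return np.tanh(W_h @ h_prev + W_x @ x_t)

# Run the same cell (same weights) over a length-5 sequence.
h = np.zeros(H)
xs = rng.standard_normal((5, D))
for x_t in xs:
    h = rnn_step(h, x_t)

y = W_y @ h   # output read out from the final hidden state: y = W_y h_t
```

Note the key point for later: every step reuses the same matrices `W_h` and `W_x`.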
Truncated backpropagation through time
- Run forward and backward passes through chunks of the sequence instead of the whole sequence; hidden states carry forward across chunk boundaries, but gradients do not flow past them
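A minimal sketch of the chunking pattern, using a toy scalar recurrence (all constants here are illustrative):

```python
import numpy as np

# Split a length-T sequence into fixed-size chunks for truncated BPTT.
T, chunk_len = 10, 4
xs = np.arange(T, dtype=float)    # stand-in for a length-T input sequence

h = 0.0                           # hidden state, carried across chunk boundaries
steps = 0
for start in range(0, T, chunk_len):
    chunk = xs[start:start + chunk_len]
    # Forward through one chunk; h flows in from the previous chunk...
    for x_t in chunk:
        h = float(np.tanh(0.5 * h + 0.1 * x_t))
        steps += 1
    # ...but the backward pass would run only within this chunk.
    # In an autograd framework (e.g. PyTorch) this boundary is where
    # you would detach the hidden state from the graph: h = h.detach()
```

So the full sequence is still processed forward in order; only the backward pass is cut into chunk-sized pieces.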
Image Captioning with Attention
- The CNN produces L feature vectors of dimension D, one for each of L spatial locations in the image
- At each step, the RNN also produces a distribution over the L locations, i.e. an attention weight for each of the L positions in the image
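One step of this soft attention can be sketched as below. The sizes, the random features, and the scoring scheme (`W_att`, a simple bilinear score) are assumptions for illustration, not the exact model from the lecture:

```python
import numpy as np

# L spatial locations, D-dim CNN features, H-dim RNN hidden state.
L, D, H = 6, 8, 5
rng = np.random.default_rng(1)
feats = rng.standard_normal((L, D))   # CNN features: one D-dim vector per location
h = rng.standard_normal(H)            # current RNN hidden state
W_att = rng.standard_normal((D, H))   # hypothetical attention scoring weights

scores = feats @ (W_att @ h)          # one scalar score per location, shape (L,)
scores -= scores.max()                # shift for numerical stability
a = np.exp(scores) / np.exp(scores).sum()   # softmax: attention weights over L locations

context = a @ feats                   # attention-weighted sum of features, shape (D,)
```

The context vector is then fed into the RNN along with the previous word when generating the next caption word.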
Long Short Term Memory (LSTM)
Differences that help avoid vanishing gradients:
- Backpropagation from c_t to c_{t-1} involves only elementwise multiplication by the forget gate f, with no matrix multiply by W.
- f is different at every timestep, whereas a Vanilla RNN multiplies by the same matrix W at every step.
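A single LSTM step makes the elementwise gradient path visible: c_t depends on c_{t-1} only through `f * c_prev`. The sizes, random weights, and the stacked-gate layout are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: D-dim input, H-dim hidden/cell state.
D, H = 3, 4
rng = np.random.default_rng(2)
W = rng.standard_normal((4 * H, H + D)) * 0.1   # all four gates stacked in one matrix
x_t = rng.standard_normal(D)
h_prev, c_prev = np.zeros(H), np.zeros(H)

z = W @ np.concatenate([h_prev, x_t])           # pre-activations for i, f, o, g
i = sigmoid(z[:H])            # input gate
f = sigmoid(z[H:2 * H])       # forget gate
o = sigmoid(z[2 * H:3 * H])   # output gate
g = np.tanh(z[3 * H:])        # candidate cell update

c_t = f * c_prev + i * g      # elementwise: gradient to c_prev passes through f only
h_t = o * np.tanh(c_t)
```

Because the c_t -> c_{t-1} path is a per-element product with a sigmoid gate that changes every step, gradients along the cell state avoid the repeated multiplication by one fixed matrix W that causes vanishing in a vanilla RNN.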