A Walkthrough of the RNN, LSTM, GRU, SRU, Multi-Dimensional LSTM, Grid LSTM, and Graph LSTM Family

RNN/Stacked RNN

RNNs are commonly divided into five types by the number of inputs and outputs:

1. One-to-one: the plain RNN
2. One-to-many: image captioning (image -> sequence of words)
3. Many-to-one: sentiment classification (sequence of words -> sentiment)
4. Many-to-many, unaligned in time: machine translation (seq of words -> seq of words)
5. Many-to-many, aligned in time: frame-level video classification
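For instance, the many-to-one case (3) can be sketched as an RNN over word vectors followed by a classifier on the final hidden state; the layer sizes below are made up for illustration:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=100, hidden_size=200)  # word vectors -> hidden states
classifier = nn.Linear(200, 2)                 # final hidden state -> 2 sentiment logits

words = torch.randn(6, 5, 100)  # seq_len 6, batch 5, feature 100
_, h_n = rnn(words)
logits = classifier(h_n[-1])    # classify from the last hidden state only
print(logits.shape)             # torch.Size([5, 2])
```

The one-to-many and aligned many-to-many cases differ only in which hidden states are read out and when inputs are fed in.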

# PyTorch example
import torch
from torch import nn

# rnn_single.weight_ih: torch.Size([200, 100])
# rnn_single.weight_hh: torch.Size([200, 200])
rnn_single = nn.RNNCell(input_size=100, hidden_size=200)
# Build a sequence: length 6, batch 5, feature size 100
x = torch.randn(6, 5, 100)  # the (seq_len, batch, feature) layout an RNN expects
# Initial hidden state
h_t = torch.zeros(5, 200)
# Step the cell through the sequence
out = []
for i in range(6):  # loop 6 times to cover the whole sequence
    h_t = rnn_single(x[i], h_t)
    out.append(h_t)
# torch.stack(out).shape: torch.Size([6, 5, 200])
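The step inside the loop can be reproduced by hand from the cell's two weight matrices, which makes the 200x100 and 200x200 shapes above concrete (a sketch; the cell and inputs are re-created so the snippet stands alone):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.RNNCell(input_size=100, hidden_size=200)
x_t = torch.randn(5, 100)    # one time step, batch 5
h_prev = torch.zeros(5, 200)

# nn.RNNCell computes h_t = tanh(x_t @ W_ih^T + b_ih + h_prev @ W_hh^T + b_hh)
h_manual = torch.tanh(x_t @ cell.weight_ih.t() + cell.bias_ih
                      + h_prev @ cell.weight_hh.t() + cell.bias_hh)
h_builtin = cell(x_t, h_prev)
print(torch.allclose(h_manual, h_builtin, atol=1e-6))  # True
```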
# ------------------------------
rnn_seq = nn.RNN(100, 200)
out, h_t = rnn_seq(x)  # uses the default all-zero hidden state
# To supply your own initial hidden state, it must have shape [1, 5, 200]:
# (num_layers * num_directions, batch, hidden_size)
h_0 = torch.randn(1, 5, 200)
out, h_t = rnn_seq(x, h_0)
# out.shape: torch.Size([6, 5, 200])


# input_size, hidden_size
rnn_seq = nn.RNN(50, 100, num_layers=2)
# weight_ih: (hidden_size, layer_input_size)
print(rnn_seq.weight_ih_l0.shape)  # torch.Size([100, 50]): layer 0's input size is 50
print(rnn_seq.weight_ih_l1.shape)  # torch.Size([100, 100]): layer 1's input is the previous layer's output h, size 100
rnn_input = torch.randn(10, 3, 50)
out, h = rnn_seq(rnn_input)
# h: (num_layers * num_directions, batch, hidden_size)
# h: 2*1, 3, 100
print(h.shape)    # torch.Size([2, 3, 100]); weights are shared across time but not across depth
# out: (seq_len, batch, num_directions * hidden_size)
print(out.shape)  # torch.Size([10, 3, 100])
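For a unidirectional stacked RNN, `out` holds the top layer's hidden state at every time step, while `h` holds every layer's hidden state at the last time step, so the two overlap in exactly one slice. A quick check of that relationship, under the same sizes as above:

```python
import torch
import torch.nn as nn

rnn_seq = nn.RNN(50, 100, num_layers=2)
out, h = rnn_seq(torch.randn(10, 3, 50))

# out[-1] is the top layer at the last step; h[-1] is the top layer's final state
print(torch.allclose(out[-1], h[-1]))  # True
```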



Bidirectional RNN

# input_size, hidden_size
rnn_seq = nn.RNN(50, 100, num_layers=2, bidirectional=True)
# weight_ih: (hidden_size, layer_input_size)
# layer_input_size = hidden_size * num_directions for every layer above the first
print(rnn_seq.weight_ih_l0.shape)  # torch.Size([100, 50]): layer 0's input size is 50
print(rnn_seq.weight_ih_l1.shape)  # torch.Size([100, 200]): layer 1's input is the previous layer's output h from both directions, size 200
rnn_input = torch.randn(10, 3, 50)
out, h = rnn_seq(rnn_input)
# h: (num_layers * num_directions, batch, hidden_size)
# h: 2*2, 3, 100
print(h.shape)    # torch.Size([4, 3, 100]); weights are shared across time but not across depth
print(out.shape)  # torch.Size([10, 3, 200]): num_directions * hidden_size in the last dim
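In the bidirectional case, `out` concatenates the two directions along the last dimension, and the rows of `h` are ordered layer by layer with forward before backward. A sketch verifying which slices of `out` match which rows of `h`:

```python
import torch
import torch.nn as nn

rnn_seq = nn.RNN(50, 100, num_layers=2, bidirectional=True)
out, h = rnn_seq(torch.randn(10, 3, 50))

# h rows are ordered [l0_fwd, l0_bwd, l1_fwd, l1_bwd]
print(out.shape)  # torch.Size([10, 3, 200])
print(torch.allclose(out[-1, :, :100], h[2]))  # top-layer forward ends at the last step: True
print(torch.allclose(out[0, :, 100:], h[3]))   # top-layer backward ends at the first step: True
```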


LSTM/Stacked LSTM

lstm_seq = nn.LSTM(50, 100, num_layers=2)  # input size 50, hidden size 100, two layers
# weight_ih: (gate_size, layer_input_size), where gate_size = 4 * hidden_size
print(lstm_seq.weight_ih_l0.shape)  # torch.Size([400, 50]); note: ih
print(lstm_seq.weight_ih_l1.shape)  # torch.Size([400, 100]): the input size grows from 50 to 100 because the previous layer's output h is this layer's input
print(lstm_seq.weight_hh_l0.shape)  # torch.Size([400, 100]): layer 0's h_t weights for all four gates; note: hh
lstm_input = torch.randn(10, 3, 50)
out, (h, c) = lstm_seq(lstm_input)
print(h.shape)    # torch.Size([2, 3, 100]); weights shared across time, not across depth
print(c.shape)    # torch.Size([2, 3, 100])
print(out.shape)  # torch.Size([10, 3, 100])
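The 400 rows of `weight_ih_l0` are the four gate matrices stacked in PyTorch's (input, forget, cell, output) order; a sketch of pulling one gate's weights back out:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(50, 100, num_layers=2)
# gate_size = 4 * hidden_size = 400; split back into the four per-gate matrices
W_i, W_f, W_g, W_o = lstm.weight_ih_l0.chunk(4, dim=0)
print(W_i.shape)  # torch.Size([100, 50]) -- one gate's input-to-hidden weights
```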


Backpropagation from c_t to c_{t-1} is only an elementwise multiplication by f, with no matrix multiply by W. —cs231n_2017_lecture10.pdf, page 99
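That claim can be checked directly with autograd on the cell update c_t = f * c_{t-1} + i * g: the gradient that reaches c_{t-1} is just the forget gate, with no weight matrix in the path (a toy sketch, not from the original lecture):

```python
import torch

# One LSTM cell update, written out by hand
c_prev = torch.randn(4, requires_grad=True)
f = torch.sigmoid(torch.randn(4))  # forget gate
i = torch.sigmoid(torch.randn(4))  # input gate
g = torch.tanh(torch.randn(4))     # candidate cell value
c_t = f * c_prev + i * g

c_t.sum().backward()
# d c_t / d c_{t-1} is diag(f): the gradient is scaled elementwise by f only
print(torch.allclose(c_prev.grad, f))  # True
```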

GRU

• Adds a reset gate, the r switch in the figure
• Merges the input and forget gates into a single "update gate", the z switch in the figure
• Merges the cell state C and the hidden state m into a single h
• Drops the output gate

Compared with an LSTM, a GRU has one fewer gate and therefore fewer parameters, yet it usually matches the LSTM in capability. Given hardware and time budgets, the more "practical" GRU is often the choice.
gru_seq = nn.GRU(4, 5, num_layers=1, bidirectional=False)  # input size 4, hidden size 5
print(gru_seq.weight_ih_l0.shape)  # torch.Size([15, 4])
print(gru_seq.weight_hh_l0.shape)  # torch.Size([15, 5]): weights for the two gates and the candidate in Eq. (3)
</gru_replace>
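Those 15 rows are the reset, update, and candidate blocks stacked. Splitting them out and replaying the update equations by hand reproduces the built-in result (a sketch using `nn.GRUCell` so a single step is easy to check):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.GRUCell(4, 5)
x = torch.randn(3, 4)  # batch 3, input size 4
h = torch.randn(3, 5)  # batch 3, hidden size 5

# 15 = 3 * hidden_size: rows are stacked as (reset r, update z, candidate n)
W_ir, W_iz, W_in = cell.weight_ih.chunk(3, 0)
W_hr, W_hz, W_hn = cell.weight_hh.chunk(3, 0)
b_ir, b_iz, b_in = cell.bias_ih.chunk(3, 0)
b_hr, b_hz, b_hn = cell.bias_hh.chunk(3, 0)

r = torch.sigmoid(x @ W_ir.t() + b_ir + h @ W_hr.t() + b_hr)
z = torch.sigmoid(x @ W_iz.t() + b_iz + h @ W_hz.t() + b_hz)
n = torch.tanh(x @ W_in.t() + b_in + r * (h @ W_hn.t() + b_hn))  # r gates the hidden term
h_manual = (1 - z) * n + z * h

print(torch.allclose(h_manual, cell(x, h), atol=1e-6))  # True
```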


Multi-Dimensional LSTM

As the number of paths in a grid grows combinatorially with the size of each dimension and the total number of dimensions N, the values in m can grow at the same rate due to the unconstrained summation in Eq. 4 (the formula above). This can cause instability for large grids, and adding cells along the depth dimension increases N and exacerbates the problem. —Grid LSTM paper

The 3D case is analogous: three pairs of h′ and m′.
