Notes from Lecture 12 of the course 《PyTorch深度学习实践》 (PyTorch Deep Learning Practice)
RNN Structure
RNN Cell
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
hidden = cell(input, hidden)
The input argument of cell() has shape (batch, input_size).
The hidden argument of cell() has shape (batch, hidden_size); the hidden returned on the left of the assignment also has shape (batch, hidden_size).
• Suppose we have a sequence with the following properties:
• batchSize = 1
• seqLen = 3
• inputSize = 4
• hiddenSize = 2
• So the shapes of the inputs and outputs of RNNCell are:
• input.shape = (batchSize, inputSize)
• output.shape = (batchSize, hiddenSize)
• The sequence can be wrapped in one tensor with shape:
• dataset.shape = (seqLen, batchSize, inputSize)
Code:
import torch
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
# (seq, batch, features)
dataset = torch.randn(seq_len, batch_size, input_size)
hidden = torch.zeros(batch_size, hidden_size)
for idx, input in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    print('Input size: ', input.shape)
    hidden = cell(input, hidden)
    print('outputs size: ', hidden.shape)
    print(hidden)
Output:
==================== 0 ====================
Input size: torch.Size([1, 4])
outputs size: torch.Size([1, 2])
tensor([[-0.1579, 0.5140]])
==================== 1 ====================
Input size: torch.Size([1, 4])
outputs size: torch.Size([1, 2])
tensor([[-0.9577, 0.6502]])
==================== 2 ====================
Input size: torch.Size([1, 4])
outputs size: torch.Size([1, 2])
tensor([[-0.7661, -0.9128]])
Using RNN (instead of RNNCell)
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                    num_layers=num_layers)
out, hidden = cell(inputs, hidden)
inputs is x1 through xN, i.e., the whole input sequence; the hidden passed into cell refers to h0.
out is h1 through hN, and the hidden on the left of the assignment is hN (see the figure in the lecture slides).
• Suppose we have a sequence with the following properties:
• batchSize
• seqLen
• inputSize, hiddenSize
• numLayers
• The shapes of input and h_0 of RNN:
• input.shape = (seqLen, batchSize, inputSize)
• h_0.shape = (numLayers, batchSize, hiddenSize)
• The shapes of output and h_n of RNN:
• output.shape = (seqLen, batchSize, hiddenSize)
• h_n.shape = (numLayers, batchSize, hiddenSize)
Code:
import torch
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2
num_layers = 1
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                    num_layers=num_layers)
# (seqLen, batchSize, inputSize)
inputs = torch.randn(seq_len, batch_size, input_size)
hidden = torch.zeros(num_layers, batch_size, hidden_size)
out, hidden = cell(inputs, hidden)
print('Output size:', out.shape)
print('Output:', out)
print('Hidden size: ', hidden.shape)
print('Hidden: ', hidden)
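One detail worth verifying: for a single-layer RNN, the last time step of out is exactly what the returned hidden (h_n) holds, since both are the final hidden state of the (only) layer. A minimal sketch checking this (my addition, reusing the sizes above):

import torch

rnn = torch.nn.RNN(input_size=4, hidden_size=2, num_layers=1)
inputs = torch.randn(3, 1, 4)          # (seqLen, batchSize, inputSize)
h0 = torch.zeros(1, 1, 2)              # (numLayers, batchSize, hiddenSize)
out, hn = rnn(inputs, h0)
print(torch.allclose(out[-1], hn[0]))  # True: out[-1] is the last step of the top layer

Also note that torch.nn.RNN accepts batch_first=True, in which case input and output use (batchSize, seqLen, ...) instead, while the hidden-state shape stays (numLayers, batchSize, hiddenSize); the embedding model at the end of these notes uses this option.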
Example: learning a sequence-to-sequence mapping
e.g. "hello" --> "ohlol"
Code:
import torch
input_size = 4
hidden_size = 4
batch_size = 1
idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)
labels = torch.LongTensor(y_data).view(-1, 1)
The second-to-last line reshapes the inputs to (seqLen, batchSize, inputSize).
The last line reshapes the labels to (seqLen, 1).
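A quick sanity check of those shapes (my addition; the expected sizes follow from seqLen = 5, batchSize = 1, inputSize = 4):

print(inputs.shape)  # torch.Size([5, 1, 4]) -> (seqLen, batchSize, inputSize)
print(labels.shape)  # torch.Size([5, 1])    -> (seqLen, 1)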
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size,
                                        hidden_size=self.hidden_size)

    def forward(self, input, hidden):
        hidden = self.rnncell(input, hidden)
        return hidden

    def init_hidden(self):
        return torch.zeros(self.batch_size, self.hidden_size)
net = Model(input_size, hidden_size, batch_size)
init_hidden() - creates h0, the all-zero initial hidden state
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
for epoch in range(15):
    loss = 0
    optimizer.zero_grad()
    hidden = net.init_hidden()
    print('Predicted string: ', end='')
    for input, label in zip(inputs, labels):
        hidden = net(input, hidden)
        # accumulate the loss over every time step as a tensor,
        # so backward() propagates through the whole sequence
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()], end='')
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/15] loss=%.4f' % (epoch + 1, loss.item()))
Results:
Predicted string: loeeh, Epoch [1/15] loss=8.0117
Predicted string: oooeh, Epoch [2/15] loss=7.2082
Predicted string: ooooh, Epoch [3/15] loss=6.6208
Predicted string: ooooh, Epoch [4/15] loss=6.1802
Predicted string: ooooh, Epoch [5/15] loss=5.8060
Predicted string: ooooh, Epoch [6/15] loss=5.4739
Predicted string: ooooh, Epoch [7/15] loss=5.1593
Predicted string: ooool, Epoch [8/15] loss=4.8593
Predicted string: ooool, Epoch [9/15] loss=4.5819
Predicted string: ohool, Epoch [10/15] loss=4.3287
Predicted string: ohlol, Epoch [11/15] loss=4.0909
Predicted string: ohlol, Epoch [12/15] loss=3.8573
Predicted string: ohlol, Epoch [13/15] loss=3.6246
Predicted string: ohlol, Epoch [14/15] loss=3.4007
Predicted string: ohlol, Epoch [15/15] loss=3.2005
Using RNN
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnn = torch.nn.RNN(input_size=self.input_size,
                                hidden_size=self.hidden_size,
                                num_layers=num_layers)

    def forward(self, input):
        hidden = torch.zeros(self.num_layers,
                             self.batch_size,
                             self.hidden_size)
        out, _ = self.rnn(input, hidden)
        # flatten (seqLen, batchSize, hiddenSize) to (seqLen * batchSize, hiddenSize)
        # so the output feeds straight into CrossEntropyLoss
        return out.view(-1, self.hidden_size)
net = Model(input_size, hidden_size, batch_size, num_layers)
idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)
labels = torch.LongTensor(y_data)
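The notes stop at the data preparation here; what follows is a minimal training-loop sketch of my own for the RNN version, assuming the same CrossEntropyLoss/Adam setup as the RNNCell version, with seq_len = 5 and num_layers = 1 defined, and the learning rate as an assumption. Note labels now has shape (seqLen,) to match the flattened model output:

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)

for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)              # (seqLen * batchSize, hiddenSize)
    loss = criterion(outputs, labels)  # labels has shape (seqLen,)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    print('Predicted: ', ''.join(idx2char[i] for i in idx.tolist()), end='')
    print(', Epoch [%d/15] loss=%.3f' % (epoch + 1, loss.item()))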
• One-hot encoding of words and characters
• The one-hot vectors are high-dimension.
• The one-hot vectors are sparse.
• The one-hot vectors are hardcoded.
• Do we have a way to associate a vector with a word/character that meets the following specification:
• Lower-dimension
• Dense
• Learned from data
• A popular and powerful way is called EMBEDDING.
(Figure in the lecture slides: comparison of one-hot encoding vs. embedding.)
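To make this concrete, a minimal sketch of torch.nn.Embedding (the vocabulary and embedding sizes here are illustrative assumptions):

import torch

# an embedding table: 4 characters, each mapped to a learned dense 10-dim vector
emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=10)

# indices in, dense vectors out: (seqLen,) -> (seqLen, embedding_dim)
indices = torch.LongTensor([1, 0, 2, 2, 3])  # 'hello' as character indices
vectors = emb(indices)
print(vectors.shape)                         # torch.Size([5, 10])

Unlike the hardcoded one-hot table, the rows of this table are parameters updated during training.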
The model can therefore be improved as follows:
Code:
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)             # (batch, seqLen, embeddingSize)
        x, _ = self.rnn(x, hidden)  # (batch, seqLen, hiddenSize)
        x = self.fc(x)              # (batch, seqLen, numClass)
        return x.view(-1, num_class)
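Because this model uses batch_first=True and an embedding layer, the inputs are plain index tensors rather than one-hot vectors. A minimal sketch of the hyperparameters and data this class assumes (the specific sizes are my assumptions, following the hello --> ohlol example):

num_class = 4    # four characters: e, h, l, o
input_size = 4   # vocabulary size
hidden_size = 8
embedding_size = 10
num_layers = 2
batch_size = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [[1, 0, 2, 2, 3]]  # (batchSize, seqLen): 'hello'
y_data = [3, 1, 2, 3, 2]    # 'ohlol'

inputs = torch.LongTensor(x_data)  # embedding layers take LongTensor indices
labels = torch.LongTensor(y_data)

net = Model()
print(net(inputs).shape)  # torch.Size([5, 4]) -> (batchSize * seqLen, numClass)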