RNN - PyTorch

Notes from Lecture 12 of the course 《PyTorch深度学习实践》 (PyTorch Deep Learning Practice).

RNN Structure

RNN Cell

cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
 
hidden = cell(input, hidden)

input passed to cell() - input of shape (batch, input_size)
hidden passed to cell() - hidden of shape (batch, hidden_size)

hidden on the left of the equals sign (the value returned by the cell) - hidden of shape (batch, hidden_size)

• Suppose we have a sequence with the following properties:

• batchSize = 1
• seqLen = 3
• inputSize = 4
• hiddenSize = 2

• Then the shapes of the inputs and outputs of RNNCell are:

• input.shape = (batchSize, inputSize)
• output.shape = (batchSize, hiddenSize)

• The whole sequence can be wrapped in one tensor with shape:

• dataset.shape = (seqLen, batchSize, inputSize)

Code:

import torch
batch_size = 1 
seq_len = 3 
input_size = 4 
hidden_size = 2

cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
# (seq, batch, features)
dataset = torch.randn(seq_len, batch_size, input_size) 
hidden = torch.zeros(batch_size, hidden_size)

for idx, input in enumerate(dataset): 
    print('=' * 20, idx, '=' * 20) 
    print('Input size: ', input.shape)
    
    hidden = cell(input, hidden)
    
    print('outputs size: ', hidden.shape) 
    print(hidden)

Output:

========== 0 ==========
Input size: torch.Size([1, 4])
outputs size: torch.Size([1, 2])
tensor([[-0.1579, 0.5140]])
========== 1 ==========
Input size: torch.Size([1, 4])
outputs size: torch.Size([1, 2])
tensor([[-0.9577, 0.6502]])
========== 2 ==========
Input size: torch.Size([1, 4])
outputs size: torch.Size([1, 2])
tensor([[-0.7661, -0.9128]])

Using torch.nn.RNN (instead of RNNCell)

cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                    num_layers=num_layers)

 
out, hidden = cell(inputs, hidden)

inputs is the whole input sequence x1 through xN, and the hidden passed to cell() is h0.

out collects h1 through hN, while hidden (on the left of the equals sign) is hN.

• Suppose we have a sequence with the following properties:

• batchSize
• seqLen
• inputSize, hiddenSize
• numLayers

• The shapes of input and h_0 of RNN:

• input.shape = (seqLen, batchSize, inputSize)
• h_0.shape = (numLayers, batchSize, hiddenSize)

• The shapes of output and h_n of RNN:

• output.shape = (seqLen, batchSize, hiddenSize)
• h_n.shape = (numLayers, batchSize, hiddenSize)

Code:

import torch

batch_size = 1 
seq_len = 3 
input_size = 4 
hidden_size = 2 
num_layers = 1

cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,      
                    num_layers=num_layers)

# (seqLen, batchSize, inputSize)
inputs = torch.randn(seq_len, batch_size, input_size) 
hidden = torch.zeros(num_layers, batch_size, hidden_size)
out, hidden = cell(inputs, hidden)

print('Output size:', out.shape) 
print('Output:', out)
print('Hidden size: ', hidden.shape) 
print('Hidden: ', hidden)
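
For reference, torch.nn.RNN also accepts batch_first=True, which makes the input and output tensors batch-major while h_0/h_n keep their layout. A minimal sketch reusing the sizes defined above:

# batch_first=True: input and output become (batchSize, seqLen, ...),
# while h_0 / h_n keep the (numLayers, batchSize, hiddenSize) layout.
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                    num_layers=num_layers, batch_first=True)

inputs = torch.randn(batch_size, seq_len, input_size)
hidden = torch.zeros(num_layers, batch_size, hidden_size)
out, hidden = cell(inputs, hidden)

print(out.shape)     # torch.Size([1, 3, 2]) -> (batchSize, seqLen, hiddenSize)
print(hidden.shape)  # torch.Size([1, 1, 2]) -> (numLayers, batchSize, hiddenSize)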

Example: learning a sequence-to-sequence mapping

e.g. "hello" --> "ohlol"

 


Code:

import torch
input_size = 4 
hidden_size = 4 
batch_size = 1

 

idx2char = ['e', 'h', 'l', 'o'] 
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]

one_hot_lookup = [[1, 0, 0, 0], 
                  [0, 1, 0, 0], 
                  [0, 0, 1, 0], 
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]

inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size) 
labels = torch.LongTensor(y_data).view(-1, 1)

Second-to-last line - reshape the inputs to (seqLen, batchSize, inputSize)

Last line - reshape the labels to (seqLen, 1)
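
As a quick sanity check (a small addition, assuming the definitions above), the resulting shapes can be printed:

print(inputs.shape)  # torch.Size([5, 1, 4]) -> (seqLen, batchSize, inputSize)
print(labels.shape)  # torch.Size([5, 1])    -> (seqLen, 1)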

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size,
                                        hidden_size=self.hidden_size)
    def forward(self, input, hidden): 
        hidden = self.rnncell(input, hidden)
        return hidden
    
    def init_hidden(self):
        return torch.zeros(self.batch_size, self.hidden_size)

net = Model(input_size, hidden_size, batch_size)

init_hidden() - creates h0, the all-zero initial hidden state

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
for epoch in range(15): 
    loss = 0
    optimizer.zero_grad()
    hidden = net.init_hidden() 
    print('Predicted string: ', end='')
    for input, label in zip(inputs, labels):
        hidden = net(input, hidden)
        # accumulate the loss over every step of the sequence, then backpropagate once per epoch
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()], end='') 
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/15] loss=%.4f' % (epoch+1, loss.item()))

Result:

Predicted string: loeeh, Epoch [1/15] loss=8.0117
Predicted string: oooeh, Epoch [2/15] loss=7.2082
Predicted string: ooooh, Epoch [3/15] loss=6.6208
Predicted string: ooooh, Epoch [4/15] loss=6.1802
Predicted string: ooooh, Epoch [5/15] loss=5.8060
Predicted string: ooooh, Epoch [6/15] loss=5.4739
Predicted string: ooooh, Epoch [7/15] loss=5.1593
Predicted string: ooool, Epoch [8/15] loss=4.8593
Predicted string: ooool, Epoch [9/15] loss=4.5819
Predicted string: ohool, Epoch [10/15] loss=4.3287
Predicted string: ohlol, Epoch [11/15] loss=4.0909
Predicted string: ohlol, Epoch [12/15] loss=3.8573
Predicted string: ohlol, Epoch [13/15] loss=3.6246
Predicted string: ohlol, Epoch [14/15] loss=3.4007
Predicted string: ohlol, Epoch [15/15] loss=3.2005

Using torch.nn.RNN instead

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnn = torch.nn.RNN(input_size=self.input_size,
                                hidden_size=self.hidden_size,
                                num_layers=num_layers)
    def forward(self, input):
        hidden = torch.zeros(self.num_layers,
                             self.batch_size,
                             self.hidden_size) 
        out, _ = self.rnn(input, hidden)
        return out.view(-1, self.hidden_size)

net = Model(input_size, hidden_size, batch_size, num_layers)
idx2char = ['e', 'h', 'l', 'o'] 
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]
one_hot_lookup = [[1, 0, 0, 0], 
                  [0, 1, 0, 0], 
                  [0, 0, 1, 0], 
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]

seq_len = 5  # 'hello' has five characters (redefine seq_len, which was 3 in the earlier example)
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)
labels = torch.LongTensor(y_data)
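
The training loop for this nn.RNN version is not reproduced in this excerpt; below is a minimal sketch consistent with the model's (-1, hidden_size) output, assuming the same CrossEntropyLoss/Adam setup as before (the learning rate is illustrative):

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)  # illustrative learning rate

for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)               # (seqLen * batchSize, hiddenSize)
    loss = criterion(outputs, labels)   # labels: (seqLen * batchSize,)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    print('Predicted: ', ''.join([idx2char[c] for c in idx.data.numpy()]), end='')
    print(', Epoch [%d/15] loss = %.3f' % (epoch + 1, loss.item()))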

• One-hot encoding of words and characters:

• The one-hot vectors are high-dimensional.
• The one-hot vectors are sparse.
• The one-hot vectors are hardcoded.

• Do we have a way to associate a word/character with a vector that is:

• Lower-dimensional
• Dense
• Learned from data

• A popular and powerful way is called EMBEDDING (see the sketch below).
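
For intuition: torch.nn.Embedding is just a learnable lookup table from integer indices to dense vectors. A minimal sketch with illustrative dimensions:

import torch

# 4 characters in the vocabulary; each index maps to a learnable 10-dimensional vector
emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=10)

x = torch.LongTensor([[1, 0, 2, 2, 3]])  # (batchSize, seqLen) indices for 'hello'
print(emb(x).shape)                      # torch.Size([1, 5, 10]) -> (batchSize, seqLen, embeddingSize)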

Comparison of one-hot encoding and embedding (figure omitted)

The model can therefore be improved as follows: add an embedding layer in front of the RNN and a linear layer on top of it.

Code:

class Model(torch.nn.Module): 
    def __init__(self):
        super(Model, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size) 
        self.rnn = torch.nn.RNN(input_size=embedding_size,
                                hidden_size=hidden_size, 
                                num_layers=num_layers, 
                                batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)
    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size) 
        x = self.emb(x) # (batch, seqLen, embeddingSize)
        x, _ = self.rnn(x, hidden)
        x = self.fc(x)
        return x.view(-1, num_class)
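
This snippet relies on hyperparameters, data preparation, and a training loop that are not shown in this excerpt. A minimal sketch of the missing pieces with illustrative values (note that with an Embedding layer and batch_first=True, inputs are integer index tensors of shape (batchSize, seqLen) rather than one-hot vectors):

# Illustrative hyperparameters (assumed values, not taken from the excerpt above)
num_class = 4        # e, h, l, o
input_size = 4       # vocabulary size for the Embedding layer
hidden_size = 8
embedding_size = 10
num_layers = 2

idx2char = ['e', 'h', 'l', 'o']
x_data = [[1, 0, 2, 2, 3]]  # (batchSize, seqLen) -- 'hello'
y_data = [3, 1, 2, 3, 2]    # 'ohlol'

inputs = torch.LongTensor(x_data)  # Embedding expects indices, not one-hot vectors
labels = torch.LongTensor(y_data)

net = Model()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)

for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)              # (batchSize * seqLen, num_class)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    print('Predicted: ', ''.join([idx2char[c] for c in idx.data.numpy()]), end='')
    print(', Epoch [%d/15] loss = %.3f' % (epoch + 1, loss.item()))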

 
