This lecture is really dense, and since I had never looked closely at RNNs before, I came out of it pretty confused. After writing everything up, though, the difficulty feels manageable, so I'm recording a bit more of the theory here to help everyone learn together~
I. Theory & Class Code
1 How an RNN cell computes
An RNN cell is essentially just a linear layer, except that this linear layer is reused over and over as the input sequence is consumed.
The cell takes two inputs: the current element of the sequence, x_t, and the hidden state h_{t-1} (i.e. the cell's own output from the previous step). At the very first step we therefore have to supply an initial hidden state h_0 ourselves: set it from prior knowledge if we have any, otherwise set it to zeros.
The cell's output is the new hidden state h_t.
The computation (illustrated by a figure in the lecture) goes as follows. Suppose h_{t-1} has dimension hidden_size and x_t has dimension input_size. Inside the cell, two weight matrices of different shapes, W_ih and W_hh, map x_t and h_{t-1} respectively into hidden_size dimensions; the two results are added and passed through an activation:
    h_t = tanh(W_ih · x_t + b_ih + W_hh · h_{t-1} + b_hh)
Although this is written with two weight matrices, concatenating h_{t-1} and x_t and multiplying by the single combined matrix [W_hh  W_ih], i.e. one linear layer, carries out the whole computation.
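The per-step computation described above can be checked directly against the weights stored inside PyTorch's RNNCell. This is a minimal sketch of my own (not from the lecture), recomputing one step by hand:

```python
import torch

torch.manual_seed(0)
input_size, hidden_size = 4, 2
cell = torch.nn.RNNCell(input_size, hidden_size)  # default nonlinearity is tanh

x = torch.randn(1, input_size)   # one step of input, batch = 1
h = torch.zeros(1, hidden_size)  # initial hidden state h0

# manual step: h1 = tanh(x @ W_ih^T + b_ih + h @ W_hh^T + b_hh)
h_manual = torch.tanh(x @ cell.weight_ih.T + cell.bias_ih
                      + h @ cell.weight_hh.T + cell.bias_hh)

h_cell = cell(x, h)
print(torch.allclose(h_manual, h_cell))  # True
```

The two results match, confirming that the cell really is just one linear map of [h, x] followed by tanh.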
2 Constructing an RNN:
There are two ways to build an RNN in PyTorch.
(1) Using RNNCell
A single cell must be called in a loop over the sequence.
Inputs: input (one step of the sequence, shape: (batch, input_size)) and hidden (h_{t-1}, or h_0 at the first step, shape: (batch, hidden_size))
Output: h_t (shape: (batch, hidden_size))
# definition:
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
# call: input of shape (batch, input_size); hidden of shape (batch, hidden_size)
hidden = cell(input, hidden)
import torch

batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2

cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# note the shape of the dataset: (seq_len, batch_size, input_size)
dataset = torch.randn(seq_len, batch_size, input_size)
hidden = torch.zeros(batch_size, hidden_size)

for idx, input in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    print('input size:', input.shape)
    hidden = cell(input, hidden)
    print('output size:', hidden.shape)
    print(hidden)
(2) Using RNN
This constructs the whole (possibly multi-layer) structure directly, as shown in the lecture figure.
Inputs: input (the entire input sequence, shape: (seq_len, batch, input_size)), hidden (the initial h_0, shape: (num_layers, batch, hidden_size)), num_layers (number of stacked RNN layers)
Outputs: out is the sequence of hidden states h_1, ..., h_N from the top layer (shape: (seq_len, batch, hidden_size)); hidden is the final h_N of every layer (shape: (num_layers, batch, hidden_size))
# definition
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
# call
out, hidden = cell(inputs, hidden)
import torch
batch_size = 1
seq_len = 3
hidden_size = 2
input_size = 4
num_layers = 1
cell = torch.nn.RNN(input_size= input_size, hidden_size=hidden_size, num_layers=num_layers)
input = torch.rand(seq_len, batch_size, input_size)
hidden = torch.zeros(num_layers, batch_size, hidden_size)
out, hidden = cell(input, hidden)
print('Output size:', out.shape)
print('Output:', out)
print('Hidden size: ', hidden.shape)
print('Hidden: ', hidden)
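One detail worth convincing yourself of (a quick sketch of my own, not from the lecture): since out stacks the top layer's hidden states over time and hidden holds the final state of each layer, a single-layer RNN must satisfy out[-1] == hidden[0]:

```python
import torch

torch.manual_seed(0)
cell = torch.nn.RNN(input_size=4, hidden_size=2, num_layers=1)
inputs = torch.randn(3, 1, 4)  # (seq_len, batch, input_size)
h0 = torch.zeros(1, 1, 2)      # (num_layers, batch, hidden_size)

out, hidden = cell(inputs, h0)
# out is h1..h_T of the top layer; hidden is h_T of every layer,
# so with one layer they must agree at the last time step
print(torch.allclose(out[-1], hidden[0]))  # True
```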
3 Training a model: 'hello' → 'ohlol'
Note: the input sequence is first converted to one-hot encoding, because the model's input has to be a numeric vector. Concretely, we first build a dictionary of all the characters that appear, i.e. give each one an index. For example, if e, h, l, o get indices 0, 1, 2, 3, then 'hello' maps to [1, 0, 2, 2, 3].
Next, each index becomes a vector whose length equals the size of the dictionary. Our dictionary consists of the 4 characters e, h, l, o, and 'l' has index 2, so it becomes the vector [0, 0, 1, 0].
The way I understand it, this amounts to giving every character a label and turning each character into a distribution over the dictionary as the model's input; the model then computes an output, and when the loss is computed a softmax turns that output into another distribution.
So why isn't y converted to one-hot? Because when computing the loss, the y we pass in is the label itself: CrossEntropyLoss internally turns the label into a distribution to compute the loss, so y stays a plain label, just as before.
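The encoding steps above can be sketched like this (my own illustration; it uses torch.nn.functional.one_hot for brevity instead of the hand-written lookup table used below):

```python
import torch
import torch.nn.functional as F

idx2char = ['e', 'h', 'l', 'o']
char2idx = {c: i for i, c in enumerate(idx2char)}

word = 'hello'
indices = [char2idx[c] for c in word]
print(indices)  # [1, 0, 2, 2, 3]

# one-hot: each index becomes a length-4 vector with a single 1
one_hot = F.one_hot(torch.tensor(indices), num_classes=4)
print(one_hot[2])  # tensor([0, 0, 1, 0]) -- 'l'

# CrossEntropyLoss takes raw logits and integer class labels directly,
# so the targets stay as indices -- no one-hot needed for y
logits = torch.randn(5, 4)
labels = torch.tensor([3, 1, 2, 3, 2])  # 'ohlol'
loss = torch.nn.CrossEntropyLoss()(logits, labels)
print(loss.item() > 0)  # True
```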
(1) RNNCell
import torch
input_size = 4
hidden_size = 4
batch_size = 1
idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)  # -1 lets PyTorch infer seq_len; shape: (seq_len, batch, input_size)
labels = torch.LongTensor(y_data).view(-1, 1)  # shape: (seq_len, 1)
class Model(torch.nn.Module):
    # batch_size is only used to create the initial h0; if h0 is built
    # outside the model, this argument can be dropped
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.input_size = input_size
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size, hidden_size=self.hidden_size)

    def forward(self, input, hidden):
        hidden = self.rnncell(input, hidden)
        return hidden

    def init_hidden(self):
        return torch.zeros(self.batch_size, self.hidden_size)
net = Model(input_size, hidden_size, batch_size)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
for epoch in range(30):
    loss = 0
    optimizer.zero_grad()
    hidden = net.init_hidden()  # initialize h0
    print('Predicted String:', end='')
    for input, label in zip(inputs, labels):
        hidden = net(input, hidden)
        # the loss must stay part of the computation graph, so don't use .item() here
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()], end='')
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/30] loss=%.4f' % (epoch + 1, loss.item()))
(2) RNN
import torch
input_size = 4
hidden_size = 4
batch_size = 1
num_layers = 1
seq_len = 5
idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)  # (seq_len, batch, input_size)
labels = torch.LongTensor(y_data)  # note: a flat (seq_len,) tensor here, not (seq_len, 1)
class RNNModel(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(RNNModel, self).__init__()
        self.input_size = input_size
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = torch.nn.RNN(input_size=self.input_size, hidden_size=self.hidden_size,
                                num_layers=self.num_layers)

    def forward(self, input):
        hidden = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        out, _ = self.rnn(input, hidden)
        return out.view(-1, self.hidden_size)
net = RNNModel(input_size, hidden_size, batch_size, num_layers)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)
for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted:', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss = %.3f' % (epoch + 1, loss.item()))
4 Improvements
Drawbacks of one-hot vectors: 1. high-dimensional 2. very sparse 3. hard-coded rather than learned
What we want instead: low-dimensional, dense, learned representations
Solution: an Embedding, which maps the high-dimensional, sparse one-hot encoding into a low-dimensional, dense space (i.e. dimensionality reduction)
(The final linear layer exists because hidden_size may differ from the number of classes; it makes sure the output dimension matches the number of classes.)
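What nn.Embedding actually does can be sketched in a few lines: it is a learnable lookup table indexed directly by the integer labels, so the one-hot step disappears entirely (a minimal sketch of my own):

```python
import torch

torch.manual_seed(0)
num_embeddings, embedding_size = 4, 10  # 4 characters, 10-dim dense vectors
emb = torch.nn.Embedding(num_embeddings, embedding_size)

x = torch.LongTensor([[1, 0, 2, 2, 3]])  # (batch, seq_len) -- integer indices, not one-hot
e = emb(x)
print(e.shape)  # torch.Size([1, 5, 10])

# the lookup is equivalent to selecting rows of the weight matrix
print(torch.allclose(e[0, 0], emb.weight[1]))  # True
```

Because the table is an ordinary parameter matrix, these dense vectors are learned by backprop along with the rest of the model.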
import torch
input_size = 4
hidden_size = 8
batch_size = 1
num_layers = 2
seq_len = 5
embedding_size = 10
num_class = 4
idx2char = ['e', 'h', 'l', 'o']
x_data = [[1, 0, 2, 2, 3]]
y_data = [3, 1, 2, 3, 2]
inputs = torch.LongTensor(x_data)
labels = torch.LongTensor(y_data)
class model(torch.nn.Module):
    def __init__(self):
        super(model, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size, hidden_size=hidden_size,
                                num_layers=num_layers, batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)
        x, _ = self.rnn(x, hidden)
        x = self.fc(x)
        return x.view(-1, num_class)
net = model()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)
for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted:', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss = %.3f' % (epoch + 1, loss.item()))
II. Homework
For the homework I didn't study the theory carefully; I just looked up the inputs and outputs in the PyTorch docs, tweaked the model until it ran, and called it done, since my main goal is learning the tooling. If you're interested, feel free to dig deeper on your own~
1.LSTM
Docs: LSTM — PyTorch 2.4 documentation
class LSTMmodel(torch.nn.Module):
    def __init__(self):
        super(LSTMmodel, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.LSTM = torch.nn.LSTM(input_size=embedding_size, hidden_size=hidden_size,
                                  num_layers=num_layers, batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        c0 = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)
        x, _ = self.LSTM(x, (hidden, c0))
        x = self.fc(x)
        return x.view(-1, num_class)

net = LSTMmodel()
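Compared with nn.RNN, the main API difference is that nn.LSTM carries a cell state alongside the hidden state, so it takes and returns a tuple (h, c). A quick shape check (my own sketch, using the same hyperparameters as above):

```python
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(input_size=10, hidden_size=8, num_layers=2, batch_first=True)
x = torch.randn(1, 5, 10)   # (batch, seq_len, input_size)
h0 = torch.zeros(2, 1, 8)   # (num_layers, batch, hidden_size)
c0 = torch.zeros(2, 1, 8)   # cell state, same shape as h0

out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape)  # torch.Size([1, 5, 8]) -- batch_first output
print(hn.shape)   # torch.Size([2, 1, 8])
print(cn.shape)   # torch.Size([2, 1, 8])
```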
2.GRU
Docs: GRU — PyTorch 2.4 documentation
class GRUmodel(torch.nn.Module):
    def __init__(self):
        super(GRUmodel, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.GRU = torch.nn.GRU(input_size=embedding_size, hidden_size=hidden_size,
                                num_layers=num_layers, batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)
        x, _ = self.GRU(x, hidden)
        x = self.fc(x)
        return x.view(-1, num_class)

net = GRUmodel()