PyTorch Deep Learning Practice (刘二大人) — Lecture Code & Homework: RNN Basics

       This lecture packs in a lot of material, and since I had never studied RNNs carefully before, I was fairly lost at first. After organizing my notes, though, the difficulty feels manageable, so I'm recording more of the theory here to help everyone learn along with me~

I. Theory & Lecture Code

1 How an RNN cell computes

An RNN cell is essentially a linear layer; the same linear layer is simply reused at every step of the input sequence.

An RNN cell takes two inputs: the current element of the sequence x_{t}, and the hidden state h_{t-1} (i.e., the cell's own output from the previous step). At the very first step we therefore need to supply an initial h_{0}, set from prior knowledge if we have any, or to zeros otherwise.

The output of the RNN cell is h_{t}.

Its computation is shown in the figure below. Suppose h_{t} has dimension hidden_size and x_{t} has dimension input_size. Inside the cell, two weight matrices of different shapes (plus biases) map h_{t-1} and x_{t} into hidden_size dimensions; the two results are summed and passed through an activation function: h_{t} = tanh(W_{ih} x_{t} + b_{ih} + W_{hh} h_{t-1} + b_{hh}). Although the figure shows two separate weight matrices, you could also concatenate h_{t-1} and x_{t} and use a single weight matrix, i.e., one linear layer, to carry out the whole computation.
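
To make the formula concrete, here is a minimal sketch (not from the lecture; x_t and h_prev are names I made up) that computes h_{t} by hand from the cell's own parameters and checks it against torch.nn.RNNCell:

import torch

input_size, hidden_size = 4, 2
cell = torch.nn.RNNCell(input_size, hidden_size)  # default nonlinearity is tanh

x_t = torch.randn(1, input_size)      # (batch, input_size)
h_prev = torch.zeros(1, hidden_size)  # (batch, hidden_size)

# manual computation with the cell's own weights and biases
h_manual = torch.tanh(x_t @ cell.weight_ih.T + cell.bias_ih
                      + h_prev @ cell.weight_hh.T + cell.bias_hh)
h_cell = cell(x_t, h_prev)
print(torch.allclose(h_manual, h_cell))  # expected: True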

 2 Constructing an RNN

There are two ways to build an RNN in PyTorch.

(1) Building with RNNCell

The cell has to be called in a loop over the sequence.

Inputs: input (one step of the sequence x_{t}, shape: (batch, input_size)), hidden (h_{0} or h_{t-1}, shape: (batch, hidden_size))

Output: h_{t} (shape: (batch, hidden_size))

# Definition:
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
# Call: input of shape (batch, input_size); hidden of shape (batch, hidden_size)
hidden = cell(input, hidden)
import torch

batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2

cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# (seq, batch, features): note that dataset has shape (seq_len, batch_size, input_size)
dataset = torch.randn(seq_len, batch_size, input_size)
hidden = torch.zeros(batch_size, hidden_size)

for idx, input in enumerate(dataset):
    print('='*20, idx, '='*20)
    print('input size:', input.shape)

    hidden = cell(input, hidden)

    print('output size:', hidden.shape)
    print(hidden)

(2) Building with RNN

Builds the whole (possibly multi-layer) structure directly, as shown in the figure below.

Inputs: input (the whole input sequence, shape: (seqLen, batch, input_size)), hidden (h_{0}, shape: (numLayers, batch, hidden_size)), num_layers (number of stacked RNN layers)

Outputs: out is h_{1} through h_{N} (shape: (seqLen, batch, hidden_size)); hidden is h_{N} (shape: (numLayers, batch, hidden_size))

# Definition
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
# Call
out, hidden = cell(inputs, hidden)
import torch

batch_size = 1
seq_len = 3
hidden_size = 2
input_size = 4
num_layers = 1

cell = torch.nn.RNN(input_size= input_size, hidden_size=hidden_size, num_layers=num_layers)

input = torch.rand(seq_len, batch_size, input_size)
hidden = torch.zeros(num_layers, batch_size, hidden_size)

out, hidden = cell(input, hidden)

print('Output size:', out.shape)
print('Output:', out)
print('Hidden size: ', hidden.shape)
print('Hidden: ', hidden)
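
As a quick check of the shapes listed above, here is a small sketch of my own (not from the lecture) that stacks num_layers=2 and prints the shapes of out and hidden; it also shows the batch_first=True option, used later in this post, which swaps the first two dimensions of the input and output tensors:

import torch

seq_len, batch_size, input_size, hidden_size, num_layers = 3, 1, 4, 2, 2

rnn = torch.nn.RNN(input_size, hidden_size, num_layers=num_layers)
inputs = torch.randn(seq_len, batch_size, input_size)
h0 = torch.zeros(num_layers, batch_size, hidden_size)

out, hidden = rnn(inputs, h0)
print(out.shape)     # torch.Size([3, 1, 2]) -> (seqLen, batch, hidden_size)
print(hidden.shape)  # torch.Size([2, 1, 2]) -> (numLayers, batch, hidden_size)

# with batch_first=True, input/output become (batch, seqLen, ...); h0 keeps its shape
rnn_bf = torch.nn.RNN(input_size, hidden_size, num_layers=num_layers, batch_first=True)
out_bf, _ = rnn_bf(inputs.transpose(0, 1), h0)
print(out_bf.shape)  # torch.Size([1, 3, 2]) -> (batch, seqLen, hidden_size)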

3 Training a model: 'hello' → 'ohlol'

Note: the input sequence is first converted to one-hot encodings, because the model's input has to be a numeric vector. Concretely, we first build a dictionary over all the characters that appear, i.e., give each one an index. For example, if e, h, l, o are assigned 0, 1, 2, 3 respectively, then 'hello' maps to [1, 0, 2, 2, 3].

Each index is then turned into a vector whose length equals the number of entries in the dictionary. Our dictionary consists of the 4 characters e, h, l, o, and 'l' has index 2, so its vector is [0, 0, 1, 0].

The way I understand it, this amounts to attaching a label to every character that appears and feeding the model each character as a distribution over the dictionary; the model then produces an output, and when the loss is computed a softmax turns that output into another distribution.

So why isn't y converted to one-hot? Because when computing the loss, what we pass in for y is just the class label: CrossEntropyLoss takes integer class indices directly (internally it applies softmax/log to the model output and compares against the target class), so y stays as plain labels, just like before.
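
As a quick illustration of the encoding step (a sketch of my own, not from the lecture), torch.nn.functional.one_hot produces the same vectors as the manual lookup table used in the code below:

import torch
import torch.nn.functional as F

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]  # 'hello'

# each index becomes a vector whose length equals the dictionary size
x_one_hot = F.one_hot(torch.tensor(x_data), num_classes=len(idx2char)).float()
print(x_one_hot.shape)  # torch.Size([5, 4])
print(x_one_hot[2])     # tensor([0., 0., 1., 0.]) -> 'l'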

(1) RNNCell

import torch

input_size = 4
hidden_size = 4
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]

one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]

inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)  # -1 is inferred automatically: (seq_len, batch, input_size)
labels = torch.LongTensor(y_data).view(-1, 1)  # (seq_len, 1)

class Model(torch.nn.Module):  # batch_size is only used to create the initial h0; if h0 is built outside the model, it need not be stored here
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.input_size = input_size
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size, hidden_size=self.hidden_size)

    def forward(self, input, hidden):
        hidden = self.rnncell(input, hidden)
        return hidden

    def init_hidden(self):
        return torch.zeros(self.batch_size, self.hidden_size)
    
net = Model(input_size, hidden_size, batch_size)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(30):
    loss = 0
    optimizer.zero_grad()
    hidden = net.init_hidden()  # initialize h0
    print('Predicted String:',end='')
    for input, label in zip(inputs, labels):
        hidden = net(input, hidden)
        loss += criterion(hidden, label)  # the loss must stay in the computation graph, so don't use .item() here
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()],end='')
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/30] loss=%.4f' % (epoch+1, loss.item()))

(2) RNN

import torch

input_size = 4
hidden_size = 4
batch_size = 1
num_layers = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]

one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]

inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)  # (seq_len, batch, input_size)
labels = torch.LongTensor(y_data)  # (seq_len,): class indices for CrossEntropyLoss

class RNNModel(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(RNNModel, self).__init__()
        self.input_size = input_size
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = torch.nn.RNN(input_size=self.input_size, hidden_size=self.hidden_size, num_layers=self.num_layers)
    
    def forward(self, input):
        hidden = torch.zeros(self.num_layers,
                             self.batch_size,
                             self.hidden_size)
        out, _ = self.rnn(input, hidden)
        return out.view(-1, self.hidden_size)

net = RNNModel(input_size, hidden_size, batch_size, num_layers)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)

for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    _,idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted:',''.join([idx2char[x] for x in idx]), end='')
    print(',Epoch [%d/15] loss = %.3f' % (epoch+1, loss.item()))
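
After the loop above finishes, the trained model can be run once more with gradient tracking turned off; a small sketch of my own:

with torch.no_grad():
    outputs = net(inputs)         # (seq_len, num_classes)
    pred = outputs.argmax(dim=1)  # predicted character index at each step
    print(''.join(idx2char[i] for i in pred.tolist()))  # should approach 'ohlol' as training converges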

4 Improvements

Drawbacks of one-hot vectors: 1. high-dimensional 2. very sparse 3. hard-coded rather than learned

What we want instead: low-dimensional, dense, learned representations

Solution: an Embedding layer, which maps the high-dimensional, sparse one-hot encoding into a low-dimensional, dense space (i.e., dimensionality reduction)

(The final linear layer ensures the output dimension matches the number of classes when hidden_size differs from the number of classes.)
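
Before the full model, a tiny sketch of my own showing what torch.nn.Embedding does to the shapes: it is just a learnable lookup table indexed by the character labels, so the one-hot step is no longer needed:

import torch

emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=10)  # 4 characters -> 10-dim vectors
x = torch.LongTensor([[1, 0, 2, 2, 3]])  # (batch, seq_len) of character indices
print(emb(x).shape)                      # torch.Size([1, 5, 10]) -> (batch, seq_len, embedding_dim)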

import torch

input_size = 4
hidden_size = 8
batch_size = 1
num_layers = 2
seq_len = 5
embedding_size = 10
num_class = 4

idx2char = ['e', 'h', 'l', 'o']
x_data = [[1, 0, 2, 2, 3]]
y_data = [3, 1, 2, 3, 2]


inputs = torch.LongTensor(x_data)
labels = torch.LongTensor(y_data)

class model(torch.nn.Module):
    def __init__(self):
        super(model, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size, hidden_size=hidden_size,num_layers=num_layers,batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)
        x, _ = self.rnn(x, hidden)
        x = self.fc(x)
        return x.view(-1, num_class)
    
net = model()

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)

for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted:',''.join([idx2char[x] for x in idx]), end='')
    print(',Epoch [%d/15] loss = %.3f' % (epoch+1, loss.item()))   

II. Homework

For the homework I didn't study the theory in depth; I just looked up the inputs and outputs in the PyTorch docs, modified the model, and stopped once it ran, since my main goal was to learn the tooling. Feel free to dig deeper if you're interested~

1.LSTM

Docs: LSTM — PyTorch 2.4 documentation

class LSTMmodel(torch.nn.Module):
    def __init__(self):
        super(LSTMmodel, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.LSTM = torch.nn.LSTM(input_size=embedding_size, hidden_size=hidden_size,num_layers=num_layers, batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        c0 = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)
        x, _ = self.LSTM(x,(hidden, c0))
        x = self.fc(x)
        return x.view(-1, num_class)
    
net = LSTMmodel()
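
Everything except the model (hyperparameters, inputs, loss, and training loop) is reused from section 4 above. Unlike plain nn.RNN, nn.LSTM carries a cell state alongside the hidden state, which is why forward passes a (hidden, c0) tuple; the layer also returns (h_n, c_n) as a tuple. A small shape check of my own:

import torch

lstm = torch.nn.LSTM(input_size=10, hidden_size=8, num_layers=2, batch_first=True)
x = torch.randn(1, 5, 10)  # (batch, seq_len, input_size)
h0 = torch.zeros(2, 1, 8)  # (num_layers, batch, hidden_size)
c0 = torch.zeros(2, 1, 8)  # cell state, same shape as h0

out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape, hn.shape, cn.shape)  # (1, 5, 8) (2, 1, 8) (2, 1, 8)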
 

2.GRU

Docs: GRU — PyTorch 2.4 documentation

class GRUmodel(torch.nn.Module):
    def __init__(self):
        super(GRUmodel, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.GRU = torch.nn.GRU(input_size=embedding_size, hidden_size=hidden_size,num_layers=num_layers, batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)
        x, _ = self.GRU(x, hidden)
        x = self.fc(x)
        return x.view(-1, num_class)

net = GRUmodel()
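
Unlike the LSTM, nn.GRU keeps only a hidden state (no cell state), so its call signature matches plain nn.RNN; a small shape check of my own:

import torch

gru = torch.nn.GRU(input_size=10, hidden_size=8, num_layers=2, batch_first=True)
x = torch.randn(1, 5, 10)  # (batch, seq_len, input_size)
h0 = torch.zeros(2, 1, 8)  # (num_layers, batch, hidden_size)

out, hn = gru(x, h0)        # no cell state, unlike the LSTM
print(out.shape, hn.shape)  # (1, 5, 8) (2, 1, 8)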

 
