Notes on 刘二大人's "PyTorch Deep Learning Practice", P12: Recurrent Neural Networks (Basics)
P12 Recurrent Neural Networks (Basics)
I. Basic Concepts
1. Basic RNN
A basic RNN may look structurally complicated at first, but it is actually quite simple: it just reuses the linear layers we have seen before. It is a network designed for data with a temporal (sequence) structure, and it uses weight sharing to reduce the number of weights that need to be trained.
Example: use the temperature, air pressure, and rainfall of the previous three days to predict whether it will rain on the fourth day.
We must consider not only the relationship between x1 (day 1's weather), x2 (day 2's weather), and x3 (day 3's weather), but also their temporal order, because a day's weather depends strongly on the previous day and, to some extent, on the days before that.
Natural language, financial markets, and so on are all time-series data, so we use an RNN to process them.
2. RNN Cell
The RNN Cell is essentially a linear layer; we simply reuse that same linear layer with shared weights. Each item xi of the sequence is fed into the RNN Cell, and when the next input xi+1 goes into the linear layer, the previous hidden output is passed back in along with it. The same computation then repeats for xi+2, xi+3, and so on.
The same holds for x1, except that it needs an extra h0. If we have prior knowledge, we feed it to the RNN as h0; for example, to generate text from an image, we can use a CNN plus a fully connected layer to produce h0 as the input. Without prior knowledge, we simply set h0 to an all-zero vector with the same dimension as h1, h2, ....
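A rough sketch of the "prior knowledge as h0" idea (my own illustration, not code from the lecture; the 512-dimensional image feature and all layer sizes are made-up assumptions):
import torch

img_feat = torch.randn(1, 512)              # stands in for the output of some CNN backbone
hidden_size = 8
fc = torch.nn.Linear(512, hidden_size)      # fully connected layer projecting to the hidden size
h0 = fc(img_feat)                           # the prior knowledge becomes the initial hidden state

cell = torch.nn.RNNCell(input_size=4, hidden_size=hidden_size)
x1 = torch.randn(1, 4)                      # first element of the input sequence
h1 = cell(x1, h0)                           # the recurrence starts from h0 instead of zeros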
Concretely, the RNN Cell is implemented by applying a single linear layer over and over again inside a loop, i.e. a for loop.
The concrete RNN computation uses tanh as the activation, whose range is [-1, 1].
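For reference, the per-step update (this is the standard formula used by torch.nn.RNNCell with its default tanh nonlinearity; subscripts follow the PyTorch documentation):
$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$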
II. Two Ways to Implement an RNN
- Create an RNN Cell and write the loop over the sequence yourself
- Use torch.nn.RNN directly
1. Implementation with the RNN Cell module
At each step the input has shape batch_size * input_size and the output has shape batch_size * hidden_size; the sequence contains seq_len such steps.
Code:
# construct the cell
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
# call the cell
# input: the input at the current time step, shape (batch, input_size)
# hidden: the hidden state at the current time step, shape (batch, hidden_size)
hidden = cell(input, hidden)
Example:
import torch

# parameters
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2

# construct the RNN Cell; it needs two arguments, input_size and hidden_size
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# the most important part: get the dimensions right
# putting seq_len first is the more natural layout
dataset = torch.randn(seq_len, batch_size, input_size)
# initialize the hidden state with zeros
hidden = torch.zeros(batch_size, hidden_size)

# loop over the sequence
for idx, input in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    # input: the current time step, shape (batch, input_size) = (1, 4)
    print('Input size:', input.shape)
    print(input)
    hidden = cell(input, hidden)
    # hidden: the current hidden state, shape (batch, hidden_size) = (1, 2)
    print('output size: ', hidden.shape)
    print(hidden)
Output:
==================== 0 ====================
Input size: torch.Size([1, 4])
tensor([[-0.3654, 0.4623, -0.2061, 0.6364]])
output size: torch.Size([1, 2])
tensor([[0.6059, 0.3521]], grad_fn=<TanhBackward0>)
==================== 1 ====================
Input size: torch.Size([1, 4])
tensor([[ 0.5143, -0.4748, 1.5344, 0.3012]])
output size: torch.Size([1, 2])
tensor([[ 0.6909, -0.6904]], grad_fn=<TanhBackward0>)
==================== 2 ====================
Input size: torch.Size([1, 4])
tensor([[-0.6538, -0.5483, 1.3151, -0.5661]])
output size: torch.Size([1, 2])
tensor([[0.7328, 0.1342]], grad_fn=<TanhBackward0>)
2. Using torch.nn.RNN directly
Code:
# inputs holds the whole input sequence, shape (seqLen, batch, input_size)
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
# hidden holds the hidden states of all layers, shape (numLayers, batch, hidden_size)
out, hidden = cell(inputs, hidden)
# out holds h1~hN, shape (seqLen, batch, hidden_size)
Implementation:
import torch

# parameters
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2
# one extra parameter compared with RNNCell
num_layers = 1

# construct the RNN
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                    num_layers=num_layers)

# inputs: (seqLen, batch, input_size)
inputs = torch.randn(seq_len, batch_size, input_size)
print('Input size:', inputs.shape)
print(inputs)

# hidden: (numLayers, batch, hidden_size)
hidden = torch.zeros(num_layers, batch_size, hidden_size)
print('Hidden size:', hidden.shape)
print('Hidden:', hidden)

# with torch.nn.RNN there is no need to write the loop ourselves
# out holds h1~hN, shape (seqLen, batch, hidden_size)
out, hidden = cell(inputs, hidden)
print('Output size:', out.shape)
print('Output:', out)
Output:
Input size: torch.Size([3, 1, 4])
tensor([[[ 0.3477, -0.6693, 0.4090, 0.5715]],
[[-0.5370, 0.6952, 1.0242, -2.5068]],
[[-1.7401, 1.9239, 0.7664, 1.1838]]])
Hidden size: torch.Size([1, 1, 2])
Hidden: tensor([[[0., 0.]]])
Output size: torch.Size([3, 1, 2])
Output: tensor([[[ 0.8264, 0.1188]],
[[ 0.7483, -0.9279]],
[[ 0.6931, -0.9795]]], grad_fn=<StackBackward0>)
If batch_first is set to True, batch_size and seq_len swap positions in the input (and output) tensors; PyTorch handles the layout automatically. The hidden state keeps the shape (numLayers, batch, hidden_size).
import torch

# parameters
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2
num_layers = 1

# construct the RNN with batch_first=True
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                    num_layers=num_layers, batch_first=True)

# inputs: (batch, seqLen, input_size) because batch_first=True
# inputs = torch.randn(seq_len, batch_size, input_size)
inputs = torch.randn(batch_size, seq_len, input_size)
print('Input size:', inputs.shape)
print(inputs)

# hidden: (numLayers, batch, hidden_size), unaffected by batch_first
hidden = torch.zeros(num_layers, batch_size, hidden_size)
print('Hidden size:', hidden.shape)
print('Hidden:', hidden)

# out: (batch, seqLen, hidden_size)
out, hidden = cell(inputs, hidden)
print('Output size:', out.shape)
print('Output:', out)
Output:
Input size: torch.Size([1, 3, 4])
tensor([[[ 1.0311, -0.1898, -0.2171, -0.3104],
[-1.6667, 0.2725, 0.9781, 0.4878],
[-1.3343, -2.2843, -0.1111, -0.8907]]])
Hidden size: torch.Size([1, 1, 2])
Hidden: tensor([[[0., 0.]]])
Output size: torch.Size([1, 3, 2])
Output: tensor([[[-0.0367, -0.1935],
[ 0.0298, 0.4441],
[-0.2344, -0.0832]]], grad_fn=<TransposeBackward1>)
3. Implementing the task with RNN Cell
Train a model that maps the sequence "hello" to "ohlol".
Characters cannot be fed into a neural network directly, so we first vectorize them: build a dictionary from the characters, then convert the input into one-hot vectors before feeding it in.
As the diagram in the slides shows, this is a four-class classification problem: decide which class each output character belongs to.
Implementation:
import torch
import matplotlib.pyplot as plt

# 1. prepare the data
input_size = 4
hidden_size = 4
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]    # hello
y_data = [3, 1, 2, 3, 2]    # ohlol
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
# convert the data into one-hot vectors
# data: seqLen(5) * input_size(4)
x_one_hot = [one_hot_lookup[x] for x in x_data]
# reshape inputs to (seqLen, batch_size, input_size)
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)
# reshape labels to (seqLen, 1)
labels = torch.LongTensor(y_data).view(-1, 1)

# 2. define the model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        # shape of input: (batchSize, inputSize)
        # shape of hidden: (batchSize, hiddenSize)
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size,
                                        hidden_size=self.hidden_size)

    def forward(self, input, hidden):
        hidden = self.rnncell(input, hidden)
        return hidden

    def init_hidden(self):
        # create the default all-zero initial hidden state h0 of shape (batchSize, hiddenSize)
        # batch_size is only needed here, when constructing h0
        return torch.zeros(self.batch_size, self.hidden_size)

net = Model(input_size, hidden_size, batch_size)

# 3. define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

# 4. training
loss_list = []
epoch_list = []
for epoch in range(15):
    loss = 0
    optimizer.zero_grad()          # reset the gradients
    hidden = net.init_hidden()     # initialize h0 first
    print('Predicted string: ', end='')
    # inputs: (seqLen, batch_size, input_size), iterated step by step
    # labels: (seqLen, 1)
    for input, label in zip(inputs, labels):
        # input: (batch_size, input_size)
        # label: (1,)
        hidden = net(input, hidden)
        # do not call item() on the loss: the sum over the whole sequence is the final loss in the graph
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)    # index of the largest logit
        print(idx2char[idx.item()], end='')
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/15] loss=%.4f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
Predicted string: lhlee, Epoch [1/15] loss=7.2482
Predicted string: lhlel, Epoch [2/15] loss=5.6591
Predicted string: lhlol, Epoch [3/15] loss=4.3445
Predicted string: ohlol, Epoch [4/15] loss=3.5968
Predicted string: ohlol, Epoch [5/15] loss=3.2311
Predicted string: ohlol, Epoch [6/15] loss=2.9910
Predicted string: ohlol, Epoch [7/15] loss=2.8092
Predicted string: ohlol, Epoch [8/15] loss=2.6668
Predicted string: ohlol, Epoch [9/15] loss=2.5524
Predicted string: ohlol, Epoch [10/15] loss=2.4461
Predicted string: ohlol, Epoch [11/15] loss=2.3392
Predicted string: ohlol, Epoch [12/15] loss=2.2458
Predicted string: ohlol, Epoch [13/15] loss=2.1786
Predicted string: ohlol, Epoch [14/15] loss=2.1312
Predicted string: ohlol, Epoch [15/15] loss=2.0881
The loss keeps decreasing: the RNN gradually learns to turn one sequence into the target sequence.
4. Implementing the task with torch.nn.RNN directly
Implementation:
import torch
import matplotlib.pyplot as plt

# 1. prepare the data
input_size = 4
hidden_size = 4
num_layers = 1
batch_size = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]    # hello
y_data = [3, 1, 2, 3, 2]    # ohlol
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
# convert the data into one-hot vectors
# data: seqLen(5) * input_size(4)
x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)
# labels: (seqLen * batch_size,)
labels = torch.LongTensor(y_data)

# 2. define the model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size    # needed to construct h0
        self.input_size = input_size
        self.hidden_size = hidden_size
        # shape of inputs: (seqLen, batchSize, inputSize)
        # shape of hidden: (numLayers, batchSize, hiddenSize)
        self.rnn = torch.nn.RNN(input_size=self.input_size,
                                hidden_size=self.hidden_size,
                                num_layers=num_layers)

    def forward(self, input):
        # hidden: (num_layers, batch_size, hidden_size)
        hidden = torch.zeros(self.num_layers,
                             self.batch_size,
                             self.hidden_size)
        # out: (seqLen, batch_size, hidden_size), reshaped to (seqLen * batch_size, hidden_size)
        out, _ = self.rnn(input, hidden)
        return out.view(-1, self.hidden_size)

net = Model(input_size, hidden_size, batch_size, num_layers)

# 3. define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

# 4. training
loss_list = []
epoch_list = []
for epoch in range(15):
    optimizer.zero_grad()    # reset the gradients
    # inputs: (seqLen, batch_size, input_size)
    # outputs: (seqLen * batch_size, hidden_size)
    # labels: (seqLen * batch_size,)
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted:', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss=%.3f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
Predicted: lllll, Epoch [1/15] loss=1.288
Predicted: ollol, Epoch [2/15] loss=1.108
Predicted: olool, Epoch [3/15] loss=1.003
Predicted: ololl, Epoch [4/15] loss=0.929
Predicted: oholl, Epoch [5/15] loss=0.872
Predicted: oholl, Epoch [6/15] loss=0.820
Predicted: oholl, Epoch [7/15] loss=0.772
Predicted: oholl, Epoch [8/15] loss=0.733
Predicted: oholl, Epoch [9/15] loss=0.696
Predicted: oholl, Epoch [10/15] loss=0.667
Predicted: oholl, Epoch [11/15] loss=0.646
Predicted: oholl, Epoch [12/15] loss=0.627
Predicted: oholl, Epoch [13/15] loss=0.612
Predicted: oholl, Epoch [14/15] loss=0.599
Predicted: oholl, Epoch [15/15] loss=0.588
5. Replacing one-hot vectors with an embedding
Drawbacks of one-hot vectors:
- the dimensionality is too high
- the vectors are extremely sparse
- they are hard-coded, with no learned relationship between the data
Embedding layer: it maps the high-dimensional space to a low-dimensional one, i.e. the familiar idea of dimensionality reduction.
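A minimal sketch of what torch.nn.Embedding does (my own illustration; the sizes here are arbitrary, not necessarily those used in the lecture code):
import torch

# an embedding table with 4 rows (one per character) and 10 columns
emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=10)

idx = torch.LongTensor([[1, 0, 2, 2, 3]])   # character indices, shape (batch=1, seqLen=5)
dense = emb(idx)                            # dense vectors, shape (1, 5, 10)
print(dense.shape)                          # torch.Size([1, 5, 10])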
Implementation:
import torch
import matplotlib.pyplot as plt

# parameters
num_class = 4
input_size = 4
hidden_size = 8
embedding_size = 10
num_layers = 2
batch_size = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [[1, 0, 2, 2, 3]]   # (batch, seq_len)
y_data = [3, 1, 2, 3, 2]     # (batch * seq_len)
inputs = torch.LongTensor(x_data)
labels = torch.LongTensor(y_data)

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # embedding table: input_size rows, each an embedding_size-dimensional vector
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)              # (batch, seqLen) -> (batch, seqLen, embedding_size)
        x, _ = self.rnn(x, hidden)   # (batch, seqLen, hidden_size)
        x = self.fc(x)               # (batch, seqLen, num_class)
        return x.view(-1, num_class)

net = Model()
criterion = torch.nn.CrossEntropyLoss()
# with lr=0.05 the loss actually increases
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

loss_list = []
epoch_list = []
for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted: ', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss = %.3f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
Predicted: lleel, Epoch [1/15] loss = 1.413
Predicted: ollol, Epoch [2/15] loss = 1.036
Predicted: ohloh, Epoch [3/15] loss = 0.719
Predicted: ohloh, Epoch [4/15] loss = 0.499
Predicted: ohlol, Epoch [5/15] loss = 0.311
Predicted: ohlol, Epoch [6/15] loss = 0.228
Predicted: ohlol, Epoch [7/15] loss = 0.148
Predicted: ohlol, Epoch [8/15] loss = 0.084
Predicted: ohlol, Epoch [9/15] loss = 0.047
Predicted: ohlol, Epoch [10/15] loss = 0.028
Predicted: ohlol, Epoch [11/15] loss = 0.019
Predicted: ohlol, Epoch [12/15] loss = 0.013
Predicted: ohlol, Epoch [13/15] loss = 0.009
Predicted: ohlol, Epoch [14/15] loss = 0.007
Predicted: ohlol, Epoch [15/15] loss = 0.005
The loss is clearly much smaller now.
III. Homework
Homework 1: Using LSTM
The LSTM equations
The extra path (the cell state) helps reduce the vanishing-gradient problem.
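The formula images from the slides are not reproduced here; for reference, these are the standard LSTM equations (the same form used by torch.nn.LSTM):
$i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})$
$f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})$
$g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})$
$o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})$
$c_t = f_t \odot c_{t-1} + i_t \odot g_t$
$h_t = o_t \odot \tanh(c_t)$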
I. LSTM
Implementation:
import torch
import matplotlib.pyplot as plt

# 1. prepare the data
input_size = 4
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]    # hello
y_data = [3, 1, 2, 3, 2]    # ohlol
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
# convert the data into one-hot vectors
# data: seqLen(5) * input_size(4)
x_one_hot = [one_hot_lookup[x] for x in x_data]
# reshape inputs to (seqLen, batch_size, input_size)
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)
# reshape labels to (seqLen, 1)
labels = torch.LongTensor(y_data).view(-1, 1)

# 2. define the model (a hand-written LSTM cell)
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # one linear layer per gate; each is applied to both x and hidden,
        # which only works here because input_size == hidden_size == 4
        self.lineari = torch.nn.Linear(4, 4)
        self.linearf = torch.nn.Linear(4, 4)
        self.linearc = torch.nn.Linear(4, 4)
        self.linearo = torch.nn.Linear(4, 4)
        self.sigmoid = torch.nn.Sigmoid()
        self.tanh = torch.nn.Tanh()
        self.batch_size = batch_size
        self.input_size = input_size

    def forward(self, x, hidden, C):
        i = self.sigmoid(self.lineari(x) + self.lineari(hidden))   # input gate
        f = self.sigmoid(self.linearf(x) + self.linearf(hidden))   # forget gate
        # candidate state (the standard LSTM uses tanh here; sigmoid is kept as in the original code)
        c = self.sigmoid(self.linearc(x) + self.linearc(hidden))
        o = self.sigmoid(self.linearo(x) + self.linearo(hidden))   # output gate
        # new cell state = input gate * candidate + forget gate * previous cell state
        C = f * C + i * c
        # activate the new cell state, then multiply by the output gate to get the hidden output
        hidden = o * self.tanh(C)
        return hidden, C

    def init_hidden(self):
        # default all-zero initial state of shape (batchSize, hiddenSize)
        # batch_size is only needed here, when constructing h0 / c0
        return torch.zeros(self.batch_size, self.input_size)

net = Model()

# 3. define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.03)

# 4. training
loss_list = []
epoch_list = []
for epoch in range(100):
    loss = 0
    optimizer.zero_grad()        # reset the gradients
    hidden = net.init_hidden()
    C = net.init_hidden()
    print('Predicted string: ', end='')
    for input, label in zip(inputs, labels):
        # input: (batch_size, input_size)
        # label: (1,)
        hidden, C = net(input, hidden, C)
        # do not call item() on the loss: the sum over the whole sequence is the final loss in the graph
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)    # index of the largest logit
        print(idx2char[idx.item()], end='')
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/100] loss=%.4f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
......
Predicted string: ohlll, Epoch [90/100] loss=4.5457
Predicted string: ohlll, Epoch [91/100] loss=4.5425
Predicted string: ohlll, Epoch [92/100] loss=4.5394
Predicted string: ohlll, Epoch [93/100] loss=4.5364
Predicted string: ohlll, Epoch [94/100] loss=4.5335
Predicted string: ohlll, Epoch [95/100] loss=4.5307
Predicted string: ohlll, Epoch [96/100] loss=4.5280
Predicted string: ohlll, Epoch [97/100] loss=4.5254
Predicted string: ohlll, Epoch [98/100] loss=4.5228
Predicted string: ohlll, Epoch [99/100] loss=4.5203
Predicted string: ohlll, Epoch [100/100] loss=4.5179
II. LSTM + Embedding
Implementation:
import torch
import matplotlib.pyplot as plt

# 1. prepare the data
input_size = 4
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
# x_data = [1, 0, 2, 2, 3]   # hello
x_data = torch.LongTensor([[1, 0, 2, 2, 3]]).view(5, 1)
print(x_data.shape)
y_data = [3, 1, 2, 3, 2]                         # labels
labels = torch.LongTensor(y_data).view(-1, 1)    # -1: infer the number of rows from the single column

emb = torch.nn.Embedding(4, 10)
# embed the 4-class indices into 10-dimensional vectors for the linear_ix etc. below,
# giving shape (5, 1, 10); the embedding values are randomly initialized
inputs = emb(x_data)
print(inputs)
print(inputs.shape)

# 2. define the model (a hand-written LSTM cell)
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # transforms applied to the input x (the internal state is c, the external state is h)
        self.linear_ix = torch.nn.Linear(10, 4)   # input gate: x plus the previous external state, through sigmoid, gives i_t
        self.linear_fx = torch.nn.Linear(10, 4)   # forget gate: x plus the previous external state, through sigmoid, gives f_t
        self.linear_gx = torch.nn.Linear(10, 4)   # candidate: x plus the previous external state, through tanh, gives g_t
        self.linear_ox = torch.nn.Linear(10, 4)   # output gate: x plus the previous external state, through sigmoid, gives o_t
        # transforms applied to the hidden (external) state
        self.linear_ih = torch.nn.Linear(4, 4)
        self.linear_fh = torch.nn.Linear(4, 4)
        self.linear_gh = torch.nn.Linear(4, 4)
        self.linear_oh = torch.nn.Linear(4, 4)
        self.sigmoid = torch.nn.Sigmoid()
        self.tanh = torch.nn.Tanh()
        self.batch_size = batch_size
        self.input_size = input_size

    def forward(self, x, hidden, c):
        # combine the input x with the external state h, then activate to get i, f, g (candidate) and o
        i = self.sigmoid(self.linear_ix(x) + self.linear_ih(hidden))
        f = self.sigmoid(self.linear_fx(x) + self.linear_fh(hidden))
        g = self.tanh(self.linear_gx(x) + self.linear_gh(hidden))
        o = self.sigmoid(self.linear_ox(x) + self.linear_oh(hidden))
        # new internal state: candidate g gated by the input gate i, plus the previous
        # internal state c gated by the forget gate f; g differs from c only in using tanh
        c = f * c + i * g
        # external state: the new internal state, activated and then gated by the output gate o
        hidden = o * self.tanh(c)
        return hidden, c

    def init_hidden(self):
        # default all-zero initial state of shape (batchSize, hiddenSize)
        # batch_size is only needed here, when constructing h0 / c0
        return torch.zeros(self.batch_size, self.input_size)

net = Model()

# 3. define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.06)

# 4. training
loss_list = []
epoch_list = []
for epoch in range(100):
    loss = 0
    optimizer.zero_grad()        # reset the gradients
    hidden = net.init_hidden()
    c = net.init_hidden()
    print('Predicted string: ', end='')
    for input, label in zip(inputs, labels):
        # input: (batch_size, input_size)
        # label: (1,)
        hidden, c = net(input, hidden, c)
        # do not call item() on the loss: the sum over the whole sequence is the final loss in the graph
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)    # index of the largest logit
        print(idx2char[idx.item()], end='')
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    # retain_graph=True is needed because inputs = emb(x_data) is computed once
    # outside the loop, so its graph is reused by every epoch's backward pass
    loss.backward(retain_graph=True)
    optimizer.step()
    print(', Epoch [%d/100] loss=%.4f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
......
Predicted string: ohlll, Epoch [90/100] loss=2.8947
Predicted string: ohlll, Epoch [91/100] loss=2.8943
Predicted string: ohlll, Epoch [92/100] loss=2.8937
Predicted string: ohlll, Epoch [93/100] loss=2.8931
Predicted string: ohlll, Epoch [94/100] loss=2.8923
Predicted string: ohlll, Epoch [95/100] loss=2.8913
Predicted string: ohlll, Epoch [96/100] loss=2.8900
Predicted string: ohlll, Epoch [97/100] loss=2.8883
Predicted string: ohlll, Epoch [98/100] loss=2.8862
Predicted string: ohlll, Epoch [99/100] loss=2.8836
Predicted string: ohlll, Epoch [100/100] loss=2.8807
Homework 2: Using GRU
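The original notes give no solution for this homework. As a rough starting point (my own sketch, untested against the lecture's results), torch.nn.GRU can simply replace torch.nn.RNN in the model from section "4. Implementing the task with torch.nn.RNN directly"; the data preparation, loss, optimizer, and training loop can stay the same:
import torch

class GRUModel(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(GRUModel, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        # torch.nn.GRU has the same (input, h0) -> (out, hn) interface as torch.nn.RNN
        self.gru = torch.nn.GRU(input_size=input_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers)

    def forward(self, input):
        hidden = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        out, _ = self.gru(input, hidden)          # out: (seqLen, batch, hidden_size)
        return out.view(-1, self.hidden_size)     # (seqLen * batch, hidden_size)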