Notes on 刘二大人's "PyTorch Deep Learning Practice", P12: Recurrent Neural Networks (Basics)
P12 Recurrent Neural Networks (Basics)
I. Basic Concepts
1. Basic RNN
A basic RNN may look structurally complicated at first, but it is actually quite simple: it just reuses the linear layers we have seen before. It is a network designed for data with a temporal (sequence) structure, and it uses weight sharing to reduce the number of weights that need to be trained.
Example: use the temperature, air pressure, and rainfall of the previous three days to predict whether it will rain on the fourth day.
We must consider not only the relationship between x1 (day 1's weather), x2 (day 2's weather), and x3 (day 3's weather), but also their temporal order, because a day's weather depends strongly on the previous day and, to some extent, on the days before that.
Natural language, financial markets, and so on are all time-series data, so we use an RNN to process them.
2. RNN Cell
The RNN Cell is essentially a linear layer; we simply reuse that same linear layer with shared weights. Each item xi of the sequence is fed into the RNN Cell, and when the next input xi+1 goes into the linear layer, the previous hidden output is passed back in along with it. The same computation then repeats for xi+2, xi+3, and so on.
The same holds for x1, except that it needs an extra h0. If we have prior knowledge, we feed it to the RNN as h0; for example, to generate text from an image, we can use a CNN plus a fully connected layer to produce h0 as the input. Without prior knowledge, we simply set h0 to an all-zero vector with the same dimension as h1, h2, ....
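A rough sketch of the "prior knowledge as h0" idea (my own illustration, not code from the lecture; the 512-dimensional image feature and all layer sizes are made-up assumptions):
import torch

img_feat = torch.randn(1, 512)              # stands in for the output of some CNN backbone
hidden_size = 8
fc = torch.nn.Linear(512, hidden_size)      # fully connected layer projecting to the hidden size
h0 = fc(img_feat)                           # the prior knowledge becomes the initial hidden state

cell = torch.nn.RNNCell(input_size=4, hidden_size=hidden_size)
x1 = torch.randn(1, 4)                      # first element of the input sequence
h1 = cell(x1, h0)                           # the recurrence starts from h0 instead of zeros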
Concretely, the RNN Cell is implemented by applying a single linear layer over and over again inside a loop, i.e. a for loop.
The concrete RNN computation uses tanh as the activation, whose range is [-1, 1].
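For reference, the per-step update (this is the standard formula used by torch.nn.RNNCell with its default tanh nonlinearity; subscripts follow the PyTorch documentation):
$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$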
II. Two Ways to Implement an RNN
- Create an RNN Cell and write the loop over the sequence yourself
- Use torch.nn.RNN directly
1. Implementation with the RNN Cell module
At each step the input has shape batch_size * input_size and the output has shape batch_size * hidden_size; the sequence contains seq_len such steps.
Code:
# construct the cell
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
# call the cell
# input: the input at the current time step, shape (batch, input_size)
# hidden: the hidden state at the current time step, shape (batch, hidden_size)
hidden = cell(input, hidden)
Example:
import torch

# parameters
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2

# construct the RNN Cell; it needs two arguments, input_size and hidden_size
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# the most important part: get the dimensions right
# putting seq_len first is the more natural layout
dataset = torch.randn(seq_len, batch_size, input_size)
# initialize the hidden state with zeros
hidden = torch.zeros(batch_size, hidden_size)

# loop over the sequence
for idx, input in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    # input: the current time step, shape (batch, input_size) = (1, 4)
    print('Input size:', input.shape)
    print(input)
    hidden = cell(input, hidden)
    # hidden: the current hidden state, shape (batch, hidden_size) = (1, 2)
    print('output size: ', hidden.shape)
    print(hidden)
Output:
==================== 0 ====================
Input size: torch.Size([1, 4])
tensor([[-0.3654, 0.4623, -0.2061, 0.6364]])
output size: torch.Size([1, 2])
tensor([[0.6059, 0.3521]], grad_fn=<TanhBackward0>)
==================== 1 ====================
Input size: torch.Size([1, 4])
tensor([[ 0.5143, -0.4748, 1.5344, 0.3012]])
output size: torch.Size([1, 2])
tensor([[ 0.6909, -0.6904]], grad_fn=<TanhBackward0>)
==================== 2 ====================
Input size: torch.Size([1, 4])
tensor([[-0.6538, -0.5483, 1.3151, -0.5661]])
output size: torch.Size([1, 2])
tensor([[0.7328, 0.1342]], grad_fn=<TanhBackward0>)
2. Using torch.nn.RNN directly
Code:
# inputs holds the whole input sequence, shape (seqLen, batch, input_size)
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
# hidden holds the hidden states of all layers, shape (numLayers, batch, hidden_size)
out, hidden = cell(inputs, hidden)
# out holds h1~hN, shape (seqLen, batch, hidden_size)
Implementation:
import torch

# parameters
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2
# one extra parameter compared with RNNCell
num_layers = 1

# construct the RNN
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                    num_layers=num_layers)

# inputs: (seqLen, batch, input_size)
inputs = torch.randn(seq_len, batch_size, input_size)
print('Input size:', inputs.shape)
print(inputs)

# hidden: (numLayers, batch, hidden_size)
hidden = torch.zeros(num_layers, batch_size, hidden_size)
print('Hidden size:', hidden.shape)
print('Hidden:', hidden)

# with torch.nn.RNN there is no need to write the loop ourselves
# out holds h1~hN, shape (seqLen, batch, hidden_size)
out, hidden = cell(inputs, hidden)
print('Output size:', out.shape)
print('Output:', out)
Output:
Input size: torch.Size([3, 1, 4])
tensor([[[ 0.3477, -0.6693, 0.4090, 0.5715]],
[[-0.5370, 0.6952, 1.0242, -2.5068]],
[[-1.7401, 1.9239, 0.7664, 1.1838]]])
Hidden size: torch.Size([1, 1, 2])
Hidden: tensor([[[0., 0.]]])
Output size: torch.Size([3, 1, 2])
Output: tensor([[[ 0.8264, 0.1188]],
[[ 0.7483, -0.9279]],
[[ 0.6931, -0.9795]]], grad_fn=<StackBackward0>)
If batch_first is set to True, batch_size and seq_len swap positions in the input (and output) tensors; PyTorch handles the layout automatically. The hidden state keeps the shape (numLayers, batch, hidden_size).
import torch

# parameters
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2
num_layers = 1

# construct the RNN with batch_first=True
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                    num_layers=num_layers, batch_first=True)

# inputs: (batch, seqLen, input_size) because batch_first=True
# inputs = torch.randn(seq_len, batch_size, input_size)
inputs = torch.randn(batch_size, seq_len, input_size)
print('Input size:', inputs.shape)
print(inputs)

# hidden: (numLayers, batch, hidden_size), unaffected by batch_first
hidden = torch.zeros(num_layers, batch_size, hidden_size)
print('Hidden size:', hidden.shape)
print('Hidden:', hidden)

# out: (batch, seqLen, hidden_size)
out, hidden = cell(inputs, hidden)
print('Output size:', out.shape)
print('Output:', out)
Output:
Input size: torch.Size([1, 3, 4])
tensor([[[ 1.0311, -0.1898, -0.2171, -0.3104],
[-1.6667, 0.2725, 0.9781, 0.4878],
[-1.3343, -2.2843, -0.1111, -0.8907]]])
Hidden size: torch.Size([1, 1, 2])
Hidden: tensor([[[0., 0.]]])
Output size: torch.Size([1, 3, 2])
Output: tensor([[[-0.0367, -0.1935],
[ 0.0298, 0.4441],
[-0.2344, -0.0832]]], grad_fn=<TransposeBackward1>)
3. Implementing the task with RNN Cell
Train a model that maps the sequence "hello" to "ohlol".
Characters cannot be fed into a neural network directly, so we first vectorize them: build a dictionary from the characters, then convert the input into one-hot vectors before feeding it in.
As the diagram in the slides shows, this is a four-class classification problem: decide which class each output character belongs to.
Implementation:
import torch
import matplotlib.pyplot as plt

# 1. prepare the data
input_size = 4
hidden_size = 4
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]    # hello
y_data = [3, 1, 2, 3, 2]    # ohlol
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
# convert the data into one-hot vectors
# data: seqLen(5) * input_size(4)
x_one_hot = [one_hot_lookup[x] for x in x_data]
# reshape inputs to (seqLen, batch_size, input_size)
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)
# reshape labels to (seqLen, 1)
labels = torch.LongTensor(y_data).view(-1, 1)

# 2. define the model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        # shape of input: (batchSize, inputSize)
        # shape of hidden: (batchSize, hiddenSize)
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size,
                                        hidden_size=self.hidden_size)

    def forward(self, input, hidden):
        hidden = self.rnncell(input, hidden)
        return hidden

    def init_hidden(self):
        # create the default all-zero initial hidden state h0 of shape (batchSize, hiddenSize)
        # batch_size is only needed here, when constructing h0
        return torch.zeros(self.batch_size, self.hidden_size)

net = Model(input_size, hidden_size, batch_size)

# 3. define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

# 4. training
loss_list = []
epoch_list = []
for epoch in range(15):
    loss = 0
    optimizer.zero_grad()          # reset the gradients
    hidden = net.init_hidden()     # initialize h0 first
    print('Predicted string: ', end='')
    # inputs: (seqLen, batch_size, input_size), iterated step by step
    # labels: (seqLen, 1)
    for input, label in zip(inputs, labels):
        # input: (batch_size, input_size)
        # label: (1,)
        hidden = net(input, hidden)
        # do not call item() on the loss: the sum over the whole sequence is the final loss in the graph
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)    # index of the largest logit
        print(idx2char[idx.item()], end='')
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/15] loss=%.4f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
Predicted string: lhlee, Epoch [1/15] loss=7.2482
Predicted string: lhlel, Epoch [2/15] loss=5.6591
Predicted string: lhlol, Epoch [3/15] loss=4.3445
Predicted string: ohlol, Epoch [4/15] loss=3.5968
Predicted string: ohlol, Epoch [5/15] loss=3.2311
Predicted string: ohlol, Epoch [6/15] loss=2.9910
Predicted string: ohlol, Epoch [7/15] loss=2.8092
Predicted string: ohlol, Epoch [8/15] loss=2.6668
Predicted string: ohlol, Epoch [9/15] loss=2.5524
Predicted string: ohlol, Epoch [10/15] loss=2.4461
Predicted string: ohlol, Epoch [11/15] loss=2.3392
Predicted string: ohlol, Epoch [12/15] loss=2.2458
Predicted string: ohlol, Epoch [13/15] loss=2.1786
Predicted string: ohlol, Epoch [14/15] loss=2.1312
Predicted string: ohlol, Epoch [15/15] loss=2.0881
The loss keeps decreasing: the RNN gradually learns to turn one sequence into the target sequence.
4. Implementing the task with torch.nn.RNN directly
Implementation:
import torch
import matplotlib.pyplot as plt

# 1. prepare the data
input_size = 4
hidden_size = 4
num_layers = 1
batch_size = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]    # hello
y_data = [3, 1, 2, 3, 2]    # ohlol
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
# convert the data into one-hot vectors
# data: seqLen(5) * input_size(4)
x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)
# labels: (seqLen * batch_size,)
labels = torch.LongTensor(y_data)

# 2. define the model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size    # needed to construct h0
        self.input_size = input_size
        self.hidden_size = hidden_size
        # shape of inputs: (seqLen, batchSize, inputSize)
        # shape of hidden: (numLayers, batchSize, hiddenSize)
        self.rnn = torch.nn.RNN(input_size=self.input_size,
                                hidden_size=self.hidden_size,
                                num_layers=num_layers)

    def forward(self, input):
        # hidden: (num_layers, batch_size, hidden_size)
        hidden = torch.zeros(self.num_layers,
                             self.batch_size,
                             self.hidden_size)
        # out: (seqLen, batch_size, hidden_size), reshaped to (seqLen * batch_size, hidden_size)
        out, _ = self.rnn(input, hidden)
        return out.view(-1, self.hidden_size)

net = Model(input_size, hidden_size, batch_size, num_layers)

# 3. define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

# 4. training
loss_list = []
epoch_list = []
for epoch in range(15):
    optimizer.zero_grad()    # reset the gradients
    # inputs: (seqLen, batch_size, input_size)
    # outputs: (seqLen * batch_size, hidden_size)
    # labels: (seqLen * batch_size,)
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted:', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss=%.3f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
Predicted: lllll, Epoch [1/15] loss=1.288
Predicted: ollol, Epoch [2/15] loss=1.108
Predicted: olool, Epoch [3/15] loss=1.003
Predicted: ololl, Epoch [4/15] loss=0.929
Predicted: oholl, Epoch [5/15] loss=0.872
Predicted: oholl, Epoch [6/15] loss=0.820
Predicted: oholl, Epoch [7/15] loss=0.772
Predicted: oholl, Epoch [8/15] loss=0.733
Predicted: oholl, Epoch [9/15] loss=0.696
Predicted: oholl, Epoch [10/15] loss=0.667
Predicted: oholl, Epoch [11/15] loss=0.646
Predicted: oholl, Epoch [12/15] loss=0.627
Predicted: oholl, Epoch [13/15] loss=0.612
Predicted: oholl, Epoch [14/15] loss=0.599
Predicted: oholl, Epoch [15/15] loss=0.588
5. Replacing one-hot vectors with an embedding
Drawbacks of one-hot vectors:
- the dimensionality is too high
- the vectors are extremely sparse
- they are hard-coded, with no learned relationship between the data
Embedding layer: it maps the high-dimensional space to a low-dimensional one, i.e. the familiar idea of dimensionality reduction.
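A minimal sketch of what torch.nn.Embedding does (my own illustration; the sizes here are arbitrary, not necessarily those used in the lecture code):
import torch

# an embedding table with 4 rows (one per character) and 10 columns
emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=10)

idx = torch.LongTensor([[1, 0, 2, 2, 3]])   # character indices, shape (batch=1, seqLen=5)
dense = emb(idx)                            # dense vectors, shape (1, 5, 10)
print(dense.shape)                          # torch.Size([1, 5, 10])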
Implementation:
import torch
import matplotlib.pyplot as plt

# parameters
num_class = 4
input_size = 4
hidden_size = 8
embedding_size = 10
num_layers = 2
batch_size = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [[1, 0, 2, 2, 3]]   # (batch, seq_len)
y_data = [3, 1, 2, 3, 2]     # (batch * seq_len)
inputs = torch.LongTensor(x_data)
labels = torch.LongTensor(y_data)

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # embedding table: input_size rows, each an embedding_size-dimensional vector
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)              # (batch, seqLen) -> (batch, seqLen, embedding_size)
        x, _ = self.rnn(x, hidden)   # (batch, seqLen, hidden_size)
        x = self.fc(x)               # (batch, seqLen, num_class)
        return x.view(-1, num_class)

net = Model()
criterion = torch.nn.CrossEntropyLoss()
# with lr=0.05 the loss actually increases
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

loss_list = []
epoch_list = []
for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted: ', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss = %.3f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
Predicted: lleel, Epoch [1/15] loss = 1.413
Predicted: ollol, Epoch [2/15] loss = 1.036
Predicted: ohloh, Epoch [3/15] loss = 0.719
Predicted: ohloh, Epoch [4/15] loss = 0.499
Predicted: ohlol, Epoch [5/15] loss = 0.311
Predicted: ohlol, Epoch [6/15] loss = 0.228
Predicted: ohlol, Epoch [7/15] loss = 0.148
Predicted: ohlol, Epoch [8/15] loss = 0.084
Predicted: ohlol, Epoch [9/15] loss = 0.047
Predicted: ohlol, Epoch [10/15] loss = 0.028
Predicted: ohlol, Epoch [11/15] loss = 0.019
Predicted: ohlol, Epoch [12/15] loss = 0.013
Predicted: ohlol, Epoch [13/15] loss = 0.009
Predicted: ohlol, Epoch [14/15] loss = 0.007
Predicted: ohlol, Epoch [15/15] loss = 0.005
The loss is clearly much smaller now.
III. Homework
Homework 1: Using LSTM
The LSTM equations
The extra path (the cell state) helps reduce the vanishing-gradient problem.
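The formula images from the slides are not reproduced here; for reference, these are the standard LSTM equations (the same form used by torch.nn.LSTM):
$i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})$
$f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})$
$g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})$
$o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})$
$c_t = f_t \odot c_{t-1} + i_t \odot g_t$
$h_t = o_t \odot \tanh(c_t)$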
I. LSTM
Implementation:
import torch
import matplotlib.pyplot as plt

# 1. prepare the data
input_size = 4
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]    # hello
y_data = [3, 1, 2, 3, 2]    # ohlol
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
# convert the data into one-hot vectors
# data: seqLen(5) * input_size(4)
x_one_hot = [one_hot_lookup[x] for x in x_data]
# reshape inputs to (seqLen, batch_size, input_size)
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)
# reshape labels to (seqLen, 1)
labels = torch.LongTensor(y_data).view(-1, 1)

# 2. define the model (a hand-written LSTM cell)
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # one linear layer per gate; each is applied to both x and hidden,
        # which only works here because input_size == hidden_size == 4
        self.lineari = torch.nn.Linear(4, 4)
        self.linearf = torch.nn.Linear(4, 4)
        self.linearc = torch.nn.Linear(4, 4)
        self.linearo = torch.nn.Linear(4, 4)
        self.sigmoid = torch.nn.Sigmoid()
        self.tanh = torch.nn.Tanh()
        self.batch_size = batch_size
        self.input_size = input_size

    def forward(self, x, hidden, C):
        i = self.sigmoid(self.lineari(x) + self.lineari(hidden))   # input gate
        f = self.sigmoid(self.linearf(x) + self.linearf(hidden))   # forget gate
        # candidate state (the standard LSTM uses tanh here; sigmoid is kept as in the original code)
        c = self.sigmoid(self.linearc(x) + self.linearc(hidden))
        o = self.sigmoid(self.linearo(x) + self.linearo(hidden))   # output gate
        # new cell state = input gate * candidate + forget gate * previous cell state
        C = f * C + i * c
        # activate the new cell state, then multiply by the output gate to get the hidden output
        hidden = o * self.tanh(C)
        return hidden, C

    def init_hidden(self):
        # default all-zero initial state of shape (batchSize, hiddenSize)
        # batch_size is only needed here, when constructing h0 / c0
        return torch.zeros(self.batch_size, self.input_size)

net = Model()

# 3. define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.03)

# 4. training
loss_list = []
epoch_list = []
for epoch in range(100):
    loss = 0
    optimizer.zero_grad()        # reset the gradients
    hidden = net.init_hidden()
    C = net.init_hidden()
    print('Predicted string: ', end='')
    for input, label in zip(inputs, labels):
        # input: (batch_size, input_size)
        # label: (1,)
        hidden, C = net(input, hidden, C)
        # do not call item() on the loss: the sum over the whole sequence is the final loss in the graph
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)    # index of the largest logit
        print(idx2char[idx.item()], end='')
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/100] loss=%.4f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
......
Predicted string: ohlll, Epoch [90/100] loss=4.5457
Predicted string: ohlll, Epoch [91/100] loss=4.5425
Predicted string: ohlll, Epoch [92/100] loss=4.5394
Predicted string: ohlll, Epoch [93/100] loss=4.5364
Predicted string: ohlll, Epoch [94/100] loss=4.5335
Predicted string: ohlll, Epoch [95/100] loss=4.5307
Predicted string: ohlll, Epoch [96/100] loss=4.5280
Predicted string: ohlll, Epoch [97/100] loss=4.5254
Predicted string: ohlll, Epoch [98/100] loss=4.5228
Predicted string: ohlll, Epoch [99/100] loss=4.5203
Predicted string: ohlll, Epoch [100/100] loss=4.5179
II. LSTM + Embedding
Implementation:
import torch
import matplotlib.pyplot as plt

# 1. prepare the data
input_size = 4
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
# x_data = [1, 0, 2, 2, 3]   # hello
x_data = torch.LongTensor([[1, 0, 2, 2, 3]]).view(5, 1)
print(x_data.shape)
y_data = [3, 1, 2, 3, 2]                         # labels
labels = torch.LongTensor(y_data).view(-1, 1)    # -1: infer the number of rows from the single column

emb = torch.nn.Embedding(4, 10)
# embed the 4-class indices into 10-dimensional vectors for the linear_ix etc. below,
# giving shape (5, 1, 10); the embedding values are randomly initialized
inputs = emb(x_data)
print(inputs)
print(inputs.shape)

# 2. define the model (a hand-written LSTM cell)
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # transforms applied to the input x (the internal state is c, the external state is h)
        self.linear_ix = torch.nn.Linear(10, 4)   # input gate: x plus the previous external state, through sigmoid, gives i_t
        self.linear_fx = torch.nn.Linear(10, 4)   # forget gate: x plus the previous external state, through sigmoid, gives f_t
        self.linear_gx = torch.nn.Linear(10, 4)   # candidate: x plus the previous external state, through tanh, gives g_t
        self.linear_ox = torch.nn.Linear(10, 4)   # output gate: x plus the previous external state, through sigmoid, gives o_t
        # transforms applied to the hidden (external) state
        self.linear_ih = torch.nn.Linear(4, 4)
        self.linear_fh = torch.nn.Linear(4, 4)
        self.linear_gh = torch.nn.Linear(4, 4)
        self.linear_oh = torch.nn.Linear(4, 4)
        self.sigmoid = torch.nn.Sigmoid()
        self.tanh = torch.nn.Tanh()
        self.batch_size = batch_size
        self.input_size = input_size

    def forward(self, x, hidden, c):
        # combine the input x with the external state h, then activate to get i, f, g (candidate) and o
        i = self.sigmoid(self.linear_ix(x) + self.linear_ih(hidden))
        f = self.sigmoid(self.linear_fx(x) + self.linear_fh(hidden))
        g = self.tanh(self.linear_gx(x) + self.linear_gh(hidden))
        o = self.sigmoid(self.linear_ox(x) + self.linear_oh(hidden))
        # new internal state: candidate g gated by the input gate i, plus the previous
        # internal state c gated by the forget gate f; g differs from c only in using tanh
        c = f * c + i * g
        # external state: the new internal state, activated and then gated by the output gate o
        hidden = o * self.tanh(c)
        return hidden, c

    def init_hidden(self):
        # default all-zero initial state of shape (batchSize, hiddenSize)
        # batch_size is only needed here, when constructing h0 / c0
        return torch.zeros(self.batch_size, self.input_size)

net = Model()

# 3. define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.06)

# 4. training
loss_list = []
epoch_list = []
for epoch in range(100):
    loss = 0
    optimizer.zero_grad()        # reset the gradients
    hidden = net.init_hidden()
    c = net.init_hidden()
    print('Predicted string: ', end='')
    for input, label in zip(inputs, labels):
        # input: (batch_size, input_size)
        # label: (1,)
        hidden, c = net(input, hidden, c)
        # do not call item() on the loss: the sum over the whole sequence is the final loss in the graph
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)    # index of the largest logit
        print(idx2char[idx.item()], end='')
    epoch_list.append(epoch)
    loss_list.append(loss.item())
    # retain_graph=True is needed because inputs = emb(x_data) is computed once
    # outside the loop, so its graph is reused by every epoch's backward pass
    loss.backward(retain_graph=True)
    optimizer.step()
    print(', Epoch [%d/100] loss=%.4f' % (epoch+1, loss.item()))

plt.plot(epoch_list, loss_list)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
Output:
......
Predicted string: ohlll, Epoch [90/100] loss=2.8947
Predicted string: ohlll, Epoch [91/100] loss=2.8943
Predicted string: ohlll, Epoch [92/100] loss=2.8937
Predicted string: ohlll, Epoch [93/100] loss=2.8931
Predicted string: ohlll, Epoch [94/100] loss=2.8923
Predicted string: ohlll, Epoch [95/100] loss=2.8913
Predicted string: ohlll, Epoch [96/100] loss=2.8900
Predicted string: ohlll, Epoch [97/100] loss=2.8883
Predicted string: ohlll, Epoch [98/100] loss=2.8862
Predicted string: ohlll, Epoch [99/100] loss=2.8836
Predicted string: ohlll, Epoch [100/100] loss=2.8807
Homework 2: Using GRU
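The original notes give no solution for this homework. As a rough starting point (my own sketch, untested against the lecture's results), torch.nn.GRU can simply replace torch.nn.RNN in the model from section "4. Implementing the task with torch.nn.RNN directly"; the data preparation, loss, optimizer, and training loop can stay the same:
import torch

class GRUModel(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(GRUModel, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        # torch.nn.GRU has the same (input, h0) -> (out, hn) interface as torch.nn.RNN
        self.gru = torch.nn.GRU(input_size=input_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers)

    def forward(self, input):
        hidden = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        out, _ = self.gru(input, hidden)          # out: (seqLen, batch, hidden_size)
        return out.view(-1, self.hidden_size)     # (seqLen * batch, hidden_size)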