《PyTorch深度学习实践》 (PyTorch Deep Learning Practice), Lecture 11: Recurrent Neural Networks (Basics + Advanced)

Notes and code for Lecture 11 (Recurrent Neural Networks, basics + advanced) of 刘二大人's 《PyTorch深度学习实践》 course on Bilibili:
https://www.bilibili.com/video/BV1Y7411d7Ys?p=12&vd_source=b17f113d28933824d753a0915d5e3a90
https://www.bilibili.com/video/BV1Y7411d7Ys?p=13&spm_id_from=pageDriver&vd_source=b17f113d28933824d753a0915d5e3a90

Markdown notes: https://github.com/Jian-wei-peng/StudyNotes/tree/main/DRL/%E5%88%98%E4%BA%8Cpytorch%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0


1. Recurrent Neural Networks: Basics

Review

  • The networks built and used so far are dense neural networks (Dense Neural Network), also called deep neural networks (Deep Neural Network)
    • They contain many linear layers that transform the input data in space
    • $x_1$ to $x_8$ are the 8 features of one sample
  • Another scenario:

    • Suppose we have a dataset on whether it rains, containing each day's temperature, pressure, and weather, where rain depends on temperature and pressure. We need to predict rain from these data

      • To predict whether it rains today, we should use the historical weather data, not today's temperature and pressure

        • Using the past three days' data to predict today's weather, the input is $X = [x_1, x_2, x_3]$, where each $x_i$ contains three values: temperature, pressure, and whether it rained
      • With a fully connected network, $[x_1, x_2, x_3]$ would be spliced into a single 9-dimensional vector for training

        • If the input sequence is long and each element is high-dimensional, such a dense network (e.g. fully connected) is very hard to train, and the computation is enormous
      • RNNs are designed specifically for data with sequential patterns, and they also use weight sharing to reduce the number of weights that must be trained

        • Treat $x_1$, $x_2$, $x_3$ as a sequence: besides the connection between $x_1$ and $x_2$, there is a temporal order among the data, i.e. $x_2$ depends on $x_1$ and $x_3$ depends on $x_2$
        • RNNs are mainly used for data with sequential relations, such as weather, stock prices, natural language, etc.

RNN Cell

  • $x_t$ denotes the data of the sequence at time $t$; after passing through the RNN Cell it becomes a vector $h_t$ of a different dimension (say from 3-D to 5-D), which shows that the RNN Cell is essentially a linear layer

    • The difference from an ordinary linear layer is that the RNN Cell is shared across all time steps
  • Suppose $x_1$, $x_2$, $x_3$, $x_4$ is the input sequence; feeding each element into the RNN Cell applies a linear transformation that produces the output $h$, also called the hidden state

    • The sequence elements are related, so when feeding $x_2$ to get $h_2$ we must include not only the information of $x_2$ but also that of $x_1$; the two need to be fused (the red arrows in the lecture figure)
    • $x_1$ likewise needs an input $h_0$ (representing prior knowledge); if there is none, feed an all-zero vector of the same dimension as $h_1$, $h_2$, etc.
  • input_size: the input dimension

  • hidden_size: the hidden-state dimension

  • The weight $W_{ih}$ has shape hidden_size*input_size, so that multiplying it with the input $x_t$ gives a vector of shape hidden_size*1

  • The weight $W_{hh}$ has shape hidden_size*hidden_size; multiplying it with $h_{t-1}$ gives a vector of shape hidden_size*1

  • The RNN Cell adds these two hidden_size*1 vectors to fuse the information of $h$ and $x$, then applies $\tanh$ as the activation, giving $h_t = \tanh(W_{ih}x_t + b_{ih} + W_{hh}h_{t-1} + b_{hh})$

  • Note:

    • $W_{hh}h_{t-1} + b_{hh}$ and $W_{ih}x_t + b_{ih}$ can be merged into a single term

    • When building the RNN Cell, $x_t$ and $h_{t-1}$ are actually concatenated into a vector of shape (hidden_size+input_size)*1, and the weight becomes a matrix of shape hidden_size*(hidden_size+input_size); hence the RNN Cell is essentially one linear layer: $h_t = \tanh(W[h_{t-1}; x_t] + b)$

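A minimal sketch checking this formula directly against PyTorch's RNNCell (whose default nonlinearity is tanh; weight_ih, weight_hh, bias_ih, bias_hh are the cell's built-in parameters):

import torch

# compute h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh) by hand
cell = torch.nn.RNNCell(input_size=4, hidden_size=2)
x = torch.randn(1, 4)   # (batchSize, inputSize)
h = torch.zeros(1, 2)   # (batchSize, hiddenSize)

manual = torch.tanh(x @ cell.weight_ih.T + cell.bias_ih
                    + h @ cell.weight_hh.T + cell.bias_hh)
print(torch.allclose(cell(x, h), manual))   # True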

Implementing an RNN Cell

cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)  # input_size: input feature dimension; hidden_size: hidden state dimension

Example:

  • Suppose a sequence with batchSize = 1, sequence length 3, input dimension 4, and hidden (output) dimension 2
import torch

batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2

# construct the RNNCell
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# (seq, batch, features)
dataset = torch.randn(seq_len, batch_size, input_size)
# initialize the hidden state to zeros
hidden = torch.zeros(batch_size, hidden_size)

for idx, input in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    print('Input size: ', input.shape)

    hidden = cell(input, hidden)

    print('outputs size: ', hidden.shape)
    print(hidden)

Implementing an RNN

# input_size is the input feature dimension, hidden_size is the hidden dimension
# num_layers is the number of stacked RNN layers (note that RNN computation is time-consuming)
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
  • The cell's input inputs is the entire sequence ($x_1$, $x_2$, $x_3$, …, $x_N$), and its hidden argument is $h_0$
  • The cell's output out is all the hidden outputs ($h_1$, $h_2$, $h_3$, …, $h_N$), and the returned hidden is $h_N$

numLayers:

  • RNN Cells of the same color are the same cell, so a 3-layer RNN contains only three distinct linear layers, each shared across all time steps (see the sketch below)
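A small sketch that makes the sharing visible (the parameter names are PyTorch's own): with num_layers = 3, the RNN holds one set of weights per layer, reused at every time step:

import torch

rnn = torch.nn.RNN(input_size=4, hidden_size=2, num_layers=3)
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))
# prints weight_ih_l0 (2, 4), weight_hh_l0 (2, 2), bias_ih_l0 (2,), bias_hh_l0 (2,),
# and one such group each for l1 and l2: three cells' worth of parameters in total
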
import torch

# parameters
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2
num_layers = 1

# construct the RNN
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                    num_layers=num_layers)

# (seqLen, batchSize, inputSize)
inputs = torch.randn(seq_len, batch_size, input_size)
# initialize the hidden state to zeros
hidden = torch.zeros(num_layers, batch_size, hidden_size)

out, hidden = cell(inputs, hidden)

print('Output size: ', out.shape)
print('Output: ', out)
print('Hidden size: ', hidden.shape)
print('Hidden: ', hidden)
  • If batch_first is set to True, batch_size comes first in the input and output shapes: (batchSize, seqLen, inputSize)
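A small sketch of the resulting shape change:

import torch

# with batch_first=True, inputs and outputs are (batchSize, seqLen, ...);
# the hidden state keeps the shape (numLayers, batchSize, hiddenSize) either way
cell = torch.nn.RNN(input_size=4, hidden_size=2, num_layers=1, batch_first=True)
inputs = torch.randn(1, 3, 4)               # (batchSize, seqLen, inputSize)
hidden = torch.zeros(1, 1, 2)
out, hidden = cell(inputs, hidden)
print(out.shape)     # torch.Size([1, 3, 2]), i.e. (batchSize, seqLen, hiddenSize)
print(hidden.shape)  # torch.Size([1, 1, 2])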

Example: using RNNCell

  • The characters in this example are not vectors and cannot be fed to the network directly, so the first step is to vectorize them

    • First build a dictionary from the characters, assigning each one an index
    • Then use the dictionary to turn the input characters into their indices
    • Finally turn the indices into vectors (width = number of dictionary entries, height = number of input characters)
      • If the index is 1, every entry except position 1 is 0, i.e. 0100
    • The vectors produced this way are called one-hot vectors
      • inputSize = 4: the input dimension is the width (number of columns) of the vectors

Code implementation:

  • Define the parameters

    import torch
    
    input_size = 4
    hidden_size = 4
    batch_size = 1
    
  • Prepare the dataset

    # dataset
    idx2char = ['e', 'h', 'l', 'o']     # the dictionary
    x_data = [1, 0, 2, 2, 3]            # input sequence 'hello'
    y_data = [3, 1, 2, 3, 2]            # target sequence 'ohlol'
    
    # convert the input character indices into one-hot vectors
    one_hot_lookup = [[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]]
    x_one_hot = [one_hot_lookup[x] for x in x_data]
    
    # reshape the inputs to (seqLen, batchSize, inputSize); -1 means the size is inferred automatically
    inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)
    
    # reshape the labels to (seqLen, 1)
    labels = torch.LongTensor(y_data).view(-1, 1)
    
  • Build the model

    class Model(torch.nn.Module):
        def __init__(self, input_size, hidden_size, batch_size):
            super(Model, self).__init__()
            # initialize the parameters
            self.batch_size = batch_size
            self.input_size = input_size
            self.hidden_size = hidden_size
            # construct the RNN Cell
            self.rnncell = torch.nn.RNNCell(input_size=self.input_size,
                                            hidden_size=self.hidden_size)
    
        def forward(self, input, hidden):
            hidden = self.rnncell(input, hidden)
            return hidden
    
        def init_hidden(self):
            # initialize the hidden state to zeros; only used to construct h_0
            return torch.zeros(self.batch_size, self.hidden_size)   
    
    net = Model(input_size, hidden_size, batch_size)
    
  • Build the loss function and optimizer

    # Loss and Optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
    
  • Training

    # Training steps
    for epoch in range(15):
        loss = 0
        optimizer.zero_grad()
        hidden = net.init_hidden()
        print('Predicted string: ', end='')
        for input, label in zip(inputs, labels):
            hidden = net(input, hidden)
            loss += criterion(hidden, label)
            _, idx = hidden.max(dim=1)
            print(idx2char[idx.item()], end='')
        loss.backward()
        optimizer.step()
        print(', Epoch [%d/15] loss=%.4f' % (epoch+1, loss.item()))
    

Full code:

import torch

# parameters
input_size = 4
hidden_size = 4
batch_size = 1

# dataset
idx2char = ['e', 'h', 'l', 'o']     # the dictionary
x_data = [1, 0, 2, 2, 3]            # input sequence 'hello'
y_data = [3, 1, 2, 3, 2]            # target sequence 'ohlol'

# convert the input character indices into one-hot vectors
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]

# reshape the inputs to (seqLen, batchSize, inputSize); -1 means the size is inferred automatically
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)

# reshape the labels to (seqLen, 1)
labels = torch.LongTensor(y_data).view(-1, 1)


# model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        # initialize the parameters
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        # construct the RNN Cell
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size,
                                        hidden_size=self.hidden_size)

    def forward(self, input, hidden):
        hidden = self.rnncell(input, hidden)
        return hidden

    def init_hidden(self):
        return torch.zeros(self.batch_size, self.hidden_size)   # initialize the hidden state to zeros; only used to construct h_0


net = Model(input_size, hidden_size, batch_size)

# Loss and Optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)


# Training steps
for epoch in range(15):
    loss = 0
    optimizer.zero_grad()
    hidden = net.init_hidden()
    print('Predicted string: ', end='')
    for input, label in zip(inputs, labels):
        hidden = net(input, hidden)
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()], end='')
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/15] loss=%.4f' % (epoch+1, loss.item()))

Example: using RNN

import torch

# parameters
input_size = 4
hidden_size = 4
num_layers = 1
batch_size = 1
seq_len = 5

# dataset
idx2char = ['e', 'h', 'l', 'o']     # the dictionary
x_data = [1, 0, 2, 2, 3]            # input sequence 'hello'
y_data = [3, 1, 2, 3, 2]            # target sequence 'ohlol'

# convert the input character indices into one-hot vectors
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]

# reshape the inputs to (seqLen, batchSize, inputSize)
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)

# labels: (seqLen * batchSize)
labels = torch.LongTensor(y_data)


# model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        # initialize the parameters
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # construct the RNN
        self.rnn = torch.nn.RNN(input_size=self.input_size,
                                hidden_size=self.hidden_size,
                                num_layers=num_layers)

    def forward(self, input):
        # shape of hidden: (numLayers, batchSize, hiddenSize)
        hidden = torch.zeros(self.num_layers,
                             self.batch_size,
                             self.hidden_size)
        out, _ = self.rnn(input, hidden)
        return out.view(-1, self.hidden_size)   # reshape out to: (seqLen*batchSize, hiddenSize)


net = Model(input_size, hidden_size, batch_size, num_layers)

# Loss and Optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)


# Training steps
for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted: ', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss=%.3f' % (epoch+1, loss.item()))

Drawbacks of one-hot vectors

  • The dimensionality is too high (the curse of dimensionality)
  • One-hot vectors are sparse (mostly zeros)
  • They are hard-coded (which vector each character or word maps to is fixed in advance)

Is there a way to address these problems? We want a representation that is:

  • low-dimensional
  • dense
  • learned from data

Embedding layer

  • Maps high-dimensional, sparse samples into a low-dimensional, dense space (dimensionality reduction)
  • How the reduction works: the embedding layer is a learned lookup table indexed by the input's integer index

  • num_embeddings is the one-hot dimension (the size of the vocabulary)
  • embedding_dim is the desired dimension after the reduction
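A minimal sketch with illustrative sizes (a vocabulary of 4 characters embedded into 10 dimensions); note the embedding layer consumes integer indices, not one-hot vectors:

import torch

emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=10)
x = torch.LongTensor([[1, 0, 2, 2, 3]])   # (batchSize, seqLen) indices
print(emb(x).shape)                       # torch.Size([1, 5, 10])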

Code implementation:

import torch

# parameters
num_class = 4
input_size = 4
hidden_size = 8
embedding_size = 10
num_layers = 2
batch_size = 1
seq_len = 5

# dataset
idx2char = ['e', 'h', 'l', 'o']     # the dictionary
x_data = [[1, 0, 2, 2, 3]]          # input sequence 'hello', shape (batch, seq_len)
y_data = [3, 1, 2, 3, 2]            # target sequence 'ohlol', shape (batch * seq_len)

# inputs: (batchSize, seqLen), a LongTensor of indices for the embedding layer
inputs = torch.LongTensor(x_data)

# labels: (batchSize * seqLen)
labels = torch.LongTensor(y_data)


# model
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)

        # construct the RNN
        self.rnn = torch.nn.RNN(input_size=embedding_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers,
                             x.size(0),
                             hidden_size)
        x = self.emb(x)     # (batch, seqLen, embeddingSize)
        x, _ = self.rnn(x, hidden)
        x = self.fc(x)
        return x.view(-1, num_class)


net = Model()

# Loss and Optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)


# Training steps
for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted: ', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss=%.3f' % (epoch+1, loss.item()))

LSTM

  • An LSTM usually performs much better than a plain RNN, but its computation is more complex, so its runtime performance is lower and its time complexity higher
  • This motivated a compromise between LSTM and RNN: the GRU (see the sketch below)
    • better performance, higher accuracy, and stronger learning ability than a plain RNN
    • better runtime performance and lower time complexity than an LSTM
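A minimal sketch of the two interfaces (dimensions are illustrative). An LSTM carries a cell state c alongside the hidden state h; a GRU does not:

import torch

x = torch.randn(5, 1, 4)                  # (seqLen, batchSize, inputSize)

lstm = torch.nn.LSTM(input_size=4, hidden_size=8, num_layers=1)
out, (h, c) = lstm(x)                     # out: (5, 1, 8); h and c: (1, 1, 8)

gru = torch.nn.GRU(input_size=4, hidden_size=8, num_layers=1)
out, h = gru(x)                           # out: (5, 1, 8); h: (1, 1, 8)
print(out.shape, h.shape)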

2. Recurrent Neural Networks: Advanced (Implementing a Classifier)

Problem: name classification


The RNN structure from the basics part:

  • In this problem only one final output is needed, the overall classification (the country a name belongs to), so there is no need to apply a linear transform to every hidden output

Network structure for this problem:

  • A GRU model is used

Code implementation:

Main Cycle:

if __name__ == '__main__':
    # initialize the classifier model
    # N_CHARS is the alphabet size, HIDDEN_SIZE the GRU hidden dimension, N_COUNTRY the number of classes, N_LAYER the number of GRU layers
    classifier = RNNClassifier(N_CHARS, HIDDEN_SIZE, N_COUNTRY, N_LAYER)

    # whether to use the GPU
    if USE_GPU:
        device = torch.device("cuda:0")
        classifier.to(device)

    # Loss (cross entropy) and Optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(classifier.parameters(), lr=0.001)

    # record the training start time
    start = time.time()

    print("Training for %d epochs..." % N_EPOCHS)
    acc_list = []
    for epoch in range(1, N_EPOCHS + 1):
        # Train cycle
        trainModel()            # train the model
        acc = testModel()       # test the model
        acc_list.append(acc)    # store the test accuracy

    # plot the test-accuracy curve
    epoch = np.arange(1, len(acc_list) + 1, 1)
    acc_list = np.array(acc_list)
    plt.plot(epoch, acc_list)
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.grid()
    plt.show()

Implementation steps:

  • Preparing Data and DataLoader

    • Use the ASCII table as the dictionary

    • The sequences vary in length and cannot directly form a tensor, so pad them (see the padding sketch after this list)

    • Process the output classes (the countries)

      • Convert the countries into class indices, starting from 0
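    A minimal sketch of the padding step (the two names are made up for illustration); the make_tensors function further below does the same thing and additionally sorts by length:

    import torch

    # convert each name to its ASCII codes, then zero-pad to the longest length
    names = ['maclean', 'ba']
    seqs = [[ord(c) for c in name] for name in names]
    lengths = torch.LongTensor([len(s) for s in seqs])
    seq_tensor = torch.zeros(len(seqs), lengths.max()).long()   # (batchSize, maxSeqLen)
    for i, s in enumerate(seqs):
        seq_tensor[i, :len(s)] = torch.LongTensor(s)
    print(seq_tensor)
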
    class NameDataset(Dataset):
        def __init__(self, is_train_set=True):
            # read the dataset file
            filename = 'D:/pycharm_workspace/Liuer_lecturer/dataset/names_train.csv.gz' if is_train_set else 'D:/pycharm_workspace/Liuer_lecturer/dataset/names_test.csv.gz'
            with gzip.open(filename, 'rt') as f:
                reader = csv.reader(f)
                rows = list(reader)     # a list of (name, country) tuples
    
            self.names = [row[0] for row in rows]          # extract the names (the inputs)
            self.len = len(self.names)                     # number of samples
            self.countries = [row[1] for row in rows]      # extract each name's label (its country)
    
            # build the country index
            # set removes duplicate elements from the list; sorted sorts them; list converts back to a list
            self.country_list = list(sorted(set(self.countries)))
            self.country_dict = self.getCountryDict()              # turn country_list into a dictionary
            self.country_num = len(self.country_list)               # number of countries
    
        def __getitem__(self, index):
            # provide index access to the dataset
            return self.names[index], self.country_dict[self.countries[index]]
    
        def __len__(self):
            # return the dataset length
            return self.len
    
        def getCountryDict(self):
            # turn the list into a dictionary
            country_dict = dict()
            for idx, country_name in enumerate(self.country_list, 0):
                # iterate over country_list and build the key/value pairs
                country_dict[country_name] = idx
            return country_dict
    
        def idx2country(self, index):
            # return the country string for the given index
            return self.country_list[index]
    
        def getCountriesNum(self):
            # get the number of countries
            return self.country_num
    
    
    trainset = NameDataset(is_train_set=True)
    trainLoader = DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
    testset = NameDataset(is_train_set=False)
    testloader = DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False)
    
    # dimension of the model output
    N_COUNTRY = trainset.getCountriesNum()
    
  • Model Design

    • Bi-directional RNN / LSTM / GRU

    • Compute along the forward direction of the sequence, making use of past information

    • Then compute once along the reverse direction, and concatenate its hidden outputs with the forward hidden outputs

      • Calling the GRU returns two outputs, out and hidden (see the sketch below)
        • out is the hidden output of every step
        • hidden $= [h_N^f, h_N^b]$
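    A minimal sketch of the bidirectional shapes (dimensions are illustrative):

    import torch

    gru = torch.nn.GRU(input_size=10, hidden_size=20, num_layers=1, bidirectional=True)
    out, hidden = gru(torch.randn(5, 3, 10))   # (seqLen=5, batchSize=3, inputSize=10)
    print(out.shape)      # torch.Size([5, 3, 40]): hiddenSize * 2 directions per step
    print(hidden.shape)   # torch.Size([2, 3, 20]): numLayers * 2 directions
    h_cat = torch.cat([hidden[-1], hidden[-2]], dim=1)   # final states of both directions: (3, 40)
    print(h_cat.shape)
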
    class RNNClassifier(torch.nn.Module):
        def __init__(self, input_size, hidden_size, output_size, n_layers=1, bidirectional=True):
            super(RNNClassifier, self).__init__()
            # GRU parameters
            self.hidden_size = hidden_size
            self.n_layers = n_layers
            # 2 if the GRU is bidirectional, 1 if it is unidirectional
            self.n_directions = 2 if bidirectional else 1
    
            # construct the embedding layer. input: (seqLen, batchSize); output: (seqLen, batchSize, hiddenSize)
            self.embedding = torch.nn.Embedding(input_size, hidden_size)
    
            # construct the GRU.
            # input: (seqLen, batchSize, hiddenSize); hidden: (nLayers * nDirections, batchSize, hiddenSize)
            # output: (seqLen, batchSize, hiddenSize * nDirections); hidden: (nLayers * nDirections, batchSize, hiddenSize)
            # the first hidden_size is the input dimension, the second the output dimension; n_layers is the number of layers; bidirectional selects one or two directions
            self.gru = torch.nn.GRU(hidden_size, hidden_size, n_layers,
                                    bidirectional=bidirectional)
    
            # if bidirectional, multiply by self.n_directions
            self.fc = torch.nn.Linear(hidden_size * self.n_directions, output_size)
    
        def _init_hidden(self, batch_size):
            # build an all-zero hidden tensor for this batch_size
            hidden = torch.zeros(self.n_layers * self.n_directions, batch_size,
                                 self.hidden_size)
            return create_tensor(hidden)
    
        def forward(self, input, seq_lengths):
            # input shape: B x S -> S x B
            input = input.t()   # transpose
            batch_size = input.size(1)  # save batch_size to construct the initial hidden state
    
            hidden = self._init_hidden(batch_size)
            # result of embedding with shape:(seqLen, batchSize, hiddenSize)
            embedding = self.embedding(input)
    
            # pack them up, to improve efficiency; seq_lengths must be a 1D CPU int64 tensor in recent PyTorch versions
            gru_input = pack_padded_sequence(embedding, seq_lengths)
            
            # GRU output
            output, hidden = self.gru(gru_input, hidden)
            
            if self.n_directions == 2:
                hidden_cat = torch.cat([hidden[-1], hidden[-2]], dim=1)
            else:
                hidden_cat = hidden[-1]
                
            fc_output = self.fc(hidden_cat)
            return fc_output
    

    How names are converted into tensors:

    def make_tensors(names, countries):
        sequences_and_lengths = [name2list(name) for name in names]
        name_sequences = [s1[0] for s1 in sequences_and_lengths]
        seq_lengths = torch.LongTensor([s1[1] for s1 in sequences_and_lengths])
        countries = countries.long()
    
        # make tensor of name, BatchSize x SeqLen
        seq_tensor = torch.zeros(len(name_sequences), seq_lengths.max()).long()
        for idx, (seq, seq_len) in enumerate(zip(name_sequences, seq_lengths), 0):
            seq_tensor[idx, :seq_len] = torch.LongTensor(seq)
    
        # sort by length to use pack_padded_sequence
        seq_lengths, perm_idx = seq_lengths.sort(dim=0, descending=True)
        seq_tensor = seq_tensor[perm_idx]
        countries = countries[perm_idx]
    
        # keep seq_lengths on the CPU: pack_padded_sequence needs a 1D CPU int64 lengths tensor
        return create_tensor(seq_tensor), seq_lengths, create_tensor(countries)
    
  • One Epoch Training

    def trainModel():
        total_loss = 0
        for i, (names, countries) in enumerate(trainLoader, 1):
            inputs, seq_lengths, target = make_tensors(names, countries)
            output = classifier(inputs, seq_lengths)
            loss = criterion(output, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    
            total_loss += loss.item()
            if i % 10 == 0:
                print(f' [{time_since(start)}] Epoch {epoch} ', end='')
                print(f' [{i * len(inputs)} / {len(trainset)}] ', end='')
                print(f' loss = {total_loss / (i * len(inputs))}')
    
        return total_loss
    
    # testing
    def testModel():
        correct = 0
        total = len(testset)
        print("evaluating trained model ...")
        # no gradient computation needed
        with torch.no_grad():
            for i, (names, countries) in enumerate(testloader, 1):
                inputs, seq_lengths, target = make_tensors(names, countries)
                output = classifier(inputs, seq_lengths)
                pred = output.max(dim=1, keepdim=True)[1]
                correct += pred.eq(target.view_as(pred)).sum().item()
    
            percent = '%.2f' % (100 * correct / total)
            print(f'Test set: Accuracy {correct} / {total} {percent}%')
        return correct / total
    

    Full code:

    import csv
    import gzip
    import torch
    import time
    import math
    import numpy as np
    import matplotlib.pyplot as plt
    
    from torch.utils.data import Dataset, DataLoader
    from torch.nn.utils.rnn import pack_padded_sequence
    
    # parameters
    HIDDEN_SIZE = 100
    BATCH_SIZE = 256
    N_LAYER = 2         # number of GRU layers
    N_EPOCHS = 100      # number of training epochs
    N_CHARS = 128
    USE_GPU = False     # set to True if a GPU is available, otherwise False
    
    
    # build the dataset; the NameDataset class inherits from Dataset
    class NameDataset(Dataset):
        def __init__(self, is_train_set=True):
            # read the dataset file
            filename = 'D:/pycharm_workspace/Liuer_lecturer/dataset/names_train.csv.gz' if is_train_set else 'D:/pycharm_workspace/Liuer_lecturer/dataset/names_test.csv.gz'
            with gzip.open(filename, 'rt') as f:
                reader = csv.reader(f)
                rows = list(reader)     # a list of (name, country) tuples
    
            self.names = [row[0] for row in rows]                   # extract the names (the inputs)
            self.len = len(self.names)                              # number of samples
            self.countries = [row[1] for row in rows]               # extract each name's label (its country)
    
            # build the country index
            # set removes duplicate elements from the list; sorted sorts them; list converts back to a list
            self.country_list = list(sorted(set(self.countries)))
            self.country_dict = self.getCountryDict()               # turn country_list into a dictionary
            self.country_num = len(self.country_list)               # number of countries
    
        def __getitem__(self, index):
            # provide index access to the dataset
            return self.names[index], self.country_dict[self.countries[index]]
    
        def __len__(self):
            # return the dataset length
            return self.len
    
        def getCountryDict(self):
            # turn the list into a dictionary
            country_dict = dict()
            for idx, country_name in enumerate(self.country_list, 0):
                # iterate over country_list and build the key/value pairs
                country_dict[country_name] = idx
            return country_dict
    
        def idx2country(self, index):
            # return the country string for the given index
            return self.country_list[index]
    
        def getCountriesNum(self):
            # get the number of countries
            return self.country_num
    
    
    # instantiate the datasets
    trainset = NameDataset(is_train_set=True)
    trainLoader = DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
    testset = NameDataset(is_train_set=False)
    testloader = DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False)
    
    # dimension of the model output
    N_COUNTRY = trainset.getCountriesNum()
    
    
    # RNN model
    class RNNClassifier(torch.nn.Module):
        def __init__(self, input_size, hidden_size, output_size, n_layers=1, bidirectional=True):
            super(RNNClassifier, self).__init__()
            # GRU parameters
            self.hidden_size = hidden_size
            self.n_layers = n_layers
            # 2 if the GRU is bidirectional, 1 if it is unidirectional
            self.n_directions = 2 if bidirectional else 1
    
            # construct the embedding layer. input: (seqLen, batchSize); output: (seqLen, batchSize, hiddenSize)
            self.embedding = torch.nn.Embedding(input_size, hidden_size)
    
            # construct the GRU.
            # input: (seqLen, batchSize, hiddenSize); hidden: (nLayers * nDirections, batchSize, hiddenSize)
            # output: (seqLen, batchSize, hiddenSize * nDirections); hidden: (nLayers * nDirections, batchSize, hiddenSize)
            # the first hidden_size is the input dimension, the second the output dimension; n_layers is the number of layers; bidirectional selects one or two directions
            self.gru = torch.nn.GRU(hidden_size, hidden_size, n_layers,
                                    bidirectional=bidirectional)
    
            # if bidirectional, multiply by self.n_directions
            self.fc = torch.nn.Linear(hidden_size * self.n_directions, output_size)
    
        def _init_hidden(self, batch_size):
            # build an all-zero hidden tensor for this batch_size
            hidden = torch.zeros(self.n_layers * self.n_directions, batch_size,
                                 self.hidden_size)
            return create_tensor(hidden)
    
        def forward(self, input, seq_lengths):
            # input shape: B x S -> S x B
            input = input.t()   # transpose
            batch_size = input.size(1)  # save batch_size to construct the initial hidden state
    
            hidden = self._init_hidden(batch_size)
            # result of embedding with shape:(seqLen, batchSize, hiddenSize)
            embedding = self.embedding(input)
    
            # pack them up, to improve efficiency; seq_lengths must be a 1D CPU int64 tensor in recent PyTorch versions
            gru_input = pack_padded_sequence(embedding, seq_lengths)
    
            # GRU output
            output, hidden = self.gru(gru_input, hidden)
    
            if self.n_directions == 2:
                hidden_cat = torch.cat([hidden[-1], hidden[-2]], dim=1)
            else:
                hidden_cat = hidden[-1]
    
            fc_output = self.fc(hidden_cat)
            return fc_output
    
    
    def name2list(name):
        arr = [ord(c) for c in name]
        return arr, len(arr)
    
    
    # decide whether to use the GPU
    def create_tensor(tensor):
        if USE_GPU:
            device = torch.device("cuda:0")
            tensor = tensor.to(device)
        return tensor
    
    
    def make_tensors(names, countries):
        sequences_and_lengths = [name2list(name) for name in names]
        name_sequences = [s1[0] for s1 in sequences_and_lengths]
        seq_lengths = torch.LongTensor([s1[1] for s1 in sequences_and_lengths])
        countries = countries.long()
    
        # make tensor of name, BatchSize x SeqLen
        seq_tensor = torch.zeros(len(name_sequences), seq_lengths.max()).long()
        for idx, (seq, seq_len) in enumerate(zip(name_sequences, seq_lengths), 0):
            seq_tensor[idx, :seq_len] = torch.LongTensor(seq)
    
        # sort by length to use pack_padded_sequence
        seq_lengths, perm_idx = seq_lengths.sort(dim=0, descending=True)
        seq_tensor = seq_tensor[perm_idx]
        countries = countries[perm_idx]
    
        # keep seq_lengths on the CPU: pack_padded_sequence needs a 1D CPU int64 lengths tensor
        return create_tensor(seq_tensor), seq_lengths, create_tensor(countries)
    
    
    def time_since(since):
        s = time.time() - since     # elapsed seconds since the start time
        m = math.floor(s / 60)      # whole minutes
        s -= m*60
        return '%dm %ds' % (m, s)
    
    
    # one training epoch
    def trainModel():
        total_loss = 0
        for i, (names, countries) in enumerate(trainLoader, 1):
            inputs, seq_lengths, target = make_tensors(names, countries)
            output = classifier(inputs, seq_lengths)
            loss = criterion(output, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    
            total_loss += loss.item()
            if i % 10 == 0:
                print(f' [{time_since(start)}] Epoch {epoch} ', end='')
                print(f' [{i * len(inputs)} / {len(trainset)}] ', end='')
                print(f' loss = {total_loss / (i * len(inputs))}')
    
        return total_loss
    
    
    # testing
    def testModel():
        correct = 0
        total = len(testset)
        print("evaluating trained model ...")
        # no gradient computation needed
        with torch.no_grad():
            for i, (names, countries) in enumerate(testloader, 1):
                inputs, seq_lengths, target = make_tensors(names, countries)
                output = classifier(inputs, seq_lengths)
                pred = output.max(dim=1, keepdim=True)[1]
                correct += pred.eq(target.view_as(pred)).sum().item()
    
            percent = '%.2f' % (100 * correct / total)
            print(f'Test set: Accuracy {correct} / {total} {percent}%')
        return correct / total
    
    
    if __name__ == '__main__':
        # initialize the classifier model
        # N_CHARS is the alphabet size, HIDDEN_SIZE the GRU hidden dimension, N_COUNTRY the number of classes, N_LAYER the number of GRU layers
        classifier = RNNClassifier(N_CHARS, HIDDEN_SIZE, N_COUNTRY, N_LAYER)
    
        # whether to use the GPU
        if USE_GPU:
            device = torch.device("cuda:0")
            classifier.to(device)
    
        # Loss (cross entropy) and Optimizer
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(classifier.parameters(), lr=0.001)
    
        # record the training start time
        start = time.time()
    
        print("Training for %d epochs..." % N_EPOCHS)
        acc_list = []
        for epoch in range(1, N_EPOCHS + 1):
            # Train cycle
            trainModel()            # train the model
            acc = testModel()       # test the model
            acc_list.append(acc)    # store the test accuracy
    
        # plot the test-accuracy curve
        epoch = np.arange(1, len(acc_list) + 1, 1)
        acc_list = np.array(acc_list)
        plt.plot(epoch, acc_list)
        plt.xlabel('Epoch')
        plt.ylabel('Accuracy')
        plt.grid()
        plt.show()
    

    Note: with recent PyTorch versions, pack_padded_sequence requires its lengths argument to be a 1D int64 tensor on the CPU, so make_tensors must return seq_lengths itself (kept on the CPU) rather than passing it through create_tensor; otherwise the call raises an error.
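    A minimal sketch of that requirement (shapes are illustrative):

    import torch
    from torch.nn.utils.rnn import pack_padded_sequence

    padded = torch.randn(4, 2, 3)         # (maxSeqLen, batchSize, features), zero-padded
    lengths = torch.LongTensor([4, 2])    # 1D, int64, on the CPU, sorted descending
    packed = pack_padded_sequence(padded, lengths)
    print(packed.data.shape)              # torch.Size([6, 3]): padding entries removed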


Reference: https://blog.csdn.net/weixin_45084253/article/details/122269699
