The simplest RNN example: the hihello program (a PyTorch RNN tutorial)

www_z_dd

Tags: python, deep learning

Article link: https://blog.csdn.net/www_z_dd/article/details/106580541

I haven't seen anyone explain the PyTorch "hello" RNN example, so I wrote this walkthrough. It is a very simple RNN program: feed the model hihell and it should output ihello.

First, import the required packages.

import sys
import torch
import torch.nn as nn
from torch.autograd import Variable

Initialize the parameters, inputs, and labels.

torch.manual_seed(777)  # reproducibility
#            0    1    2    3    4
idx2char = ['h', 'i', 'e', 'l', 'o']
# Teach hihell -> ihello
x_data = [0, 1, 0, 2, 3, 3]   # hihell
one_hot_lookup = [[1, 0, 0, 0, 0],  # 0
                  [0, 1, 0, 0, 0],  # 1
                  [0, 0, 1, 0, 0],  # 2
                  [0, 0, 0, 1, 0],  # 3
                  [0, 0, 0, 0, 1]]  # 4

y_data = [1, 0, 2, 3, 3, 4]    # ihello
x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = Variable(torch.Tensor(x_one_hot))
labels = Variable(torch.LongTensor(y_data))
num_classes = 5
input_size = 5  # one-hot size
hidden_size = 5  # output from the RNN. 5 to directly predict one-hot
batch_size = 1   # one sentence
sequence_length = 1  # One by one
num_layers = 1  # one-layer rnn

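idx2char maps an index back to its letter. It is only used when printing predictions.
x_data is the data fed to the model. Each element is an index into idx2char that selects one letter, but it has to be one-hot encoded before it can be fed to the RNN.
A discrete variable like this can be encoded with either one-hot vectors or an embedding; the simpler one-hot encoding is used here.
y_data is the target data: the sequence we want the model to output.
The remaining lines define a few variables. There are five distinct letters, so each letter is represented by a five-dimensional vector, which is why input_size is 5. hidden_size is also 5 so that the RNN output can be read directly as scores over the five letters. batch_size = 1 means one sentence is fed at a time, and sequence_length = 1 means one letter is fed at a time.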
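
As a side note, the hand-written one_hot_lookup table can also be produced by a library call. The following is a minimal sketch that assumes torch.nn.functional.one_hot is available in your PyTorch version; it is an alternative to the table above, not what the original program does.

```python
import torch
import torch.nn.functional as F

x_data = [0, 1, 0, 2, 3, 3]  # hihell, as indices into idx2char
num_classes = 5

# F.one_hot expects an integer (int64) tensor and returns int64,
# so cast to float before feeding the result to nn.RNN.
x_one_hot = F.one_hot(torch.tensor(x_data), num_classes=num_classes).float()

print(x_one_hot.shape)  # torch.Size([6, 5])
print(x_one_hot[3])     # tensor([0., 0., 1., 0., 0.]) -> index 2, the letter 'e'
```

Each row is the same vector that one_hot_lookup[x] would return for the corresponding index.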

Defining the model

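Next comes the model definition: subclass nn.Module and implement the three methods below. init_hidden sets the hidden state that is passed in with the first input to zeros.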

class Model(nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=input_size,
                          hidden_size=hidden_size, batch_first=True)

    def forward(self, hidden, x):
        # Reshape input (batch first)
        x = x.view(batch_size, sequence_length, input_size)

        # Propagate input through RNN
        # Input: (batch, seq_len, input_size)
        # hidden: (num_layers * num_directions, batch, hidden_size)
        out, hidden = self.rnn(x, hidden)  # out: torch.Size([1, 1, 5])
        return hidden, out.view(-1, num_classes)  # the returned out has size torch.Size([1, 5])

    def init_hidden(self):
        # Initialize the hidden state to zeros (a plain RNN has no cell state)
        # (num_layers * num_directions, batch, hidden_size)
        return Variable(torch.zeros(num_layers, batch_size, hidden_size))
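
To make the shape comments in forward concrete, here is a small standalone sketch that calls nn.RNN directly with the same sizes as the model above. The variable names (rnn, x, h0, x_seq) are illustrative only and are not part of the original program.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=5, batch_first=True)

x = torch.zeros(1, 1, 5)   # (batch, seq_len, input_size): a single character
x[0, 0, 0] = 1.0           # one-hot vector for index 0 ('h')
h0 = torch.zeros(1, 1, 5)  # (num_layers * num_directions, batch, hidden_size)

out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([1, 1, 5])
print(hn.shape)   # torch.Size([1, 1, 5])

# With one layer and one direction, the last time step of out is the new hidden state.
print(torch.allclose(out[:, -1, :], hn[-1]))  # True

# The same module also accepts a whole sequence at once (seq_len = 6 here).
x_seq = torch.zeros(1, 6, 5)
out_seq, hn_seq = rnn(x_seq, h0)
print(out_seq.shape)  # torch.Size([1, 6, 5])
```

This is why the model can return out.view(-1, num_classes) as the class scores for the current character while passing hidden on to the next step.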

Create an instance of the model; printing it shows its structure.
CrossEntropyLoss computes the loss and Adam performs the parameter updates.

# Instantiate RNN model
model = Model()
print(model)
# Set loss and optimizer function
# CrossEntropyLoss = LogSoftmax + NLLLoss
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
optimizer.zero_grad()
hidden = model.init_hidden()

Model(
  (rnn): RNN(5, 5, batch_first=True)
)

loss = 0
print("predicted string: ")
for input, label in zip(inputs, labels):
    # print(input.size(), label.size())
    label = label.view(1)
    hidden, output = model(hidden, input)
    val, idx = output.max(1)  # largest score in output and its index; the index is the predicted character
    sys.stdout.write(idx2char[idx.item()])
    loss += criterion(output, label)  # cross-entropy between the full score vector and the label

predicted string: 
llllee

First, let's see what the initial, untrained parameters predict. Feeding in hihell gives llllee.

Training the model

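optimizer.zero_grad() clears the gradients, that is, the derivatives of the loss with respect to the weights. This is needed because gradients are not cleared automatically when we update the parameters with loss.backward() and optimizer.step(): each backward() call adds to whatever gradients are already stored.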
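
The accumulation behaviour described above can be seen on a toy parameter. This is a minimal sketch; the tensor w and the SGD optimizer are stand-ins chosen for illustration and are not part of the hihello program.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(w ** 2).sum().backward()
first = w.grad.clone()     # d/dw sum(w^2) = 2w -> tensor([2., 4.])

(w ** 2).sum().backward()  # no zero_grad() in between
second = w.grad.clone()    # gradients were added -> tensor([4., 8.])

optimizer.zero_grad()      # clear the stored gradients
(w ** 2).sum().backward()
third = w.grad.clone()     # back to tensor([2., 4.])

print(first, second, third)
```

Without the zero_grad() call at the top of each epoch, every update would use the sum of all gradients computed so far.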

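Then feed the six characters in one at a time, print the character predicted at each step, add up the per-step losses, and finally backpropagate the summed loss to compute the gradients and update the parameters.
A single output might look like [[ 0.3700, -0.3483, -0.1920, 0.8472, 0.5825]] with a label of [3]; this is a correct prediction, since the largest score is at index 3. The raw output can be passed to CrossEntropyLoss together with the label because PyTorch's implementation already includes the softmax.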
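
The claim that CrossEntropyLoss already includes the softmax can be checked with the example values from the paragraph above. This is a small sketch that follows the "CrossEntropyLoss = LogSoftmax + NLLLoss" comment in the code; the names ce and manual are illustrative.

```python
import torch
import torch.nn as nn

# Raw (unnormalized) scores and target index taken from the example in the text.
output = torch.tensor([[0.3700, -0.3483, -0.1920, 0.8472, 0.5825]])
label = torch.tensor([3])

ce = nn.CrossEntropyLoss()(output, label)
manual = nn.NLLLoss()(nn.LogSoftmax(dim=1)(output), label)

print(output.argmax(dim=1).item())  # 3, the predicted index matches the label
print(torch.allclose(ce, manual))   # True
```

Applying a softmax to the model output before CrossEntropyLoss would therefore normalize the scores twice.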

for epoch in range(100):
    optimizer.zero_grad()
    loss = 0
    hidden = model.init_hidden()

    print("----------------------------predicted string: ")
    for input, label in zip(inputs, labels):
        # print(input.size(), label.size())
        label=label.view(1)
        hidden, output = model(hidden, input)
        val, idx = output.max(1)  # largest score in output and its index; the index is the predicted character
        sys.stdout.write(idx2char[idx.item()])
        loss += criterion(output, label)
        

    if epoch%10==0:
        print(", epoch: %d, loss: %1.3f" % (epoch + 1, loss.item()))

    loss.backward()
    optimizer.step()

----------------------------predicted string: 
llllee, epoch: 1, loss: 9.814
----------------------------predicted string: 
llllll----------------------------predicted string: 
llllll----------------------------predicted string: 
ihllll----------------------------predicted string: 
ihllll----------------------------predicted string: 
ihelll----------------------------predicted string: 
ihelll----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello, epoch: 11, loss: 4.266
----------------------------predicted string: 


ihello, epoch: 91, loss: 2.800
----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello----------------------------predicted string: 
ihello

Prediction

Finally, use the trained model to make a prediction (admittedly on the same hihell input).

loss = 0
print("predicted string: ")
for input, label in zip(inputs, labels):
    # print(input.size(), label.size())
    label = label.view(1)
    hidden, output = model(hidden, input)
    val, idx = output.max(1)  # largest score in output and its index; the index is the predicted character
    sys.stdout.write(idx2char[idx.item()])
    loss += criterion(output, label)  # cross-entropy between the full score vector and the label

predicted string: 
lhello

loss

tensor(3.9026, grad_fn=<AddBackward0>)

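The prediction is wrong because hidden was not re-initialized before the loop, so it still holds the state left over from training. Resetting it first fixes the problem: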

hidden = model.init_hidden()
loss = 0
print("predicted string: ")
for input, label in zip(inputs, labels):
    # print(input.size(), label.size())
    label = label.view(1)
    hidden, output = model(hidden, input)
    val, idx = output.max(1)  # largest score in output and its index; the index is the predicted character
    sys.stdout.write(idx2char[idx.item()])
    loss += criterion(output, label)  # cross-entropy between the full score vector and the label

predicted string: 
ihello
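
One way to avoid forgetting the reset is to wrap prediction in a helper that always starts from a fresh hidden state. The sketch below is hypothetical: the predict function is not part of the original program, and it uses an untrained nn.RNN as a stand-in for the trained model so that it runs on its own.

```python
import torch
import torch.nn as nn

idx2char = ['h', 'i', 'e', 'l', 'o']

def predict(rnn, inputs):
    """Decode a one-hot sequence of shape (seq_len, input_size) one character at a time."""
    hidden = torch.zeros(rnn.num_layers, 1, rnn.hidden_size)  # fresh state on every call
    chars = []
    with torch.no_grad():  # inference only, so no autograd graph is needed
        for x in inputs:
            out, hidden = rnn(x.view(1, 1, -1), hidden)
            chars.append(idx2char[out.view(-1).argmax().item()])
    return ''.join(chars)

torch.manual_seed(777)
rnn = nn.RNN(input_size=5, hidden_size=5, batch_first=True)  # untrained stand-in
inputs = torch.eye(5)[[0, 1, 0, 2, 3, 3]]                    # one-hot rows for hihell

a = predict(rnn, inputs)
b = predict(rnn, inputs)
print(a == b)  # True: no hidden state leaks from one call to the next
```

With the trained model from this post, the same idea would apply by creating the zero state with model.init_hidden() at the start of every prediction.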
