RNN and LSTM

Lin-Fighting

Modified on 2024-11-23 16:44:12


Column: Deep Learning (Basics Supplement) · Tags: deep learning, rnn, artificial intelligence, lstm

First published on 2024-11-23 16:43:14

Article link: https://blog.csdn.net/weixin_62665562/article/details/143994685


RNN

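What is a recurrent neural network

A Recurrent Neural Network (RNN) is a class of artificial neural networks with internal cyclic (recurrent) connections. RNNs are used to process sequential data.

A simple code example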

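# A simple RNN structure example
import torch
from torch import nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(SimpleRNN, self).__init__()
        # batch_first=True: inputs are laid out as [batch, seq_len, input_size]
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        
    def forward(self, x):
        # out holds the hidden state at every time step: [batch, seq_len, hidden_size];
        # the second return value (the final hidden state) is discarded here
        out, _ = self.rnn(x)
        return out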

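Network structure

Basic structure

It is somewhat like a fully connected layer, except that the hidden layer's new value also takes the hidden layer's value from the previous step into account. The final output therefore depends on both the current input and earlier inputs, which gives the network a form of memory.

The figure above assumes each input in the sequence has two dimensions, x1 and x2. The hidden layer in the middle has memory, and the output layer produces the predictions from the hidden layer's values.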

Example

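Assume the input is a sequence of vectors, every weight is 1, there are no biases, and every activation function is linear, f(x) = x. The hidden layer's memory (a1, a2) is initialized to [0, 0].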

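At each step the new hidden values are computed from the memory stored at the previous step (written a1(prev), a2(prev)):

Input the first element [1, 1]:
a1 = a1(prev)*1 + a2(prev)*1 + x1*1 + x2*1 = 0 + 0 + 1 + 1 = 2
a2 = a1(prev) + a2(prev) + x1 + x2 = 0 + 0 + 1 + 1 = 2
y1 = a1 + a2 = 4
y2 = a1 + a2 = 4

Input the second element [1, 1]:
a1 = a1(prev) + a2(prev) + x1 + x2 = 2 + 2 + 1 + 1 = 6
a2 = a1(prev) + a2(prev) + x1 + x2 = 2 + 2 + 1 + 1 = 6
y1 = a1 + a2 = 12
y2 = a1 + a2 = 12

Input the third element [2, 2]:
a1 = a1(prev) + a2(prev) + x1 + x2 = 6 + 6 + 2 + 2 = 16
a2 = a1(prev) + a2(prev) + x1 + x2 = 6 + 6 + 2 + 2 = 16
y1 = a1 + a2 = 32
y2 = a1 + a2 = 32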
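The three steps above can be checked with a few lines of NumPy. This is a minimal sketch assuming the toy setup just described (two inputs, two hidden units, two outputs, all weights 1, no bias, linear activation); the names W_in, W_mem, W_out and run are made up for illustration.

```python
import numpy as np

# Toy network from the example: every weight is 1, no bias, f(x) = x
W_in = np.ones((2, 2))    # input -> hidden
W_mem = np.ones((2, 2))   # previous hidden values (memory) -> hidden
W_out = np.ones((2, 2))   # hidden -> output

def run(sequence):
    a = np.zeros(2)                    # memory initialised to [0, 0]
    outputs = []
    for x in sequence:
        a = W_in @ x + W_mem @ a       # new hidden values use the stored ones
        outputs.append(W_out @ a)      # y1 = y2 = a1 + a2
    return outputs

outputs = run([np.array([1, 1]), np.array([1, 1]), np.array([2, 2])])
for y in outputs:
    print(y)   # [4. 4.], then [12. 12.], then [32. 32.]
```

Because the memory carries over between steps, the same input [1, 1] produces 4 the first time and 12 the second time.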

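This is the step-by-step computation for a concrete example. Note that every time step uses the same set of parameters (in the figure below, arrows of the same colour share the same parameters).

The hidden part can, of course, also consist of several stacked layers.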

Variants

Elman Network (the one described above)

The output of the hidden layer at the previous step is fed as input to the hidden layer at the current step.

Jordan Network

The output value of the previous step is fed as input to the hidden layer at the next step.
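To make the difference between the two variants concrete, here is a hedged PyTorch sketch. ElmanCell, JordanCell and run are hypothetical names, and the tanh hidden activation and the layer sizes are assumptions; the only point is what gets fed back: the hidden state (Elman) or the previous output (Jordan).

```python
import torch
from torch import nn

class ElmanCell(nn.Module):
    """Hypothetical sketch: h_t = tanh(W_x x_t + W_h h_{t-1}), y_t = W_y h_t."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.in2hid = nn.Linear(input_size, hidden_size)
        self.hid2hid = nn.Linear(hidden_size, hidden_size, bias=False)
        self.hid2out = nn.Linear(hidden_size, output_size)

    def forward(self, x_t, h_prev):
        h_t = torch.tanh(self.in2hid(x_t) + self.hid2hid(h_prev))
        return self.hid2out(h_t), h_t   # the hidden state is fed back

class JordanCell(nn.Module):
    """Hypothetical sketch: h_t = tanh(W_x x_t + W_o y_{t-1}), y_t = W_y h_t."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.in2hid = nn.Linear(input_size, hidden_size)
        self.out2hid = nn.Linear(output_size, hidden_size, bias=False)
        self.hid2out = nn.Linear(hidden_size, output_size)

    def forward(self, x_t, y_prev):
        h_t = torch.tanh(self.in2hid(x_t) + self.out2hid(y_prev))
        y_t = self.hid2out(h_t)
        return y_t, y_t                 # the output is fed back

def run(cell, xs, state):
    outputs = []
    for x_t in xs:                      # xs: [seq, batch, input_size]
        y_t, state = cell(x_t, state)
        outputs.append(y_t)
    return torch.stack(outputs)         # [seq, batch, output_size]

xs = torch.randn(5, 2, 3)
elman_out = run(ElmanCell(3, 8, 1), xs, torch.zeros(2, 8))   # state: [batch, hidden]
jordan_out = run(JordanCell(3, 8, 1), xs, torch.zeros(2, 1)) # state: [batch, output]
print(elman_out.shape, jordan_out.shape)
```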

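Bidirectional RNN

A forward RNN and a backward RNN are run over the sequence, and their results are combined to give the final output. This way, both the earlier and the later parts of the sequence (the context on both sides) are taken into account.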
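PyTorch supports this directly through the bidirectional=True argument of nn.RNN. The sketch below (sizes chosen arbitrarily) shows that the output concatenates the forward and backward hidden states along the last dimension, and where each direction's final state appears in the output.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden_size = 10
birnn = nn.RNN(input_size=100, hidden_size=hidden_size, bidirectional=True)

x = torch.randn(10, 3, 100)     # [seq, batch, feature]
out, h = birnn(x)
print(out.shape)  # [10, 3, 20]: forward and backward outputs concatenated
print(h.shape)    # [2, 3, 10]: final hidden state of each direction

forward_part = out[:, :, :hidden_size]
backward_part = out[:, :, hidden_size:]
# The forward direction finishes at the last time step,
# the backward direction finishes at the first time step.
print(torch.allclose(forward_part[-1], h[0]), torch.allclose(backward_part[0], h[1]))
```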

Long Short-Term Memory (LSTM)

Basic components

An LSTM memory cell is controlled by three gates:

Input Gate
Output Gate
Forget Gate

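In total, the memory cell has 4 inputs (the value to be stored plus the three gate control signals) and 1 output.

Looking more closely, the activation function f applied to the gate signals is usually the sigmoid, whose output lies between 0 and 1 and represents how far the gate is open.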

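The whole computation:

First, the cell input z passes through g to give g(z), and the input-gate signal z_i passes through f to give f(z_i); their product g(z)f(z_i) is what gets written into the cell.

Then the forget-gate signal z_f gives f(z_f), which scales the old memory c. The new cell value is

$$c' = g(z)\,f(z_i) + c\,f(z_f)$$

Finally, the output-gate signal z_o gives f(z_o), and the cell output is

$$a = h(c')\,f(z_o)$$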
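These steps translate almost line for line into code. This is a scalar sketch that assumes f is the sigmoid and that g and h are the identity (the text leaves g and h open); memory_cell_step is a hypothetical name.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def memory_cell_step(z, z_i, z_f, z_o, c, g=lambda v: v, h=lambda v: v):
    """One step of the memory cell (scalar sketch).
    z: cell input; z_i, z_f, z_o: input / forget / output gate signals;
    c: value currently stored in the cell. g and h default to identity."""
    c_new = g(z) * sigmoid(z_i) + c * sigmoid(z_f)  # write new input, keep old memory
    a = h(c_new) * sigmoid(z_o)                      # output gate decides what is read out
    return a, c_new

# Input gate open, forget gate open (old memory kept), output gate closed
a, c = memory_cell_step(z=3.0, z_i=100.0, z_f=100.0, z_o=-100.0, c=2.0)
print(round(a, 4), round(c, 4))   # 0.0 5.0
```

With the output-gate signal set to a large positive value instead, the same call would read the stored 5.0 out of the cell.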

Example

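The four inputs are obtained by multiplying the input at each step by four different sets of weights. These weights are learnable.

The figures step through the sequence: feed the first input and compute the cell's output, then the second input, then the third, and so on for the rest of the sequence.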
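The same kind of walkthrough can be run in code. The weights below are hypothetical (not taken from the figures), chosen so that each gate is almost fully open or closed: x2 = 1 writes x1 into memory, x2 = -1 clears it, and x3 = 1 reads it out. As before, f is assumed to be the sigmoid and g and h the identity.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical weights, input x = [x1, x2, x3]:
#   z   = x1            value to store
#   z_i = 100*x2 - 10   input gate opens when x2 = 1
#   z_f = 100*x2 + 10   forget gate closes (memory cleared) when x2 = -1
#   z_o = 100*x3 - 10   output gate opens when x3 = 1
sequence = [[3, 1, 0], [4, 1, 0], [2, 0, 0], [1, 0, 1], [3, -1, 0]]

c = 0.0
outputs = []
for x1, x2, x3 in sequence:
    z, z_i, z_f, z_o = x1, 100 * x2 - 10, 100 * x2 + 10, 100 * x3 - 10
    c = z * sigmoid(z_i) + c * sigmoid(z_f)   # c' = g(z)f(z_i) + c f(z_f)
    outputs.append(c * sigmoid(z_o))          # a = h(c') f(z_o)

print([round(a, 2) for a in outputs])   # [0.0, 0.0, 0.0, 7.0, 0.0]
```

The cell stores 3, then adds 4, holds the 7 while nothing is written, emits it only when the output gate opens, and is cleared at the last step.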

The complete LSTM structure

Compared with the original RNN

LSTM

The full LSTM

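The full LSTM feeds the previous step's output back in as part of the current input,

and it also feeds in the value stored in the cell.

The figure above shows the full form of the LSTM.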
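PyTorch's nn.LSTMCell uses the standard formulation: the four pre-activations (input gate, forget gate, candidate value, output gate) are all computed from the current input x_t and the previous hidden state h_{t-1}; the cell value c is not fed into the gates. The sketch below recomputes one step by hand from the cell's own weights and compares it with the built-in result. It relies on PyTorch stacking the gate weights in the order input, forget, cell, output; the sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
input_size, hidden_size, batch = 4, 3, 2
cell = nn.LSTMCell(input_size, hidden_size)

x = torch.randn(batch, input_size)    # x_t
h = torch.randn(batch, hidden_size)   # h_{t-1}
c = torch.randn(batch, hidden_size)   # c_{t-1}

with torch.no_grad():
    h_ref, c_ref = cell(x, (h, c))    # built-in result

    # All four pre-activations at once: [batch, 4 * hidden_size]
    gates = x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh
    z_i, z_f, z_g, z_o = gates.chunk(4, dim=1)   # order: input, forget, cell, output
    i = torch.sigmoid(z_i)           # input gate
    f = torch.sigmoid(z_f)           # forget gate
    g = torch.tanh(z_g)              # candidate value
    o = torch.sigmoid(z_o)           # output gate
    c_new = f * c + i * g            # keep part of the old memory, write the new value
    h_new = o * torch.tanh(c_new)    # read the memory out through the output gate

print(torch.allclose(h_new, h_ref, atol=1e-6), torch.allclose(c_new, c_ref, atol=1e-6))
```

Mapped onto the notation used earlier, this is c' = g(z)f(z_i) + c f(z_f) and a = h(c')f(z_o) with g = h = tanh.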

Implementing RNNs in PyTorch

RNN code walkthrough

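The input x has shape [seq_len, batch, feature_len], and the input at a single time step, x_t, has shape [batch, feature_len].

Suppose x has a sequence length of 10, a batch size of 3, and 100 features.

Each step's input x_t then has shape [3, 100]: a batch of 3, each with 100 features.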

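Computation for one time step, with hidden_len = 20 (bias terms omitted; tanh is the default nonlinearity of nn.RNN):

$$h_t = \tanh\left(x_t W_{xh}^\top + h_{t-1} W_{hh}^\top\right)$$

Shapes: [3, 100] @ [20, 100].T + [3, 20] @ [20, 20].T = [3, 20] + [3, 20] = [3, 20]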
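The same computation can be checked against nn.RNNCell. This sketch assumes the shapes above (batch 3, 100 features, hidden_len 20) and uses PyTorch's parameter names weight_ih, weight_hh, bias_ih and bias_hh; the biases, which the formula above leaves out, are included so the result matches the built-in cell up to floating-point rounding.

```python
import torch
from torch import nn

torch.manual_seed(0)
cell = nn.RNNCell(input_size=100, hidden_size=20)   # default nonlinearity: tanh
print(cell.weight_ih.shape, cell.weight_hh.shape)   # [20, 100] and [20, 20]

x_t = torch.randn(3, 100)       # [batch, feature_len]
h_prev = torch.zeros(3, 20)     # [batch, hidden_len]

with torch.no_grad():
    manual = torch.tanh(x_t @ cell.weight_ih.T + cell.bias_ih
                        + h_prev @ cell.weight_hh.T + cell.bias_hh)
    builtin = cell(x_t, h_prev)

print(manual.shape, torch.allclose(manual, builtin, atol=1e-5))
```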

PyTorch functions

forward (the forward pass)

forward processes the entire sequence in a single call.

Code example

import torch
from torch import nn

rnn = nn.RNN(input_size=100, hidden_size=10, num_layers=1)
print(rnn)

x = torch.randn(10, 3, 100) # 10 seq, 3 batch, 100 dim
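out, h = rnn(x) # out: hidden state of every time step; h: final hidden state
print(out.shape, h.shape) # torch.Size([10, 3, 10]) torch.Size([1, 3, 10])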


Code with multiple hidden layers

import torch
from torch import nn

rnn = nn.RNN(input_size=100, hidden_size=10, num_layers=4)
print(rnn)

x = torch.randn(10, 3, 100) # 10 seq, 3 batch, 100 dim
out, h = rnn(x)
print(out.shape, h.shape) # torch.Size([10, 3, 10]) torch.Size([4, 3, 10]): h has one entry per layer
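With several layers, out and h contain different things: out holds the top layer's hidden state at every time step, while h holds every layer's hidden state at the last time step. A quick check, using the same sizes as above, is that the two overlap in exactly one place.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=100, hidden_size=10, num_layers=4)
x = torch.randn(10, 3, 100)     # [seq, batch, feature]
out, h = rnn(x)

# out: top layer only, every time step  -> [seq, batch, hidden]        = [10, 3, 10]
# h:   every layer, last time step only -> [num_layers, batch, hidden] = [4, 3, 10]
print(out.shape, h.shape)
print(torch.allclose(out[-1], h[-1]))   # last step of the top layer appears in both
```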

Computing one time step at a time (nn.RNNCell)

Code

import torch
from torch import nn

rnn = nn.RNNCell(input_size=100, hidden_size=10)
print(rnn)

x = torch.randn(10, 3, 100) # 10 seq, 3 batch, 100 dim
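ht = torch.zeros(3, 10) # initial hidden state: [batch, hidden_size]
for xt in x: # iterate over time steps; xt: [3, 100]
    ht = rnn(xt, ht) # one step; ht stays [3, 10]
print(ht.shape) # torch.Size([3, 10])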
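nn.RNNCell and a single-layer nn.RNN compute the same recurrence; the only difference is who runs the loop over time. The sketch below copies the weights of an nn.RNN (layer 0, hence the _l0 suffix) into an nn.RNNCell and checks that stepping the cell by hand reproduces the full-sequence output. Sizes follow the example above.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=100, hidden_size=10, num_layers=1)
cell = nn.RNNCell(input_size=100, hidden_size=10)

# Give the cell exactly the same parameters as layer 0 of the nn.RNN
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(10, 3, 100)
with torch.no_grad():
    out, h = rnn(x)              # whole sequence in one call
    ht = torch.zeros(3, 10)
    steps = []
    for xt in x:                 # one time step per call
        ht = cell(xt, ht)
        steps.append(ht)

print(torch.allclose(torch.stack(steps), out, atol=1e-5))
```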

Stacking multiple cells

import torch
from torch import nn

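rnn1 = nn.RNNCell(input_size=100, hidden_size=10) # layer 1: 100 -> 10
print(rnn1)
rnn2 = nn.RNNCell(input_size=10, hidden_size=20) # layer 2 takes layer 1's hidden state: 10 -> 20

x = torch.randn(10, 3, 100) # 10 seq, 3 batch, 100 dim
ht1 = torch.zeros(3, 10)
ht2 = torch.zeros(3, 20)

for xt in x:
    ht1 = rnn1(xt, ht1) # update layer 1 with the current input
    ht2 = rnn2(ht1, ht2) # update layer 2 with layer 1's new hidden state

print(ht1.shape) # torch.Size([3, 10])
print(ht2.shape) # torch.Size([3, 20])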

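RNN in practice

Time-series prediction

Predict the next segment of a sine wave.

Each curve is sampled at 50 points with 1 feature per point, i.e. [seq, batch, dim] = [50, 1, 1].

The start point is chosen at random, so the network cannot simply memorize one fixed curve.

Given the first part of a curve, the network predicts the same points shifted one step ahead (in the code below, points 0 to 48 as input and points 1 to 49 as targets), which forms the next segment of the waveform. A harder variant is to predict the points shifted by a larger step.

Data generation code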

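import numpy as np
import torch

num_time_steps = 50 # number of points sampled per curve

start = np.random.randint(3, size=1)[0] # random start point (0, 1 or 2); with a fixed start the network could just memorize one curve
time_steps = np.linspace(start, start+10, num_time_steps) # 50 evenly spaced time points in [start, start+10]
data = np.sin(time_steps) # sine values at those time points
data = data.reshape(num_time_steps, 1) # [50, 1]
x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1) # input X: points 0..48, [batch=1, seq=49, dim=1]
y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1) # target Y: points 1..49, i.e. X shifted one step ahead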

Network model code

# net module
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(
            input_size = input_size, 
            hidden_size = hidden_size,
            num_layers = num_layers,
            batch_first = True, # whether batch is the first dim: False -> [seq, batch, dim], True -> [batch, seq, dim]
        )
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden_prev):
        # out: [batch, seq, hidden_size]; hidden_prev: [num_layers, batch, hidden_size]
        # (batch_first does not change the layout of the hidden state)
        out, hidden_prev = self.rnn(x, hidden_prev)
        out = out.view(-1, self.hidden_size) # flatten: [batch*seq, hidden_size]
        out = self.linear(out) # [batch*seq, hidden_size] => [batch*seq, output_size]
        out = out.unsqueeze(dim=0) # add a leading dim: [1, batch*seq, output_size]
        return out, hidden_prev

Full code

import numpy as np
import torch
from torch import nn
from torch import optim
from tqdm import tqdm
from matplotlib import pyplot as plt

num_time_steps = 50
input_size = 1
hidden_size = 16
output_size = 1
lr = 0.001
num_layers = 2



# net module
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(
            input_size = input_size, 
            hidden_size = hidden_size,
            num_layers = num_layers,
            batch_first = True, # whether batch is the first dim: False -> [seq, batch, dim], True -> [batch, seq, dim]
        )
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden_prev):
        # out: [batch, seq, hidden_size]; hidden_prev: [num_layers, batch, hidden_size]
        # (batch_first does not change the layout of the hidden state)
        out, hidden_prev = self.rnn(x, hidden_prev)
        out = out.view(-1, self.hidden_size) # flatten: [batch*seq, hidden_size]
        out = self.linear(out) # [batch*seq, hidden_size] => [batch*seq, output_size]
        out = out.unsqueeze(dim=0) # add a leading dim: [1, batch*seq, output_size]
        return out, hidden_prev

model = Net(input_size, hidden_size, output_size, num_layers)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr)

hidden_prev = torch.zeros(num_layers, 1, hidden_size)

for iter in tqdm(range(6000)):
    start = np.random.randint(3, size=1)[0] # random start point (0, 1 or 2); a fixed start would let the network memorize one curve
    time_steps = np.linspace(start, start+10, num_time_steps) # 50 evenly spaced time points in [start, start+10]
    data = np.sin(time_steps) # sine values at those time points
    data = data.reshape(num_time_steps, 1) # [50, 1]
    x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1) # input X: points 0..48, [batch=1, seq=49, dim=1]
    y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1) # target Y: points 1..49 (one step ahead)

    output, hidden_prev = model(x, hidden_prev)
    hidden_prev = hidden_prev.detach()

    loss = criterion(output, y)
    model.zero_grad()
    loss.backward()
    optimizer.step()

    if iter % 100 == 0:
        print('Iteration: {} loss {}'.format(iter, loss.item()))

start = np.random.randint(3, size=1)[0] # random start point for the test curve
time_steps = np.linspace(start, start+10, num_time_steps) # 50 evenly spaced time points in [start, start+10]
data = np.sin(time_steps) # sine values at those time points
data = data.reshape(num_time_steps, 1) # [50, 1]
x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1) # input X: points 0..48
y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1) # target Y: points 1..49

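predictions = []
input = x[:, 0, :] # start from the first true point, then feed each prediction back in
with torch.no_grad(): # inference only, no computation graph needed
    for _ in tqdm(range(x.shape[1])):
        input = input.view(1, 1, 1) # [batch=1, seq=1, dim=1]
        (pred, hidden_prev) = model(input, hidden_prev)
        input = pred
        predictions.append(pred.detach().numpy().ravel()[0])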

x = x.data.numpy().ravel()
y = y.data.numpy()
plt.scatter(time_steps[:-1], x.ravel(), s=90) # true input points
plt.plot(time_steps[:-1], x.ravel())

plt.scatter(time_steps[1:], predictions) # predictions, one step ahead
plt.show()


LSTM code walkthrough

Quick review

Source code

Parameters
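The learnable parameters of an nn.LSTM can be inspected directly with named_parameters(). This sketch uses the same sizes as the example below; each layer stores the weights of its four gates stacked along the first dimension, so every weight matrix and bias has 4 * hidden_size = 80 rows.

```python
import torch
from torch import nn

lstm = nn.LSTM(input_size=100, hidden_size=20, num_layers=4)

# Shape of every learnable tensor, in registration order
shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
# Each layer l has weight_ih_l{l}, weight_hh_l{l}, bias_ih_l{l}, bias_hh_l{l}.
# weight_ih_l0 is (80, 100) because layer 0 sees x;
# weight_ih_l1..l3 are (80, 20) because they see the hidden state of the layer below.
```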

LSTM forward

Example code

import torch
from torch import nn

lstm = nn.LSTM(input_size=100, hidden_size=20, num_layers=4)
print(lstm)

x = torch.randn(10, 3, 100) # 10 seq, 3 batch, 100 dim
out, (h, c) = lstm(x)

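print(out.shape, h.shape, c.shape) # out: [10, 3, 20]; h, c: [4, 3, 20]

lstm1 = nn.LSTM(input_size=100, hidden_size=20, num_layers=4, batch_first=True)
print(lstm1)

x = torch.randn(3, 10, 100) # batch_first=True: 3 batch, 10 seq, 100 dim
out, (h, c) = lstm1(x)

print(out.shape, h.shape, c.shape) # out: [3, 10, 20]; h, c stay [4, 3, 20] (batch_first does not apply to them)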
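As with nn.RNNCell, an LSTM can also be run one time step at a time with nn.LSTMCell, whose state is the pair (h, c). The sketch below also passes an explicit initial state (h0, c0) to nn.LSTM (it defaults to zeros when omitted) and checks that a hand-written loop with copied weights ends in the same final h and c. Sizes are chosen to match the example above, with a single layer.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=100, hidden_size=20, num_layers=1)
cell = nn.LSTMCell(input_size=100, hidden_size=20)

# Copy layer 0 of the nn.LSTM into the cell
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(10, 3, 100)      # [seq, batch, feature]
h0 = torch.zeros(1, 3, 20)       # [num_layers, batch, hidden]
c0 = torch.zeros(1, 3, 20)

with torch.no_grad():
    out, (h, c) = lstm(x, (h0, c0))
    ht, ct = h0[0], c0[0]        # the cell works on [batch, hidden]
    for xt in x:
        ht, ct = cell(xt, (ht, ct))

print(torch.allclose(ht, h[0], atol=1e-5), torch.allclose(ct, c[0], atol=1e-5))
```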