[PyTorch] (11) Recurrent Neural Networks (RNN)

Recurrent Neural Networks (RNN)

Most explanations of recurrent neural networks found online unroll the network along the time axis:

[Figure: an RNN unrolled along the time axis (image from the web)]

Personally, I think this overcomplicates things. And then some people explain the complication with pictures like the following:

[Figure: an elaborate unrolled-RNN illustration (image from the web)]

It does look cool.

In fact, recurrent neural networks are not that complicated. Below, this post presents the forward pass of an RNN from the perspective of "the sequence moves, the network stays put", with a feedforward network for comparison. The same perspective is then used to give an equally clear picture of the forward pass of a bidirectional RNN.

RNN vs. feedforward network

| | RNN | Feedforward network |
| --- | --- | --- |
| Network diagram | (figure omitted) | (figure omitted) |
| Input | a time sequence $X=[X_1, X_2, X_3, \ldots, X_{t-1}, X_t, \ldots, X_n]$ | a single (non-sequential) $X$ |

Forward pass (ignoring the bias $b$; $f$ denotes the activation function)

(1) Feedforward network:

$$O = f(VH) = f(Vf(UX))$$

(2) Recurrent neural network:

The elements of $X$ are fed in one at a time, computing

$$\begin{aligned}
H_1 &= f(UX_1 + WH_0), \quad H_0 \text{ is usually set to } 0\\
H_2 &= f(UX_2 + WH_1) = f(UX_2 + Wf(UX_1 + WH_0))\\
H_3 &= f(UX_3 + WH_2) = f(UX_3 + Wf(UX_2 + Wf(UX_1 + WH_0)))\\
&\;\;\vdots\\
H_t &= f(UX_t + WH_{t-1})\\
&\;\;\vdots
\end{aligned}$$
which gives:
$$H=[H_1, H_2, H_3, \ldots, H_{t-1}, H_t, \ldots, H_n]$$
Each $H_t$ then passes through a feedforward layer:
$$\begin{aligned}
O_1 &= f(VH_1)\\
O_2 &= f(VH_2)\\
O_3 &= f(VH_3)\\
&\;\;\vdots\\
O_t &= f(VH_t)\\
&\;\;\vdots
\end{aligned}$$
Output:
$$O=[O_1, O_2, O_3, \ldots, O_{t-1}, O_t, \ldots, O_n]$$
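To make this concrete, here is a minimal hand-rolled sketch of the recurrence above (a sketch only, with hypothetical sizes; $f$ is tanh and $H_0 = 0$):

```python
# A minimal sketch of the RNN forward pass above (hypothetical sizes).
import torch

n, input_size, hidden_size, output_size = 5, 3, 4, 2
f = torch.tanh
U = torch.randn(hidden_size, input_size)
W = torch.randn(hidden_size, hidden_size)
V = torch.randn(output_size, hidden_size)

X = torch.randn(n, input_size)   # the sequence [X_1, ..., X_n]
h = torch.zeros(hidden_size)     # H_0
H, O = [], []
for x in X:                      # the sequence moves, the network stays put
    h = f(U @ x + W @ h)         # H_t = f(U X_t + W H_{t-1})
    H.append(h)
    O.append(f(V @ h))           # O_t = f(V H_t)
```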

That is all there is to it. Two examples follow.

PyTorch implementation

Sequence to class

Treat a 28×28 image, read top to bottom, as a time sequence

$$X=[X_1, X_2, X_3, \ldots, X_{t-1}, X_t, \ldots, X_{28}]$$

where $X_t$ is the $t$-th row of pixels in the image.

Feeding the sequence into the RNN yields
$$H=[H_1, H_2, H_3, \ldots, H_{t-1}, H_t, \ldots, H_{28}]$$

Take $H_{28}$, feed it into a feedforward network, and obtain the class prediction.

import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import matplotlib

torch.manual_seed(1)
matplotlib.rcParams['font.family'] = 'SimHei'
matplotlib.rcParams['axes.unicode_minus'] = False
plt.rcParams['figure.dpi'] = 150
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, n_class):
        super(RNN, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            nonlinearity='relu'
        )
        self.linear = nn.Linear(hidden_size, n_class)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        out, hn = self.rnn(x, h0)
        out = out[:, -1, :]  # last time step, i.e. H_28 (equals hn[-1] for a single unidirectional layer)
        out = self.linear(out)
        return out


def test(dataset_test, model):
    correct = 0
    total_num = len(dataset_test)
    test_loader = DataLoader(dataset=dataset_test, batch_size=64, shuffle=False)
    with torch.no_grad():  # no gradients needed during evaluation
        for (test_images, test_labels) in test_loader:
            test_images = test_images.squeeze(1)  # [N, 1, 28, 28] -> [N, 28, 28]
            test_images = test_images.to(device)
            test_labels = test_labels.to(device)
            predicted = model(test_images)
            _, predicted = torch.max(predicted, 1)
            correct += torch.sum(predicted == test_labels)
    accuracy = correct.item() / total_num
    print('Test accuracy: {:.5f}'.format(accuracy))
    return accuracy


def train():
    accuracy_list = []
    learning_rate = 0.001
    batch_size = 128

    model = RNN(input_size=28, hidden_size=128, num_layers=1, n_class=10)  # images are 28x28: each row is one input step
    model = model.to(device)

    dataset_train = datasets.MNIST(root='./mnist', transform=transforms.ToTensor(), train=True, download=True)
    dataset_test = datasets.MNIST(root='./mnist', transform=transforms.ToTensor(), train=False, download=True)

    train_loader = DataLoader(dataset=dataset_train, batch_size=batch_size, shuffle=True)

    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    loss_f = nn.CrossEntropyLoss()
    for epoch in range(20):
        model.train()
        for step, (train_images, train_labels) in enumerate(train_loader):
            train_images = train_images.squeeze(1)  # [N, 1, 28, 28] -> [N, 28, 28]
            train_images = train_images.to(device)
            train_labels = train_labels.to(device)
            predicted = model(train_images)
            loss = loss_f(predicted, train_labels)

            # backward pass and parameter update
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if step % 100 == 0:
                print('epoch:{},step:{},loss:{}'.format(epoch + 1, step, loss.item()))

        model.eval()
        accuracy = test(dataset_test, model)
        accuracy_list.append(accuracy)
    plt.plot(accuracy_list)
    plt.title('Accuracy')
    plt.show()


if __name__ == '__main__':
    train()




Result:
[Figure: test accuracy per epoch]

Sequence to sequence

Training:
Sine-wave prediction. Given a sine-wave sequence $S=[S_1, S_2, S_3, \ldots, S_{t-1}, S_t, \ldots, S_n]$, let
$$X=[S_1, S_2, S_3, \ldots, S_{t-1}, S_t, \ldots, S_{n-1}]$$

$$Y=[S_2, S_3, \ldots, S_{t-1}, S_t, \ldots, S_n]$$

Feeding $X$ into the network yields
$$H=[H_1, H_2, H_3, \ldots, H_{t-1}, H_t, \ldots, H_{n-1}]$$
which the feedforward layer turns into the output:
$$O=[O_1, O_2, O_3, \ldots, O_{t-1}, O_t, \ldots, O_{n-1}]$$

Minimize the error between $O$ and $Y$.

Testing: feed in [0] and let the network generate the sine-wave sequence one step at a time, feeding each prediction back in as the next input.

import torch
import torch.nn as nn
import numpy as np
import torch.optim as optim
from matplotlib import pyplot as plt
import matplotlib

torch.manual_seed(1)
np.random.seed(1)
matplotlib.rcParams['font.family'] = 'SimHei'
matplotlib.rcParams['axes.unicode_minus'] = False
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(RNN, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size

        self.rnn = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=False,
        )
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, inputs, h0):
        # inputs: [batch_size, seq_len, input_size]
        # h0: [num_layers, batch_size, hidden_size]
        outputs, hn = self.rnn(inputs, h0)
        # outputs: [batch_size, seq_len, hidden_size]
        # hn: [num_layers, batch_size, hidden_size]

        # [batch_size, seq_len, hidden_size] => [batch_size * seq_len, hidden_size]
        outputs = outputs.reshape(-1, self.hidden_size)  # reshape, in case the tensor is non-contiguous

        # [batch_size * seq_len, hidden_size] => [batch_size * seq_len, output_size]
        outputs = self.linear(outputs)

        # [batch_size * seq_len, output_size] => [batch_size, seq_len, output_size]
        outputs = outputs.view(inputs.size())  # valid here because output_size == input_size == 1

        return outputs, hn


def test(model, seq_len, h0):
    x = torch.tensor(0, dtype=torch.float)  # the single starting input: 0
    predictions = []
    for _ in range(seq_len):
        x = x.view(1, 1, 1)
        prediction, h0 = model(x, h0)
        x = prediction  # feed the prediction back in as the next input
        predictions.append(prediction.detach().numpy().ravel()[0])
    # plot
    time_steps = np.linspace(0, 10, seq_len)
    data = np.sin(time_steps)
    a, = plt.plot(time_steps, data, color='b')
    b = plt.scatter(time_steps[0], 0, color='black', s=10)
    c = plt.scatter(time_steps[1:], predictions[1:], color='red', s=10)
    plt.legend([a, b, c], ['sine wave', 'input', 'prediction'])
    plt.show()


def train():
    seq_len = 100
    input_size = 1
    hidden_size = 16
    output_size = 1
    num_layers = 1
    lr = 0.01

    model = RNN(input_size, hidden_size, num_layers, output_size)
    loss_function = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr)
    h0 = torch.zeros(num_layers, 1, hidden_size)  # [num_layers, batch_size, hidden_size], batch_size = 1

    for i in range(3000):
        # training data: a single sine sequence, batch_size = 1
        time_steps = np.linspace(0, 10, seq_len + 1)
        data = np.sin(time_steps)
        data = data.reshape(seq_len + 1, 1)
        # drop the last element to form the input
        inputs = torch.tensor(data[:-1]).float().view(1, seq_len, 1)
        # drop the first element to form the target [batch_size, seq_len, input_size]
        target = torch.tensor(data[1:]).float().view(1, seq_len, 1)

        output, hn = model(inputs, h0)

        # carry the hidden state into the next iteration, detached from the
        # previous iteration's computation graph
        # https://www.cnblogs.com/catnofishing/p/13287322.html
        h0 = hn.detach()

        loss = loss_function(output, target)
        model.zero_grad()
        loss.backward()
        optimizer.step()

        if i % 100 == 0:
            print("迭代次数: {} loss {}".format(i, loss.item()))

    test(model, seq_len, h0)


if __name__ == '__main__':
    train()


Result:
[Figure: ground-truth sine wave (blue), input point (black), predictions (red)]

Bidirectional RNN

[Figure: a bidirectional RNN]

Original order:
$$\begin{aligned}
H_1 &= f(UX_1 + WH_0), \quad H_0 \text{ is usually set to } 0\\
H_2 &= f(UX_2 + WH_1) = f(UX_2 + Wf(UX_1 + WH_0))\\
H_3 &= f(UX_3 + WH_2) = f(UX_3 + Wf(UX_2 + Wf(UX_1 + WH_0)))\\
&\;\;\vdots\\
H_t &= f(UX_t + WH_{t-1})\\
&\;\;\vdots
\end{aligned}$$
which gives:
$$H=[H_1, H_2, H_3, \ldots, H_{t-1}, H_t, \ldots, H_n]$$
Reversed order:
$$\begin{aligned}
H'_n &= f(U'X_n + W'H'_{n+1}), \quad H'_{n+1} \text{ is usually set to } 0\\
H'_{n-1} &= f(U'X_{n-1} + W'H'_n) = f(U'X_{n-1} + W'f(U'X_n + W'H'_{n+1}))\\
H'_{n-2} &= f(U'X_{n-2} + W'H'_{n-1}) = f(U'X_{n-2} + W'f(U'X_{n-1} + W'f(U'X_n + W'H'_{n+1})))\\
&\;\;\vdots\\
H'_t &= f(U'X_t + W'H'_{t+1})\\
&\;\;\vdots
\end{aligned}$$
which gives:
$$H'=[H'_1, H'_2, H'_3, \ldots, H'_{t-1}, H'_t, \ldots, H'_n]$$

Concatenating the two:
$$HH'=[H_1H'_1,\ H_2H'_2,\ \ldots,\ H_{t-1}H'_{t-1},\ H_tH'_t,\ \ldots,\ H_nH'_n]$$

Each concatenated pair then passes through a feedforward layer:
$$\begin{aligned}
O_1 &= f(V[H_1H'_1])\\
O_2 &= f(V[H_2H'_2])\\
O_3 &= f(V[H_3H'_3])\\
&\;\;\vdots\\
O_t &= f(V[H_tH'_t])\\
&\;\;\vdots
\end{aligned}$$
Output:
$$O=[O_1, O_2, O_3, \ldots, O_{t-1}, O_t, \ldots, O_n]$$
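As before, the two passes are just the same loop run twice. A minimal sketch (hypothetical sizes; $U, W$ and $U', W'$ are independent parameter sets):

```python
# A minimal sketch of the bidirectional forward pass above (hypothetical sizes).
import torch

n, input_size, hidden_size = 5, 3, 4
f = torch.tanh
U, W = torch.randn(hidden_size, input_size), torch.randn(hidden_size, hidden_size)
U2, W2 = torch.randn(hidden_size, input_size), torch.randn(hidden_size, hidden_size)

X = torch.randn(n, input_size)

H = []
h = torch.zeros(hidden_size)   # H_0
for x in X:                    # original order
    h = f(U @ x + W @ h)
    H.append(h)

H2 = []
h = torch.zeros(hidden_size)   # H'_{n+1}
for x in X.flip(0):            # reversed order
    h = f(U2 @ x + W2 @ h)
    H2.append(h)
H2.reverse()                   # re-align so H2 is [H'_1, ..., H'_n]

HH = [torch.cat([a, b]) for a, b in zip(H, H2)]  # each [H_t H'_t], size 2 * hidden_size
```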

PyTorch implementation: sequence to class

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, n_class):
        super(RNN, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,
            nonlinearity='relu'
        )
        self.linear = nn.Linear(2*hidden_size, n_class)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size).to(device)
        out, hn = self.rnn(x, h0)
        out = out[:, -1, :]  # [forward H_28, backward output at step 28]; note this is not hn here
        out = self.linear(out)
        return out
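One subtlety: for a bidirectional RNN, `out[:, -1, :]` concatenates the forward state at the last step with the backward state at the last position, which has seen only $X_{28}$; the backward state that has seen the whole sequence sits at position 0, and that is what `hn` holds. A quick sanity check (a sketch with arbitrary sizes):

```python
# How out and hn line up for a one-layer bidirectional RNN (batch_first=True).
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=28, hidden_size=128, num_layers=1,
             batch_first=True, bidirectional=True)
x = torch.randn(4, 28, 28)              # (N, L, H_in)
out, hn = rnn(x)                        # out: (4, 28, 256), hn: (2, 4, 128)

print(torch.allclose(out[:, -1, :128], hn[0]))  # True: forward final state
print(torch.allclose(out[:, 0, 128:], hn[1]))   # True: backward final state
```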

Result:
[Figure: test accuracy per epoch for the bidirectional model]

Appendix: the nn.RNN class

CLASS torch.nn.RNN(*args, **kwargs)

Parameters:

  • input_size: the dimension of each sequence element $x_t$.
  • hidden_size: the number of hidden units, i.e. the dimension of the hidden state (and hence of the output).
  • num_layers: the number of stacked recurrent layers; default 1.
  • nonlinearity: the activation function, 'tanh' (default) or 'relu'.
  • bias: whether to use bias terms.
  • batch_first: the input layout. Default False, i.e. (seq, batch, feature) with the sequence length first; if True, (batch, seq, feature).
  • dropout: the dropout rate; default 0, i.e. disabled. Set a value in (0, 1) to apply dropout to the output of every layer except the last (so it only matters when num_layers > 1).
  • bidirectional: whether to use a bidirectional RNN; default False.

Notation:
$$\begin{aligned}
N &= \text{batch size}\\
L &= \text{sequence length}\\
D &= 2\ \text{if bidirectional=True, otherwise}\ 1\\
H_{in} &= \text{input\_size}\\
H_{out} &= \text{hidden\_size}
\end{aligned}$$
Inputs: input, h_0
(1) input (the input sequence)

Unbatched: $(L, H_{in})$

Batched: $(L, N, H_{in})$ if batch_first=False; $(N, L, H_{in})$ if batch_first=True

(2) h_0 (the initial state of each layer; defaults to zeros if not provided)
Unbatched: $(D \times \text{num\_layers}, H_{out})$

Batched: $(D \times \text{num\_layers}, N, H_{out})$

Outputs: output, h_n
(1) output:
Unbatched: $(L, D \times H_{out})$

Batched: $(L, N, D \times H_{out})$ if batch_first=False; $(N, L, D \times H_{out})$ if batch_first=True

(2) h_n (the final state):
Unbatched: $(D \times \text{num\_layers}, H_{out})$

Batched: $(D \times \text{num\_layers}, N, H_{out})$
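These shapes are easy to confirm directly (a sketch with arbitrary sizes):

```python
# Shape check for nn.RNN (arbitrary sizes).
import torch
import torch.nn as nn

N, L, H_in, H_out, num_layers = 4, 7, 3, 5, 2
rnn = nn.RNN(input_size=H_in, hidden_size=H_out,
             num_layers=num_layers, bidirectional=True)  # D = 2

x = torch.randn(L, N, H_in)                 # batch_first=False layout
h0 = torch.zeros(2 * num_layers, N, H_out)  # (D * num_layers, N, H_out)
out, hn = rnn(x, h0)

print(out.shape)  # torch.Size([7, 4, 10]) -> (L, N, D * H_out)
print(hn.shape)   # torch.Size([4, 4, 5])  -> (D * num_layers, N, H_out)
```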

Weight and bias initialization

All weights and biases are sampled from the uniform distribution $\mathcal{U}(-\sqrt{k}, \sqrt{k})$, where $k=\frac{1}{\text{hidden\_size}}$.
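This is easy to verify empirically (a sketch; parameter names such as weight_ih_l0 follow nn.RNN's naming):

```python
# Empirical check of the initialization range.
import math
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=16)
bound = math.sqrt(1 / 16)               # sqrt(k), k = 1 / hidden_size
for name, p in rnn.named_parameters():  # weight_ih_l0, weight_hh_l0, bias_*
    print(name, p.min().item() >= -bound and p.max().item() <= bound)
```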
