Video explanation
https://www.bilibili.com/video/BV1qM4y1M7Nv?p=4&spm_id_from=pageDriver
Code blog
https://blog.csdn.net/weixin_41744192/article/details/115270178
Related video course
https://www.bilibili.com/video/BV1CZ4y1w7mE?p=44&spm_id_from=pageDriver
Recurrent Neural Networks
RNN
1. An RNN has short-term memory: a unit receives not only information from other neurons but also its own previous information, forming a network structure with loops (a minimal sketch appears at the end of this subsection).
a. Time steps: the input is unrolled over time, so each input belongs to a different time step.
b. At the next time step, the network receives not only the current input but also the output (hidden state) of the previous time step.
c. Short-term memory: the previous output is fed back in as part of the next input.
2. Different RNN structures
one-to-one: image classification
one-to-many: image captioning (image to text)
many-to-one: text classification
asynchronous many-to-many: machine translation
synchronous many-to-many: video classification (labeling every frame)
Drawback: when the sequence is too long, the gradient tends to vanish; parameter updates then only capture local dependencies and cannot capture long-range relationships across the sequence.
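A minimal sketch of this recurrence using PyTorch's nn.RNNCell (the sizes below are made up, not from the notes): the hidden state produced at one time step is fed back in at the next.

import torch
from torch import nn

cell = nn.RNNCell(input_size=4, hidden_size=8)   # one RNN cell, reused at every time step (example sizes)
x = torch.randn(5, 3, 4)                         # (seq_len=5, batch=3, input_size=4)
h = torch.zeros(3, 8)                            # initial hidden state, all zeros

for t in range(x.size(0)):                       # unroll over the time steps
    h = cell(x[t], h)                            # current input + previous hidden state -> new hidden state
print(h.shape)                                   # torch.Size([3, 8]): the "short-term memory" after the last step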
LSTM
The memory cell has a selective-memory capability: it can choose to remember important information and filter out noise, which lightens the memory burden.
Parameters of LSTM:
class torch.nn.LSTM(*args, **kwargs)
The parameters are (a shape-check sketch follows the list):
input_size: the feature dimension of x
hidden_size: the feature dimension of the hidden layer
num_layers: number of stacked LSTM layers, default 1
bias: if False, the bias terms b_ih and b_hh are not used (treated as 0); default True
batch_first: if True, the input and output data format is (batch, seq, feature)
dropout: dropout applied to the output of every layer except the last, default 0
bidirectional: if True, a bidirectional LSTM is used; default False
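A minimal shape-check sketch for these parameters (the sizes are made up, not from the notes), with batch_first=True:

import torch
from torch import nn

lstm = nn.LSTM(input_size=28, hidden_size=64, num_layers=1,
               batch_first=True, bidirectional=False)
x = torch.randn(32, 10, 28)             # (batch, seq, feature) because batch_first=True
output, (h_n, c_n) = lstm(x)            # initial (h_0, c_0) default to zeros
print(output.shape)                     # torch.Size([32, 10, 64]) -- result of every time step
print(h_n.shape, c_n.shape)             # torch.Size([1, 32, 64])  -- last time step, per layer/direction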
LSTM classification on the MNIST handwritten digit dataset
import torch
from torch import nn
import torchvision                              # needed for torchvision.datasets / torchvision.transforms below
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
torch.manual_seed(1)
# Hyper parameters
EPOCH = 2
BATCH_SIZE = 64
TIME_STEP = 28          # RNN time steps / image height
INPUT_SIZE = 28         # RNN input size per step / pixels per image row
LR = 0.01
DOWNLOAD_MNIST = False  # change this to True to download the dataset
# MNIST handwritten digits (already downloaded here)
train_data = torchvision.datasets.MNIST(
    root='./data/',                              # where to save / load the data
    train=True,                                  # this is training data
    transform=torchvision.transforms.ToTensor(), # convert PIL.Image or numpy.ndarray to
                                                 # torch.FloatTensor (C x H x W), normalized to [0.0, 1.0]
    download=DOWNLOAD_MNIST,                     # download only if not already present
)
# batch training: 64 samples, 1 channel, 28x28 -> (64, 1, 28, 28)
train_loader = torch.utils.data.DataLoader(dataset = train_data, batch_size = BATCH_SIZE, shuffle=True)
test_data = dsets.MNIST(root='./data/', train=False)
# take the first 2000 test samples, shape (2000, 28, 28), values scaled to [0, 1]
test_x = test_data.data.type(torch.FloatTensor)[:2000]/255.
test_y = test_data.targets.numpy()[:2000]
class RNN(nn.Module):
    def __init__(self):
        super(RNN,self).__init__()
        self.rnn = nn.LSTM(          # LSTM usually works much better than nn.RNN() here
            input_size = 28,         # pixels per image row
            hidden_size = 64,        # rnn hidden units
            num_layers = 1,          # number of stacked RNN layers
            batch_first = True,      # input & output have batch size as the first dimension, e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(64,10)  # output layer
    def forward(self, x):
        # x shape (batch, time_step, input_size)
        # r_out shape (batch, time_step, output_size)
        # h_n shape (n_layers, batch, hidden_size)   an LSTM has two hidden states: h_n is the hidden state, h_c the cell state
        # h_c shape (n_layers, batch, hidden_size)
        r_out,(h_n,h_c) = self.rnn(x,None)   # None means the initial hidden state is all zeros
        # take the r_out output of the last time step
        # here r_out[:, -1, :] is the same value as h_n
        out = self.out(r_out[:, -1, :])
        return out
rnn = RNN()
print(rnn)
optimizer = torch.optim.Adam(rnn.parameters(), lr = LR)
loss_func = nn.CrossEntropyLoss() #the target label is not one-hotted
for epoch in range(EPOCH):
    for step, (b_x,b_y) in enumerate(train_loader):
        b_x = b_x.view(-1,28,28)   # reshape x to (batch, time_step, input_size)
        output = rnn(b_x)
        loss = loss_func(output, b_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 50 == 0:
            test_output = rnn(test_x)   # (samples, time_step, input_size)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y).astype(int).sum()) / float(test_y.size)
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)
RNN regression: using sin to predict cos (the code below uses nn.RNN rather than nn.LSTM)
import torch
from torch import nn
import numpy as np
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
# Hyper Parameters
TIME_STEP = 10 # rnn time step
INPUT_SIZE = 1 # rnn input size
LR = 0.02 # learning rate
# show data
steps = np.linspace(0, np.pi*2, 100, dtype=np.float32) # float32 for converting torch FloatTensor
x_np = np.sin(steps)
y_np = np.cos(steps)
plt.plot(steps, y_np, 'r-', label='target (cos)')
plt.plot(steps, x_np, 'b-', label='input (sin)')
plt.legend(loc='best')
plt.show()
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.rnn = nn.RNN(
            input_size=INPUT_SIZE,
            hidden_size=32,     # rnn hidden units
            num_layers=1,       # number of rnn layers
            batch_first=True,   # input & output have batch size as the first dimension, e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(32, 1)
    def forward(self, x, h_state):
        # x (batch, time_step, input_size)
        # h_state (n_layers, batch, hidden_size)
        # r_out (batch, time_step, hidden_size)
        r_out, h_state = self.rnn(x, h_state)
        outs = []    # save all predictions
        for time_step in range(r_out.size(1)):    # calculate output for each time step
            outs.append(self.out(r_out[:, time_step, :]))
        return torch.stack(outs, dim=1), h_state
        # instead, for simplicity, you can replace the code above with the following
        # r_out = r_out.view(-1, 32)
        # outs = self.out(r_out)
        # outs = outs.view(-1, TIME_STEP, 1)
        # return outs, h_state
        # or even simpler, since nn.Linear can accept inputs of any dimension
        # and returns outputs with the same shape except for the last dimension
        # outs = self.out(r_out)
        # return outs
rnn = RNN()
print(rnn)
optimizer = torch.optim.Adam(rnn.parameters(), lr=LR)   # optimize all rnn parameters
loss_func = nn.MSELoss()
h_state = None # for initial hidden state
plt.figure(1, figsize=(12, 5))
plt.ion() # continuously plot
for step in range(100):
    start, end = step * np.pi, (step+1)*np.pi   # time range
    # use sin to predict cos
    steps = np.linspace(start, end, TIME_STEP, dtype=np.float32, endpoint=False)  # float32 for converting to torch FloatTensor
    x_np = np.sin(steps)
    y_np = np.cos(steps)
    x = torch.from_numpy(x_np[np.newaxis, :, np.newaxis])   # shape (batch, time_step, input_size)
    y = torch.from_numpy(y_np[np.newaxis, :, np.newaxis])
    prediction, h_state = rnn(x, h_state)   # rnn output
    # !! next step is important !!
    h_state = h_state.data                  # repack the hidden state, break the connection from the last iteration
    loss = loss_func(prediction, y)         # calculate loss
    optimizer.zero_grad()                   # clear gradients for this training step
    loss.backward()                         # backpropagation, compute gradients
    optimizer.step()                        # apply gradients
    # plotting
    plt.xlim(-0.5, 330)
    plt.plot(steps, y_np.flatten(), 'r-')
    plt.plot(steps, prediction.data.numpy().flatten(), 'b-')
    plt.draw()
    plt.pause(0.05)
plt.ioff()
plt.show()
GRU is a variant of LSTM
Compared with LSTM, GRU achieves comparable results while being noticeably easier and faster to train, so GRU is often preferred in practice.
Two outputs: output (the hidden state at every time step) and hidden_state (the hidden state at the last time step); there is no separate cell state.
Two inputs: x_t and hidden_state[t-1]
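A minimal nn.GRU sketch (the sizes are made up): unlike LSTM there is no cell state, so only output and h_n are returned.

import torch
from torch import nn

gru = nn.GRU(input_size=16, hidden_size=32, num_layers=1, batch_first=True)
x = torch.randn(8, 5, 16)           # (batch, seq_len, input_size), example sizes
output, h_n = gru(x)                # no cell state, only the hidden state
print(output.shape)                 # torch.Size([8, 5, 32])
print(h_n.shape)                    # torch.Size([1, 8, 32])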
Bidirectional LSTM
It has memory not only in the forward direction (front to back) but also in the backward direction (back to front).
So the LSTM in each direction produces its own output, and the final output has two parts, which are concatenated (see the bidirectional sketch in the API section below).
The LSTM and GRU APIs in PyTorch
1. The unidirectional LSTM API
Provided by torch.nn
torch.nn.LSTM(input_size, hidden_size, num_layers, batch_first, dropout, bidirectional)
1. input_size: the shape of the input data, i.e. embedding_dim
2. hidden_size: the number of hidden units, i.e. the dimension of the hidden state in each layer
3. num_layers: the number of stacked LSTM layers
4. batch_first: defaults to False, in which case the input has shape [seq_len, batch, feature]; if True, batch comes first
5. dropout: dropout applied to the output of every layer except the last; only takes effect when num_layers > 1
6. bidirectional: whether the LSTM is bidirectional, default False
Outputs: output, (h_n, c_n)
output: (seq_len, batch, num_directions * hidden_size), where num_directions is 1 for unidirectional and 2 for bidirectional
h_n: (num_layers * num_directions, batch, hidden_size)
c_n: (num_layers * num_directions, batch, hidden_size)
output collects the result of every time step, stacked along the seq_len dimension (see the sketch below)
h_n: the hidden states of the different layers, concatenated along dimension 0
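A small sketch verifying these shapes, with batch_first left at its default of False and arbitrary example sizes:

import torch
from torch import nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)   # batch_first=False by default
x = torch.randn(7, 4, 10)                 # (seq_len=7, batch=4, input_size=10)
output, (h_n, c_n) = lstm(x)
print(output.shape)                       # torch.Size([7, 4, 20]) -> (seq_len, batch, num_directions*hidden_size)
print(h_n.shape)                          # torch.Size([2, 4, 20]) -> (num_layers*num_directions, batch, hidden_size)
print(c_n.shape)                          # torch.Size([2, 4, 20])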
2. The GRU API
Same usage as LSTM, but there is no cell state: output, h_n = gru(input, h_0)
3. The bidirectional LSTM API
1. Just set bidirectional = True
2. Concatenation order in output: the forward and backward outputs are concatenated along the last dimension, so the forward pass's first time step sits next to the backward pass's final (last-computed) state (see the sketch below)
3. h_n: the forward and backward states each have shape [batch_size, hidden_size]; in the bidirectional case they are stacked along dimension 0, giving [num_layers * num_directions, batch_size, hidden_size], ordered as layer-1 forward, layer-1 backward, layer-2 forward, layer-2 backward, ...
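A minimal bidirectional sketch (example sizes assumed) showing the 2*hidden_size concatenation in output and the direction ordering in h_n:

import torch
from torch import nn

bilstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1,
                 batch_first=True, bidirectional=True)
x = torch.randn(4, 7, 10)                     # (batch, seq_len, input_size)
output, (h_n, c_n) = bilstm(x)
print(output.shape)                           # torch.Size([4, 7, 40]) -- forward and backward concatenated on the last dim
print(h_n.shape)                              # torch.Size([2, 4, 20]) -- [forward, backward] stacked on dim 0

forward_last = output[:, -1, :20]             # forward direction at the last time step
backward_first = output[:, 0, 20:]            # backward direction at the first time step (its last-computed state)
print(torch.allclose(forward_last, h_n[0]))   # True
print(torch.allclose(backward_first, h_n[1])) # True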
Gradient vanishing and gradient explosion
Gradient vanishing: during backpropagation, when the weights are initialized too small or easily-saturated activations (sigmoid, tanh) are used (sigmoid's gradient is close to 0 when its output is near 0 or 1), the gradient shrinks exponentially as it propagates backward through the network, so the parameters can no longer be updated.
Gradient explosion: when the initial parameters are very large, the gradient grows exponentially during backpropagation, causing it to explode.
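A tiny illustration (not from the original notes) of how a gradient shrinks when it has to pass through a chain of sigmoids:

import torch

x = torch.tensor(1.0, requires_grad=True)
y = x
for _ in range(20):          # pass the value through 20 stacked sigmoids
    y = torch.sigmoid(y)
y.backward()
print(x.grad)                # an extremely small number: the gradient has effectively vanished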
nn.BatchNorm1d: speeds up training; it normalizes the intermediate activations so that the gradients computed from them do not become too small.
nn.Dropout: makes the model more robust, mitigates overfitting, and improves generalization; the trained model can be understood as a combination (ensemble) of many sub-models.
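A minimal sketch of where nn.BatchNorm1d and nn.Dropout typically sit in a network (the layer sizes are made up):

import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(100, 64),
    nn.BatchNorm1d(64),    # normalize the activations of the previous layer
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zero half of the activations during training
    nn.Linear(64, 10),
)
model.train()                          # BatchNorm/Dropout behave differently in train vs eval mode
out = model(torch.randn(32, 100))      # a batch of 32 samples with 100 features each
print(out.shape)                       # torch.Size([32, 10])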