Video explanation
https://www.bilibili.com/video/BV1qM4y1M7Nv?p=4&spm_id_from=pageDriver
Code blog
https://blog.csdn.net/weixin_41744192/article/details/115270178
Related video course
https://www.bilibili.com/video/BV1CZ4y1w7mE?p=44&spm_id_from=pageDriver
Recurrent Neural Networks
RNN
1. An RNN has short-term memory: a unit receives not only information from other neurons but also its own previous information, forming a network structure with loops (a minimal sketch appears at the end of this subsection).
a. Time steps: the input is unrolled over time, so each input belongs to a different time step.
b. At the next time step, the network receives not only the current input but also the output (hidden state) of the previous time step.
c. Short-term memory: the previous output is fed back in as part of the next input.
2. Different RNN structures
one-to-one: image classification
one-to-many: image captioning (image to text)
many-to-one: text classification
asynchronous many-to-many: machine translation
synchronous many-to-many: video classification (labeling every frame)
Drawback: when the sequence is too long, the gradient tends to vanish; parameter updates then only capture local dependencies and cannot capture long-range relationships across the sequence.
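A minimal sketch of this recurrence using PyTorch's nn.RNNCell (the sizes below are made up, not from the notes): the hidden state produced at one time step is fed back in at the next.

import torch
from torch import nn

cell = nn.RNNCell(input_size=4, hidden_size=8)   # one RNN cell, reused at every time step (example sizes)
x = torch.randn(5, 3, 4)                         # (seq_len=5, batch=3, input_size=4)
h = torch.zeros(3, 8)                            # initial hidden state, all zeros

for t in range(x.size(0)):                       # unroll over the time steps
    h = cell(x[t], h)                            # current input + previous hidden state -> new hidden state
print(h.shape)                                   # torch.Size([3, 8]): the "short-term memory" after the last step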
LSTM
The memory cell has a selective-memory capability: it can choose to remember important information and filter out noise, which lightens the memory burden.
Parameters of LSTM:
class torch.nn.LSTM(*args, **kwargs)
The parameters are (a shape-check sketch follows the list):
input_size: the feature dimension of x
hidden_size: the feature dimension of the hidden layer
num_layers: number of stacked LSTM layers, default 1
bias: if False, the bias terms b_ih and b_hh are not used (treated as 0); default True
batch_first: if True, the input and output data format is (batch, seq, feature)
dropout: dropout applied to the output of every layer except the last, default 0
bidirectional: if True, a bidirectional LSTM is used; default False
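A minimal shape-check sketch for these parameters (the sizes are made up, not from the notes), with batch_first=True:

import torch
from torch import nn

lstm = nn.LSTM(input_size=28, hidden_size=64, num_layers=1,
               batch_first=True, bidirectional=False)
x = torch.randn(32, 10, 28)             # (batch, seq, feature) because batch_first=True
output, (h_n, c_n) = lstm(x)            # initial (h_0, c_0) default to zeros
print(output.shape)                     # torch.Size([32, 10, 64]) -- result of every time step
print(h_n.shape, c_n.shape)             # torch.Size([1, 32, 64])  -- last time step, per layer/direction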
LSTM classification on the MNIST handwritten digit dataset
import torch
from torch import nn
import torchvision                              # needed for torchvision.datasets / torchvision.transforms below
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
torch.manual_seed(1)
# Hyper parameters
EPOCH = 2
BATCH_SIZE = 64
TIME_STEP = 28          # RNN time steps / image height
INPUT_SIZE = 28         # RNN input size per step / pixels per image row
LR = 0.01
DOWNLOAD_MNIST = False  # change this to True to download the dataset
# MNIST handwritten digits (already downloaded here)
train_data = torchvision.datasets.MNIST(
    root='./data/',                              # where to save / load the data
    train=True,                                  # this is training data
    transform=torchvision.transforms.ToTensor(), # convert PIL.Image or numpy.ndarray to
                                                 # torch.FloatTensor (C x H x W), normalized to [0.0, 1.0]
    download=DOWNLOAD_MNIST,                     # download only if not already present
)
# batch training: 64 samples, 1 channel, 28x28 -> (64, 1, 28, 28)
train_loader = torch.utils.data.DataLoader(dataset = train_data, batch_size = BATCH_SIZE, shuffle=True)
test_data = dsets.MNIST(root='./data/', train=False)
# take the first 2000 test samples, shape (2000, 28, 28), values scaled to [0, 1]
test_x = test_data.data.type(torch.FloatTensor)[:2000]/255.
test_y = test_data.targets.numpy()[:2000]
class RNN(nn.Module):
    def __init__(self):
        super(RNN,self).__init__()
        self.rnn = nn.LSTM(          # LSTM usually works much better than nn.RNN() here
            input_size = 28,         # pixels per image row
            hidden_size = 64,        # rnn hidden units
            num_layers = 1,          # number of stacked RNN layers
            batch_first = True,      # input & output have batch size as the first dimension, e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(64,10)  # output layer
    def forward(self, x):
        # x shape (batch, time_step, input_size)
        # r_out shape (batch, time_step, output_size)
        # h_n shape (n_layers, batch, hidden_size)   an LSTM has two hidden states: h_n is the hidden state, h_c the cell state
        # h_c shape (n_layers, batch, hidden_size)
        r_out,(h_n,h_c) = self.rnn(x,None)   # None means the initial hidden state is all zeros
        # take the r_out output of the last time step
        # here r_out[:, -1, :] is the same value as h_n
        out = self.out(r_out[:, -1, :])
        return out
rnn = RNN()
print(rnn)
optimizer = torch.optim.Adam(rnn.parameters(), lr = LR)
loss_func = nn.CrossEntropyLoss() #the target label is not one-hotted
for epoch in range(EPOCH):
    for step, (b_x,b_y) in enumerate(train_loader):
        b_x = b_x.view(-1,28,28)   # reshape x to (batch, time_step, input_size)
        output = rnn(b_x)
        loss = loss_func(output, b_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 50 == 0:
            test_output = rnn(test_x)   # (samples, time_step, input_size)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y).astype(int).sum()) / float(test_y.size)
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)
RNN regression: using sin to predict cos (the code below uses nn.RNN rather than nn.LSTM)
import torch
from torch import nn
import numpy as np
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
# Hyper Parameters
TIME_STEP = 10 # rnn time step
INPUT_SIZE = 1 # rnn input size
LR = 0.02 # learning rate
# show data
steps = np.linspace(0, np.pi*2, 100, dtype=np.float32) # float32 for converting torch FloatTensor
x_np = np.sin(steps)
y_np = np.cos(steps)
plt.plot(steps, y_np, 'r-', label='target (cos)')
plt.plot(steps, x_np, 'b-', label='input (sin)')
plt.legend(loc='best')
plt.show()
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.rnn = nn.RNN(
            input_size=INPUT_SIZE,
            hidden_size=32,     # rnn hidden units
            num_layers=1,       # number of rnn layers
            batch_first=True,   # input & output have batch size as the first dimension, e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(32, 1)
    def forward(self, x, h_state):
        # x (batch, time_step, input_size)
        # h_state (n_layers, batch, hidden_size)
        # r_out (batch, time_step, hidden_size)
        r_out, h_state = self.rnn(x, h_state)
        outs = []    # save all predictions
        for time_step in range(r_out.size(1)):    # calculate output for each time step
            outs.append(self.out(r_out[:, time_step, :]))
        return torch.stack(outs, dim=1), h_state
        # instead, for simplicity, you can replace the code above with the following
        # r_out = r_out.view(-1, 32)
        # outs = self.out(r_out)
        # outs = outs.view(-1, TIME_STEP, 1)
        # return outs, h_state
        # or even simpler, since nn.Linear can accept inputs of any dimension
        # and returns outputs with the same shape except for the last dimension
        # outs = self.out(r_out)
        # return outs
rnn = RNN()
print(rnn)
optimizer = torch.optim.Adam(rnn.parameters(), lr=LR)   # optimize all rnn parameters
loss_func = nn.MSELoss()
h_state = None # for initial hidden state
plt.figure(1, figsize=(12, 5))
plt.ion() # continuously plot
for step in range(100):
    start, end = step * np.pi, (step+1)*np.pi   # time range
    # use sin to predict cos
    steps = np.linspace(start, end, TIME_STEP, dtype=np.float32, endpoint=False)  # float32 for converting to torch FloatTensor
    x_np = np.sin(steps)
    y_np = np.cos(steps)
    x = torch.from_numpy(x_np[np.newaxis, :, np.newaxis])   # shape (batch, time_step, input_size)
    y = torch.from_numpy(y_np[np.newaxis, :, np.newaxis])
    prediction, h_state = rnn(x, h_state)   # rnn output
    # !! next step is important !!
    h_state = h_state.data                  # repack the hidden state, break the connection from the last iteration
    loss = loss_func(prediction, y)         # calculate loss
    optimizer.zero_grad()                   # clear gradients for this training step
    loss.backward()                         # backpropagation, compute gradients
    optimizer.step()                        # apply gradients
    # plotting
    plt.xlim(-0.5, 330)
    plt.plot(steps, y_np.flatten(), 'r-')
    plt.plot(steps, prediction.data.numpy().flatten(), 'b-')
    plt.draw()
    plt.pause(0.05)
plt.ioff()
plt.show()
GRU is a variant of LSTM
Compared with LSTM, GRU achieves comparable results while being noticeably easier and faster to train, so GRU is often preferred in practice.
Two outputs: output (the hidden state at every time step) and hidden_state (the hidden state at the last time step); there is no separate cell state.
Two inputs: x_t and hidden_state[t-1]
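A minimal nn.GRU sketch (the sizes are made up): unlike LSTM there is no cell state, so only output and h_n are returned.

import torch
from torch import nn

gru = nn.GRU(input_size=16, hidden_size=32, num_layers=1, batch_first=True)
x = torch.randn(8, 5, 16)           # (batch, seq_len, input_size), example sizes
output, h_n = gru(x)                # no cell state, only the hidden state
print(output.shape)                 # torch.Size([8, 5, 32])
print(h_n.shape)                    # torch.Size([1, 8, 32])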
Bidirectional LSTM
It has memory not only in the forward direction (front to back) but also in the backward direction (back to front).
So the LSTM in each direction produces its own output, and the final output has two parts, which are concatenated (see the bidirectional sketch in the API section below).
The LSTM and GRU APIs in PyTorch
1. The unidirectional LSTM API
Provided by torch.nn
torch.nn.LSTM(input_size, hidden_size, num_layers, batch_first, dropout, bidirectional)
1. input_size: the shape of the input data, i.e. embedding_dim
2. hidden_size: the number of hidden units, i.e. the dimension of the hidden state in each layer
3. num_layers: the number of stacked LSTM layers
4. batch_first: defaults to False, in which case the input has shape [seq_len, batch, feature]; if True, batch comes first
5. dropout: dropout applied to the output of every layer except the last; only takes effect when num_layers > 1
6. bidirectional: whether the LSTM is bidirectional, default False
Outputs: output, (h_n, c_n)
output: (seq_len, batch, num_directions * hidden_size), where num_directions is 1 for unidirectional and 2 for bidirectional
h_n: (num_layers * num_directions, batch, hidden_size)
c_n: (num_layers * num_directions, batch, hidden_size)
output collects the result of every time step, stacked along the seq_len dimension (see the sketch below)
h_n: the hidden states of the different layers, concatenated along dimension 0
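A small sketch verifying these shapes, with batch_first left at its default of False and arbitrary example sizes:

import torch
from torch import nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)   # batch_first=False by default
x = torch.randn(7, 4, 10)                 # (seq_len=7, batch=4, input_size=10)
output, (h_n, c_n) = lstm(x)
print(output.shape)                       # torch.Size([7, 4, 20]) -> (seq_len, batch, num_directions*hidden_size)
print(h_n.shape)                          # torch.Size([2, 4, 20]) -> (num_layers*num_directions, batch, hidden_size)
print(c_n.shape)                          # torch.Size([2, 4, 20])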
2. The GRU API
Same usage as LSTM, but there is no cell state: output, h_n = gru(input, h_0)
3. The bidirectional LSTM API
1. Just set bidirectional = True
2. Concatenation order in output: the forward and backward outputs are concatenated along the last dimension, so the forward pass's first time step sits next to the backward pass's final (last-computed) state (see the sketch below)
3. h_n: the forward and backward states each have shape [batch_size, hidden_size]; in the bidirectional case they are stacked along dimension 0, giving [num_layers * num_directions, batch_size, hidden_size], ordered as layer-1 forward, layer-1 backward, layer-2 forward, layer-2 backward, ...
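A minimal bidirectional sketch (example sizes assumed) showing the 2*hidden_size concatenation in output and the direction ordering in h_n:

import torch
from torch import nn

bilstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1,
                 batch_first=True, bidirectional=True)
x = torch.randn(4, 7, 10)                     # (batch, seq_len, input_size)
output, (h_n, c_n) = bilstm(x)
print(output.shape)                           # torch.Size([4, 7, 40]) -- forward and backward concatenated on the last dim
print(h_n.shape)                              # torch.Size([2, 4, 20]) -- [forward, backward] stacked on dim 0

forward_last = output[:, -1, :20]             # forward direction at the last time step
backward_first = output[:, 0, 20:]            # backward direction at the first time step (its last-computed state)
print(torch.allclose(forward_last, h_n[0]))   # True
print(torch.allclose(backward_first, h_n[1])) # True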
Gradient vanishing and gradient explosion
Gradient vanishing: during backpropagation, when the weights are initialized too small or easily-saturated activations (sigmoid, tanh) are used (sigmoid's gradient is close to 0 when its output is near 0 or 1), the gradient shrinks exponentially as it propagates backward through the network, so the parameters can no longer be updated.
Gradient explosion: when the initial parameters are very large, the gradient grows exponentially during backpropagation, causing it to explode.
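A tiny illustration (not from the original notes) of how a gradient shrinks when it has to pass through a chain of sigmoids:

import torch

x = torch.tensor(1.0, requires_grad=True)
y = x
for _ in range(20):          # pass the value through 20 stacked sigmoids
    y = torch.sigmoid(y)
y.backward()
print(x.grad)                # an extremely small number: the gradient has effectively vanished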
nn.BatchNorm1d: speeds up training; it normalizes the intermediate activations so that the gradients computed from them do not become too small.
nn.Dropout: makes the model more robust, mitigates overfitting, and improves generalization; the trained model can be understood as a combination (ensemble) of many sub-models.
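A minimal sketch of where nn.BatchNorm1d and nn.Dropout typically sit in a network (the layer sizes are made up):

import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(100, 64),
    nn.BatchNorm1d(64),    # normalize the activations of the previous layer
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zero half of the activations during training
    nn.Linear(64, 10),
)
model.train()                          # BatchNorm/Dropout behave differently in train vs eval mode
out = model(torch.randn(32, 100))      # a batch of 32 samples with 100 features each
print(out.shape)                       # torch.Size([32, 10])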