Fundamentals of Recurrent Neural Networks

甜甜雨

Last modified 2024-03-10 22:18:54

Tags: rnn, artificial intelligence, deep learning

First published 2024-03-10 11:28:48

Article link: https://blog.csdn.net/m0_44974999/article/details/136588577

RNN

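1. What are RNNs?

x1, x2, ... are the input features; h0 is the prior, i.e. the initial hidden state (it can, for example, be an h0 produced by a CNN).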

Outputs:     h1            h2            h3
             ^             ^             ^
             |             |             |
h0 ---> RNN Cell ---> RNN Cell ---> RNN Cell ---> ...
             ^             ^             ^
             |             |             |
Inputs:      x1            x2            x3

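h2 has to carry the information of x1 as well as x2, so h1 is fed into the second RNN Cell call.

(Note that it is the same RNN Cell, applied over and over in a loop. Inside the cell there are two linear transforms followed by an activation.)

At each step the cell computes W_ih·xt + b_ih from the current input and W_hh·h(t-1) + b_hh from the previous hidden state, adds the two, and applies tanh (so every element of the result lies between -1 and 1).

The result is the output ht of this step.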
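The update described above can be written as ht = tanh(W_ih·xt + b_ih + W_hh·h(t-1) + b_hh). Below is a minimal pure-Python sketch of one such step. The function name `rnn_cell_step` and the hand-picked weights are illustrative assumptions; they are not part of the original post or of PyTorch, where the weights would be learned.

```python
import math

def rnn_cell_step(x, h_prev, w_ih, w_hh, b_ih, b_hh):
    """One tanh RNN step: h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)."""
    h_next = []
    for i in range(len(b_ih)):
        # Linear part coming from the current input x_t
        from_input = sum(w * xj for w, xj in zip(w_ih[i], x)) + b_ih[i]
        # Linear part coming from the previous hidden state h_{t-1}
        from_hidden = sum(w * hj for w, hj in zip(w_hh[i], h_prev)) + b_hh[i]
        # tanh squashes the sum into the open interval (-1, 1)
        h_next.append(math.tanh(from_input + from_hidden))
    return h_next

# Hypothetical weights for input_size=4 and hidden_size=2
w_ih = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.1, 0.2]]
w_hh = [[0.2, 0.0], [-0.3, 0.4]]
b_ih = [0.0, 0.1]
b_hh = [0.0, -0.1]

h0 = [0.0, 0.0]            # initial hidden state
x1 = [1.0, 2.0, 3.0, 4.0]  # first input, 4 features
h1 = rnn_cell_step(x1, h0, w_ih, w_hh, b_ih, b_hh)
print(h1)
```

Feeding `h1` back in together with `x2` gives `h2`, which is how information from earlier inputs reaches later steps.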

# RNN cell
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

hidden = cell(input, hidden)

# Inputs
#   input  of shape (batch, input_size)
#   hidden of shape (batch, hidden_size)

# Output
#   hidden of shape (batch, hidden_size)

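2. How to use RNNCell

Example:

batchSize = 1: how many samples are processed per step (here, one sample at a time)

seqLen = 3: how many time steps each sample has (e.g. each sample holds 3 days of weather data)

inputSize = 4: how many features each time step has (e.g. each day's data has 4 features)

hiddenSize = 2: every hidden state is a vector with 2 elements (a 2-dimensional hidden state is used to learn the prediction)

1) Input shape: batchSize x inputSize

Output shape: batchSize x hiddenSize

Sequence shape: seqLen x batchSize x inputSize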

import torch

# Process one sample at a time; each sample has 3 time steps, each with 4 features
batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2


# Build the RNNCell
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)


dataset = torch.randn(seq_len, batch_size, input_size)   # sequence data: (seq_len, batch_size, input_size)
hidden = torch.zeros(batch_size, hidden_size)  # initial hidden state h0, all zeros


# Loop over the time steps
for idx, input in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    print('input size:', input.shape)

    hidden = cell(input, hidden)    # new hidden is computed from this step's input and the previous hidden

    print('outputs size:', hidden.shape)
    print(hidden)

3. How to use RNN

num_layers: how many layers the RNN has

out: the hidden states h1 ... hn of every time step (taken from the last layer)

hidden: the final hn

inputs: x1 ... xn

cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

# A single call returns both out and hidden
out, hidden = cell(inputs, hidden)

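With RNN the loop is automatic: pass in the whole inputs together with h0, and you get back the outputs of every step (out) and the final hn (hidden).

batch: how many sequences are processed at once

seqsize: how many time steps each sequence has

inputsize: how many features each time step has

1) (input) inputs shape: seqsize x batch x inputsize

(input) hidden shape: numlayers x batch x hiddensize

(output) out shape: seqsize x batch x hiddensize

(output) hidden shape: numlayers x batch x hiddensize

numlayers: the number of stacked RNN layers

(In the original figure, inputs are drawn in purple and outputs in green.)

With RNN there is no need to write the loop yourself; just call it directly.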
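To make the difference between out and hidden concrete, here is a small pure-Python sketch that unrolls a one-layer RNN with a scalar hidden state. The helper `run_rnn` and its weights are hypothetical and only mimic the two return values of a single-layer RNN: the list of all hidden states and the last one.

```python
import math

def run_rnn(inputs, h0, w_ih=0.5, w_hh=0.8, b=0.0):
    """Unroll a scalar tanh RNN and return (out, hidden)."""
    out = []
    h = h0
    for x in inputs:
        # Same cell, same weights, reused at every time step
        h = math.tanh(w_ih * x + w_hh * h + b)
        out.append(h)   # out collects h1 ... hn
    return out, h       # hidden is only the final hn

inputs = [1.0, -0.5, 2.0]          # x1, x2, x3 (seq_len = 3)
out, hidden = run_rnn(inputs, h0=0.0)

print(len(out))            # one output per time step
print(out[-1] == hidden)   # the last output is the final hidden state
```

For a single layer, the last entry of out and hidden hold the same values; with several layers, out still comes from the top layer while hidden holds the final state of every layer.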

import torch

batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2
num_layers = 1

cell = torch.nn.RNN(input_size=input_size,hidden_size=hidden_size,num_layers=num_layers)


# Inputs
inputs = torch.randn(seq_len, batch_size, input_size)      # (seq_len, batch_size, input_size)
hidden = torch.zeros(num_layers, batch_size, hidden_size)  # h0: (num_layers, batch_size, hidden_size)

# Outputs
out, hidden = cell(inputs, hidden)

print('output size:',out.shape)
print('output:',out)
print('hidden size:',hidden.shape)
print('hidden:',hidden)

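4. Example

hello ---> ohlol

1) Using RNNCell

① Vectorizing the characters

Build a dictionary from the characters of the input "hello" to get an index for each character: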

character	index
e	0
h	1
l	2
o	3

Looking each character up in the dictionary, hello becomes the index sequence 1 0 2 2 3.

Each index is then turned into a vector:

	e	h	l	o
h(1)	0	1	0	0
e(0)	1	0	0	0
l(2)	0	0	1	0
l(2)	0	0	1	0
o(3)	0	0	0	1

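The rows above are the one-hot vectors (drawn in green in the original figure). The one-hot vectors are fed into the network as inputs, so inputSize = 4.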
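The dictionary and one-hot steps above can be sketched in plain Python as follows. The variable names mirror the ones used later in the post; the helper `one_hot` is an assumption added for illustration.

```python
text = "hello"

# Dictionary: sorted unique characters, so e=0, h=1, l=2, o=3
idx2char = sorted(set(text))
char2idx = {ch: i for i, ch in enumerate(idx2char)}

# Index sequence for "hello"
x_data = [char2idx[ch] for ch in text]

def one_hot(index, size):
    """Return a list of length `size` with a 1 at `index` and 0 elsewhere."""
    return [1 if j == index else 0 for j in range(size)]

# One one-hot row per character; the row length equals the dictionary size
x_one_hot = [one_hot(i, len(idx2char)) for i in x_data]

print(idx2char)
print(x_data)
for row in x_one_hot:
    print(row)
```

Each row has exactly one 1, and the row length (4) is what determines inputSize.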

②

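This effectively turns the task into a classification problem: for each output character, decide which class of the dictionary it belongs to. The output at every step is therefore 4-dimensional.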

Outputs:      o             h             l             o             l
              ^             ^             ^             ^             ^
              |             |             |             |             |
h0 ---> RNN Cell ---> RNN Cell ---> RNN Cell ---> RNN Cell ---> RNN Cell ---> hn
              ^             ^             ^             ^             ^
              |             |             |             |             |
Inputs:       h             e             l             l             o
          [0,1,0,0]     [1,0,0,0]     [0,0,1,0]     [0,0,1,0]     [0,0,0,1]

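③ The whole process

The cross-entropy loss (the green part of the original figure) is the softmax followed by NLLLoss:

xt ---> RNN Cell ---> softmax ---> probability p ---> NLLLoss <--- one-hot <--- yt
            ^                                            |
            |                                            v
          h(t-1)                                        loss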
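The softmax and NLLLoss stages in the diagram can be worked through by hand for a single time step. This is a pure-Python sketch with made-up scores; `softmax` and `nll_loss` are hypothetical helpers written for illustration, not the PyTorch functions. In PyTorch, torch.nn.CrossEntropyLoss takes the raw scores and performs both stages internally.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(probs, target):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])

logits = [0.2, -1.0, 0.5, 2.0]   # hypothetical RNN Cell output for one step (e, h, l, o)
target = 3                       # the label yt is 'o', index 3

p = softmax(logits)
loss = nll_loss(p, target)

print(p)
print(loss)
```

Because the one-hot label has a single 1, the loss only depends on the probability the model assigns to the correct character: the closer that probability is to 1, the closer the loss is to 0.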

④ Code

import torch

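# hello ----> ohlol

input_size = 4     # four characters in total (e, h, l, o), so the input dimension is 4; e.g. h is 0,1,0,0
hidden_size = 4    # the hidden state doubles as the score vector over the 4 classes
batch_size = 1


# 1. Prepare the data

# Build the dictionary that turns characters into vectors
idx2char = ['e', 'h', 'l', 'o']  # indices 0, 1, 2, 3
x_data = [1, 0, 2, 2, 3]    # input "hello" as dictionary indices: 1 0 2 2 3
y_data = [3, 1, 2, 3, 2]    # target "ohlol" as dictionary indices: 3 1 2 3 2

# One-hot lookup table; the rows correspond to e, h, l, o
one_hot_lookup = [[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]
# x_data is 1 0 2 2 3, so the one-hot vectors are [0,1,0,0],[1,0,0,0],[0,0,1,0],[0,0,1,0],[0,0,0,1]
x_one_hot = [one_hot_lookup[x] for x in x_data]


inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)   # (seq_len, batch_size, input_size)
labels = torch.LongTensor(y_data).view(-1, 1)                        # (seq_len, 1): one label per time step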






# 2. Build the model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.input_size = input_size
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size,
                                        hidden_size=self.hidden_size)
    # Forward pass
    def forward(self, input, hidden):
        # RNNCell maps the input and the previous hidden state to the next hidden state: ht = cell(xt, ht-1)
        hidden = self.rnncell(input, hidden)
        return hidden
    # Helper: build the initial hidden state (all zeros)
    def init_hidden(self):
        return torch.zeros(self.batch_size, self.hidden_size)

net = Model(input_size, hidden_size, batch_size)




# 3. Loss function and optimizer: cross-entropy loss and the Adam optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)




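# 4. Train the model
for epoch in range(20):
    loss = 0
    # Reset the gradients at the start of every epoch
    optimizer.zero_grad()
    # ① First step of every epoch: build h0
    hidden = net.init_hidden()
    print('predicted string:', end='')
    # ② Iterate over inputs one time step at a time; inputs has shape seq_len x batch_size x input_size
    for input, label in zip(inputs, labels):
        # Feed the current input and the previous hidden state to the model to get the new hidden state
        hidden = net(input, hidden)
        # The losses are summed over the sequence: step 1 loss + step 2 loss + ...
        loss += criterion(hidden, label)
        # The index of the largest element of hidden is the predicted class
        _, idx = hidden.max(dim=1)
        # Print the character the model currently considers most likely
        print(idx2char[idx.item()], end=' ')
    # The sequence is finished: backpropagate the summed loss
    loss.backward()
    # Update the parameters
    optimizer.step()
    print(',Epoch [%d/20] loss=%.4f' % (epoch+1, loss.item()))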

2) Using RNN

import torch

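# hello ----> ohlol

input_size = 4     # four characters in total (e, h, l, o), so the input dimension is 4; e.g. h is 0,1,0,0
hidden_size = 4
batch_size = 1
num_layers = 1
seq_len = 5


# 1. Prepare the data

# Build the dictionary that turns characters into vectors
idx2char = ['e', 'h', 'l', 'o']  # indices 0, 1, 2, 3
x_data = [1, 0, 2, 2, 3]    # input "hello" as dictionary indices: 1 0 2 2 3
y_data = [3, 1, 2, 3, 2]    # target "ohlol" as dictionary indices: 3 1 2 3 2

# One-hot lookup table; the rows correspond to e, h, l, o
one_hot_lookup = [[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]
# x_data is 1 0 2 2 3, so the one-hot vectors are [0,1,0,0],[1,0,0,0],[0,0,1,0],[0,0,1,0],[0,0,0,1]
x_one_hot = [one_hot_lookup[x] for x in x_data]


inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)   # (seq_len, batch_size, input_size)
labels = torch.LongTensor(y_data)                                         # (seq_len * batch_size,)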





# 2. Build the model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        self.input_size = input_size
        self.rnn = torch.nn.RNN(input_size=input_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers)
    def forward(self, input):
        # h0: (num_layers, batch_size, hidden_size), all zeros
        hidden = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        # out: (seq_len, batch_size, hidden_size); the final hidden state is not needed here
        out, _ = self.rnn(input, hidden)
        # Flatten to (seq_len * batch_size, hidden_size) so that it lines up with labels
        return out.view(-1, self.hidden_size)


net = Model(input_size, hidden_size, batch_size, num_layers)



# 3. Loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(),lr=0.05)


# 4. Train the model
for epoch in range(15):
    optimizer.zero_grad()               # reset the gradients
    outputs = net(inputs)               # (seq_len * batch_size, hidden_size)
    loss = criterion(outputs, labels)   # loss over the whole sequence in one call
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)         # predicted class for every time step
    idx = idx.tolist()
    print('Predicted: ', ''.join([idx2char[x] for x in idx]), end='')
    print(',Epoch [%d/15] loss = %.3f' % (epoch + 1, loss.item()))

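5. Embedding

One-hot encoding has three drawbacks:

① The vectors are very high-dimensional

② The matrix is sparse

③ The encoding is hard-coded rather than learned

This is why embeddings are introduced: they map high-dimensional sparse samples to a dense, low-dimensional space.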
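An embedding can be pictured as a lookup table with one dense row per dictionary index. The sketch below uses a hand-written table with made-up values purely for illustration; in a real model (for example with torch.nn.Embedding) the rows are parameters learned during training. The helper names `embed` and `one_hot_matmul` are assumptions.

```python
# Hypothetical embedding table: 4 characters, each mapped to a dense 2-dimensional vector
embedding = [
    [0.1, -0.3],   # e
    [0.7, 0.2],    # h
    [-0.5, 0.4],   # l
    [0.0, 0.9],    # o
]

def embed(indices, table):
    """Look up one dense row per index."""
    return [table[i] for i in indices]

def one_hot_matmul(index, table):
    """Same result computed the long way: one-hot row vector times the table."""
    one_hot = [1.0 if j == index else 0.0 for j in range(len(table))]
    dim = len(table[0])
    return [sum(one_hot[j] * table[j][d] for j in range(len(table))) for d in range(dim)]

x_data = [1, 0, 2, 2, 3]           # "hello" as indices
dense = embed(x_data, embedding)   # 5 vectors of length 2 instead of length 4

print(dense)
print(one_hot_matmul(1, embedding))
```

The lookup gives the same vector as multiplying the one-hot row by the table, but without ever building the sparse vector, and the output dimension (2 here) can be chosen much smaller than the dictionary size.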

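6. Example: a name classifier

Classify names from 18 countries.

Given a name as input, predict its country.

1) Prepare the data

The input name is a string, so it has to be split into individual characters.
Build the dictionary: use the ASCII table as the character dictionary, which gives a vocabulary of 128 entries (N_CHARS in the code).

In addition, the sequences have different lengths, so padding is needed.

The 18 countries also have to be turned into a dictionary, with indices 0-17.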
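The character and padding steps described above can be sketched as follows. The names in the list, the helpers `name_to_codes` and `pad_batch`, and the choice of 0 as the padding value are all assumptions made for illustration; they are not taken from the dataset or from the original code.

```python
def name_to_codes(name):
    """Split a name into characters and map each one to its ASCII code."""
    return [ord(ch) for ch in name]

def pad_batch(names, pad_value=0):
    """Convert names to code lists and right-pad them to the longest one."""
    seqs = [name_to_codes(n) for n in names]
    lengths = [len(s) for s in seqs]
    max_len = max(lengths)
    padded = [s + [pad_value] * (max_len - len(s)) for s in seqs]
    return padded, lengths

names = ["Adam", "Li", "Maria"]   # hypothetical example names
padded, lengths = pad_batch(names)

for name, row in zip(names, padded):
    print(name, row)
print(lengths)
```

Keeping the original lengths alongside the padded rows is useful because the model can then tell real characters apart from padding.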

import csv
import gzip
import time
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, BatchSampler



# 1. Prepare the data
# Hyperparameters
HIDDEN_SIZE = 100
BATCH_SIZE = 256   # 256 names per batch
N_LAYER = 2        # 2-layer GRU
N_EPOCHS = 100     # train for 100 epochs
N_CHARS = 128      # input vocabulary size: the 128 ASCII characters
USE_GPU = False


# ① The input name is a string and has to be split into individual characters
# ② Dictionary: the ASCII table serves as the character dictionary (128 entries)
class NameDataset(Dataset):
    def __init__(self, is_train_set=True):
        # Read the training file if is_train_set is True, otherwise the test file
        # (raw strings keep the backslashes in the Windows path from being read as escapes)
        filename = (r'E:\A-\pythonProject\RNN/names_train.csv.gz' if is_train_set
                    else r'E:\A-\pythonProject\RNN/names_test.csv.gz')
        with gzip.open(filename, 'rt') as f:
            reader = csv.reader(f)
            rows = list(reader)    # read every row of the dataset
        # Column 0 of every row goes into the names list
        self.names = [row[0] for row in rows]
        self.len = len(self.names)  # number of names
        # Column 1 of every row goes into the countries list
        self.countries = [row[1] for row in rows]
        # Country dictionary: ① set removes duplicates ② sorted orders them ③ list turns the result back into a list
        self.country_list = list(sorted(set(self.countries)))
        # Call getCountryDict() to build the country -> index dictionary
        self.country_dict = self.getCountryDict()
        self.country_num = len(self.country_list)

    # Return the name at `index` together with the dictionary index of its country
    def __getitem__(self, index):
        # The country string of this sample is looked up in country_dict to get its integer index
        return self.names[index], self.country_dict[self.countries[index]]
    # Return the size of the dataset
    def __len__(self):
        return self.len

    # Build the country dictionary
    def getCountryDict(self):
        country_dict = dict()  # start from an empty dictionary
        for idx, country_name in enumerate(self.country_list, 0):  # walk country_list and assign indices 0, 1, 2, 3, ...
            country_dict[country_name] = idx
        return country_dict

    # Return the country name for a given index
    def idx2country(self, index):
        return self.country_list[index]

    # Return how many countries there are
    def getCountryNum(self):
        return self.country_num

# ③ Build the training set and the test set
trainset = NameDataset(is_train_set=True)
trainloader = DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
testset = NameDataset(is_train_set=False)
testloader = DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False)
# Use the method defined above to get the number of countries, which determines the model's final output dimension
N_COUNTRY = trainset.getCountryNum()