NNDL Assignment 9: RNN - SRN (Simple Recurrent Network) [HBU]


洛杉矶县牛肉板面


Original article: https://blog.csdn.net/weixin_63010525/article/details/134714870



The assignment was set by Teacher Wei of Hebei University; see his blog post: 【23-24 Autumn Semester】NNDL Assignment 9 RNN - SRN - CSDN Blog

Preliminaries

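A Simple Recurrent Network (SRN) is a recurrent neural network with a single hidden layer whose values are stored and fed back into that hidden layer at the next time step. The figure below, which introduces the SRN, is from Qiu Xipeng's "Neural Networks and Deep Learning":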

Parameters and basic structure of a recurrent neural network:
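Written as equations in the style of Qiu Xipeng's book, one SRN step is shown below. The correspondence to the code that follows is my own reading of it: $W$ collects the input weights w1..w4, $U$ the recurrent weights u1..u4 in the order the code uses them, and $V$ (a label I am assuming for the output layer) collects w5..w8. Part (1) uses no bias and takes $f$ as the identity; part (2) uses $f = \tanh$.

```latex
\begin{aligned}
\boldsymbol{z}_t &= U\boldsymbol{h}_{t-1} + W\boldsymbol{x}_t + \boldsymbol{b}, \qquad
\boldsymbol{h}_t = f(\boldsymbol{z}_t), \qquad
\boldsymbol{y}_t = V\boldsymbol{h}_t, \\
W &= \begin{pmatrix} w_1 & w_3 \\ w_2 & w_4 \end{pmatrix}, \quad
U = \begin{pmatrix} u_2 & u_4 \\ u_1 & u_3 \end{pmatrix}, \quad
V = \begin{pmatrix} w_5 & w_7 \\ w_6 & w_8 \end{pmatrix}, \quad
\boldsymbol{h}_0 = \boldsymbol{0}.
\end{aligned}
```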

1. Implementing an SRN

(1) Using NumPy

Following the simple recurrent network structure shown in the figure, the procedure and code are as follows:

import numpy as np

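# Initialize the weights: eight weights w1..w8
w1,w2,w3,w4,w5,w6,w7,w8 = 1.,1.,1.,1.,1.,1.,1.,1.
# u1..u4 are the recurrent weights applied to the stored state (memory) when it is fed back into the hidden layer
u1,u2,u3,u4 = 1.,1.,1.,1.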

# Initialize the input sequence
inputs_seq = np.array(
    [
        [1,1],
        [1,1],
        [2,2]
    ],dtype=float
)
print('inputs_seq: ','\n',inputs_seq)

# Initialize the memory (the previous hidden state) to zeros
state_t = np.zeros(2)
print('state_t is :',state_t)
print('---------------------------')
for i in inputs_seq:
    print('=========')
    print('inputs is:',i)
    print('state_t is:',state_t)

    in_h1 = np.dot([w1,w3],i) + np.dot([u2,u4],state_t)
    in_h2 = np.dot([w2,w4],i) + np.dot([u1,u3],state_t)

    state_t = np.array([in_h1, in_h2]) # update the memory with the current hidden-layer values

    out_y1 = np.dot([w5,w7],[in_h1,in_h2])
    out_y2 = np.dot([w6,w8],[in_h1,in_h2])

    print('outputs_y1,y2 is:',out_y1,out_y2)

Output:
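The loop above can also be written in matrix form, which makes the link to the SRN equations explicit. This is a sketch of my own, not the assignment's reference code: srn_forward is a hypothetical helper name, and the matrices W, U and V simply collect w1..w8 and u1..u4 in the order the loop uses them. Working through the loop by hand with every weight equal to 1, the hidden values grow 2, 6, 16 (there is no activation), so the outputs are (4, 4), (12, 12) and (32, 32).

```python
import numpy as np

def srn_forward(inputs_seq, W, U, V, activation=None):
    """Run a simple recurrent network over inputs_seq (shape [T, input_size]).

    h_t = f(W @ x_t + U @ h_{t-1}),  y_t = V @ h_t
    activation=None means the identity (part (1)); pass np.tanh for part (2).
    """
    state = np.zeros(U.shape[0])        # memory / previous hidden state, h_0 = 0
    outputs = []
    for x in inputs_seq:
        z = W @ x + U @ state           # weighted input plus weighted memory
        state = z if activation is None else activation(z)
        outputs.append(V @ state)       # output layer
    return np.array(outputs)

# All weights equal to 1, as in the code above
W = np.ones((2, 2))  # [[w1, w3], [w2, w4]]
U = np.ones((2, 2))  # [[u2, u4], [u1, u3]]
V = np.ones((2, 2))  # [[w5, w7], [w6, w8]]
inputs_seq = np.array([[1, 1], [1, 1], [2, 2]], dtype=float)

print(srn_forward(inputs_seq, W, U, V))           # part (1): no activation
print(srn_forward(inputs_seq, W, U, V, np.tanh))  # part (2): tanh
```

Passing np.tanh as the activation gives the part (2) variant, so a single function covers both questions.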

(2) Building on (1), add the tanh activation function

Building on part (1), pass in_h1 and in_h2 through tanh before storing them in the memory (the delay unit):

import numpy as np

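# Initialize the weights: eight weights w1..w8
w1,w2,w3,w4,w5,w6,w7,w8 = 1.,1.,1.,1.,1.,1.,1.,1.
# u1..u4 are the recurrent weights applied to the stored state (memory) when it is fed back into the hidden layer
u1,u2,u3,u4 = 1.,1.,1.,1.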

# Initialize the input sequence
inputs_seq = np.array(
    [
        [1,1],
        [1,1],
        [2,2]
    ],dtype=float
)
print('inputs_seq: ','\n',inputs_seq)

# Initialize the memory (the previous hidden state) to zeros
state_t = np.zeros(2)
print('state_t is :',state_t)
print('---------------------------')
for i in inputs_seq:
    print('=========')
    print('inputs is:',i)
    print('state_t is:',state_t)

    # Add the tanh activation function
    in_h1 = np.tanh(np.dot([w1,w3],i) + np.dot([u2,u4],state_t))
    in_h2 = np.tanh(np.dot([w2,w4],i) + np.dot([u1,u3],state_t))

    state_t = np.array([in_h1, in_h2]) # update the memory with the activated hidden-layer values

    out_y1 = np.dot([w5,w7],[in_h1,in_h2])
    out_y2 = np.dot([w6,w8],[in_h1,in_h2])

    print('outputs_y1,y2 is:',out_y1,out_y2)

Output:

(3) Implementation with nn.RNNCell

> For detailed usage of RNNCell and RNN, see this blog post:

【PyTorch Study Notes】21: Using nn.RNN and nn.RNNCell - CSDN Blog

If anything is unclear, look for a course on Bilibili; I watched the lecture video once and understood most of it!

Code:

import torch

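batch_size = 1 # batch size
seq_len = 3 # sequence length
input_size = 2 # input feature dimension
hidden_size = 2 # hidden state dimension
output_size = 2 # output layer dimension

# Create an RNNCell: a single RNN unit that takes an input and the current hidden state and returns the new hidden state.
# There is no output_size argument: the RNNCell's output is the new hidden state, so its size is hidden_size
cell = torch.nn.RNNCell(input_size=input_size , hidden_size=hidden_size)

# Initialize the RNNCell parameters
for name,param in cell.named_parameters():
    if name.startswith('weight'):
        # set all weights to 1
        torch.nn.init.ones_(param)
    else:
        # set all biases to 0
        torch.nn.init.zeros_(param)

# Linear (fully connected) layer that maps the RNN's hidden state to the output space
liner = torch.nn.Linear(hidden_size,output_size)
liner.weight.data = torch.Tensor( [[1,1],[1,1]] ) # set the weights to 1
liner.bias.data = torch.zeros(output_size)  # set the biases to 0 (one per output unit)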

# Create the input sequence seq: [seq_len, batch_size, input_size]
seq = torch.Tensor([
    [[1,1]],
    [[1,1]],
    [[2,2]]
])
#print(seq)

# Initialize the hidden state and the output
hidden = torch.zeros(batch_size,hidden_size)
output = torch.zeros(batch_size,output_size)

# Inspect the initial hidden state and output
print('hidden:',hidden,'hidden.shape:',hidden.shape)
print('output:',output,'output.shape:',output.shape)
print('-' * 30) # separator

# Loop over the sequence and run the forward pass with RNNCell:
# feed the current input and hidden state to the cell to get the new hidden state
for idx,x_t in enumerate(seq):
    print('='*10 , idx, '='*10) # separator

    print('Input :',x_t)
    print('hidden :',hidden)

    hidden = cell(x_t,hidden)
    print('cell_hidden is:',hidden)

    output = liner(hidden) # pass the hidden state through the linear layer
    print('output is :',output)

Output:
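Because all weights are 1 and all biases are 0, the nn.RNNCell result should match the NumPy tanh version from part (2). A quick check, as a sketch: it mirrors the setup above but sets the parameters with fill_/zero_ under torch.no_grad() instead of assigning .data, and then compares the two sets of outputs.

```python
import numpy as np
import torch

inputs = [[1., 1.], [1., 1.], [2., 2.]]

# NumPy version of part (2): all weights 1, no bias, tanh activation
state = np.zeros(2)
numpy_outputs = []
for x in np.array(inputs):
    state = np.tanh(np.ones((2, 2)) @ x + np.ones((2, 2)) @ state)
    numpy_outputs.append(np.ones((2, 2)) @ state)

# PyTorch version: RNNCell + Linear, weights set to 1 and biases to 0
cell = torch.nn.RNNCell(input_size=2, hidden_size=2)  # nonlinearity defaults to 'tanh'
linear = torch.nn.Linear(2, 2)
with torch.no_grad():
    for module in (cell, linear):
        for name, param in module.named_parameters():
            if name.startswith('weight'):
                param.fill_(1.0)
            else:
                param.zero_()

    hidden = torch.zeros(1, 2)                         # [batch_size, hidden_size]
    torch_outputs = []
    for x in torch.tensor(inputs).unsqueeze(1):        # each x: [batch_size=1, input_size=2]
        hidden = cell(x, hidden)
        torch_outputs.append(linear(hidden).squeeze(0).numpy())

print(np.array(numpy_outputs))
print(np.array(torch_outputs))
print('match:', np.allclose(numpy_outputs, torch_outputs, atol=1e-6))
```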

(4) Implementation with nn.RNN

Code:

import torch

batch_size = 1 # batch size
seq_len = 3 # sequence length
input_size = 2 # input feature dimension
hidden_size = 2 # hidden state dimension
output_size = 2 # output layer dimension
num_layers = 1 # number of RNN layers

# Create the RNN, specifying the input size, hidden size and number of layers
cell = torch.nn.RNN(input_size=input_size,hidden_size=hidden_size
                    ,num_layers=num_layers)

# Initialize the RNN parameters
for name,param in cell.named_parameters():
    if name.startswith('weight'):
        # set all weights to 1
        torch.nn.init.ones_(param)
    else:
        # set all biases to 0
        torch.nn.init.zeros_(param)

# Linear (fully connected) layer
liner = torch.nn.Linear(hidden_size,output_size)
liner.weight.data = torch.Tensor( [[1,1],[1,1]] ) # set the weights to 1
liner.bias.data = torch.zeros(output_size)  # set the biases to 0 (one per output unit)

# Create the input sequence: [seq_len, batch_size, input_size]
input_seq = torch.Tensor([
    [[1,1]],
    [[1,1]],
    [[2,2]]
])
#print(input_seq)

# Initialize the hidden state before the forward pass: [num_layers, batch_size, hidden_size]
h0 = torch.zeros(num_layers,batch_size,hidden_size)
# output: hidden state at every time step, [seq_len, batch_size, hidden_size]
# hidden: final hidden state only, [num_layers, batch_size, hidden_size]
output,hidden = cell(input_seq,h0)
# Inspect hidden and output
print('hidden:',hidden,'hidden.shape:',hidden.shape)
print('output:',output,'output.shape:',output.shape)
print('-' * 30) # separator

print('Input :',input_seq[0])
print('hidden:',h0[0])
print('Output:',liner(output[0]))
print('-' * 20)

print('Input :',input_seq[1])
print('hidden:',output[0])
print('Output:',liner(output[1]))
print('-' * 20)

print('Input :',input_seq[2])
print('hidden:',output[1])
print('Output:',liner(output[2]))

Output:
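nn.RNN performs the same computation as calling an RNNCell once per time step, and because nn.Linear acts on the last dimension it can be applied to the whole output tensor at once instead of step by step. The sketch below uses random weights and a random input (chosen only for the check), copies nn.RNN's layer-0 parameters (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0) into an RNNCell, and compares the two.

```python
import torch

torch.manual_seed(0)
seq_len, batch_size, input_size, hidden_size = 3, 1, 2, 2

rnn = torch.nn.RNN(input_size, hidden_size)          # one layer, tanh
cell = torch.nn.RNNCell(input_size, hidden_size)
linear = torch.nn.Linear(hidden_size, 2)

# Copy nn.RNN's layer-0 parameters into the RNNCell so both compute the same function
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(seq_len, batch_size, input_size)     # [seq_len, batch, input_size]
h0 = torch.zeros(1, batch_size, hidden_size)        # [num_layers, batch, hidden_size]

with torch.no_grad():
    output, h_n = rnn(x, h0)                        # output: [seq_len, batch, hidden_size]
    y_all = linear(output)                          # Linear acts on the last dim: [seq_len, batch, 2]

    # The same computation, one time step at a time with RNNCell
    h = h0[0]
    steps = []
    for t in range(seq_len):
        h = cell(x[t], h)
        steps.append(h)
    cell_output = torch.stack(steps)                # [seq_len, batch, hidden_size]

print(y_all.shape)
print(torch.allclose(output, cell_output, atol=1e-6))
```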

2. Implementing "sequence to sequence"

Watch the video, learn how RNNs work, and implement the teaching example from part P12 of the video

12. Recurrent Neural Networks (Basics) _ Bilibili

* My notes and summary from watching the video are at the end of this post
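A minimal sketch, assuming the P12 example is the well-known character task that learns to map "hello" to "ohlol" with an RNNCell and one-hot inputs. The class name CharRNN, the helper train_hello, hidden_size=8, the Linear output head, Adam with lr=0.05 and 300 epochs are my own choices and may differ from the video.

```python
import torch

# Character vocabulary and the assumed task: 'hello' -> 'ohlol'
idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]   # 'hello'
y_data = [3, 1, 2, 3, 2]   # 'ohlol'

input_size, hidden_size = 4, 8

class CharRNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.cell = torch.nn.RNNCell(input_size, hidden_size)
        self.fc = torch.nn.Linear(hidden_size, len(idx2char))  # hidden state -> class scores

    def forward(self, inputs):
        # inputs: [seq_len, batch_size, input_size]
        hidden = torch.zeros(inputs.size(1), hidden_size)
        logits = []
        for x in inputs:
            hidden = self.cell(x, hidden)
            logits.append(self.fc(hidden))
        return torch.stack(logits)  # [seq_len, batch_size, n_class]

def train_hello(epochs=300, lr=0.05, seed=0):
    torch.manual_seed(seed)
    inputs = torch.eye(input_size)[x_data].unsqueeze(1)  # one-hot: [5, 1, 4]
    labels = torch.tensor(y_data)                        # [5]
    model = CharRNN()
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        logits = model(inputs).squeeze(1)                # [5, 4]
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    with torch.no_grad():
        pred = model(inputs).squeeze(1).argmax(dim=1)
    return ''.join(idx2char[i] for i in pred.tolist()), losses

predicted, losses = train_hello()
print('first loss %.4f, last loss %.4f' % (losses[0], losses[-1]))
print('predicted:', predicted)
```

Unlike a sequence-to-sequence model, this task produces exactly one output character per input character, so a single RNNCell loop is enough.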

3. A simple "encoder-decoder" implementation

# code by Tae Hwan Jung(Jeff Jung) @graykode, modify by wmathor
import torch
import numpy as np
import torch.nn as nn
import torch.utils.data as Data

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# S: Symbol that shows starting of decoding input
# E: Symbol that shows end of decoding output
# ?: Symbol that will fill in blank sequence if current batch data size is shorter than n_step

letter = [c for c in 'SE?abcdefghijklmnopqrstuvwxyz']
letter2idx = {n: i for i, n in enumerate(letter)}

seq_data = [['man', 'women'], ['black', 'white'], ['king', 'queen'], ['girl', 'boy'], ['up', 'down'], ['high', 'low']]

# Seq2Seq Parameter
n_step = max([max(len(i), len(j)) for i, j in seq_data])  # max_len(=5)
n_hidden = 128
n_class = len(letter2idx)  # classification problem
batch_size = 3


def make_data(seq_data):
    enc_input_all, dec_input_all, dec_output_all = [], [], []

    for seq in seq_data:
        for i in range(2):
            seq[i] = seq[i] + '?' * (n_step - len(seq[i]))  # 'man??', 'women'

        enc_input = [letter2idx[n] for n in (seq[0] + 'E')]  # ['m', 'a', 'n', '?', '?', 'E']
        dec_input = [letter2idx[n] for n in ('S' + seq[1])]  # ['S', 'w', 'o', 'm', 'e', 'n']
        dec_output = [letter2idx[n] for n in (seq[1] + 'E')]  # ['w', 'o', 'm', 'e', 'n', 'E']

        enc_input_all.append(np.eye(n_class)[enc_input])
        dec_input_all.append(np.eye(n_class)[dec_input])
        dec_output_all.append(dec_output)  # not one-hot

    # make tensor
    return torch.Tensor(np.array(enc_input_all)), torch.Tensor(np.array(dec_input_all)), torch.LongTensor(np.array(dec_output_all))


'''
enc_input_all: [6, n_step+1 (because of 'E'), n_class]
dec_input_all: [6, n_step+1 (because of 'S'), n_class]
dec_output_all: [6, n_step+1 (because of 'E')]
'''
enc_input_all, dec_input_all, dec_output_all = make_data(seq_data)


class TranslateDataSet(Data.Dataset):
    def __init__(self, enc_input_all, dec_input_all, dec_output_all):
        self.enc_input_all = enc_input_all
        self.dec_input_all = dec_input_all
        self.dec_output_all = dec_output_all

    def __len__(self):  # return dataset size
        return len(self.enc_input_all)

    def __getitem__(self, idx):
        return self.enc_input_all[idx], self.dec_input_all[idx], self.dec_output_all[idx]


loader = Data.DataLoader(TranslateDataSet(enc_input_all, dec_input_all, dec_output_all), batch_size, True)


# Model
class Seq2Seq(nn.Module):
    def __init__(self):
        super(Seq2Seq, self).__init__()
        # dropout only acts between stacked layers, so it is omitted for these single-layer RNNs
        self.encoder = nn.RNN(input_size=n_class, hidden_size=n_hidden)  # encoder
        self.decoder = nn.RNN(input_size=n_class, hidden_size=n_hidden)  # decoder
        self.fc = nn.Linear(n_hidden, n_class)

    def forward(self, enc_input, enc_hidden, dec_input):
        # enc_input(=input_batch): [batch_size, n_step+1, n_class]
        # dec_input(=output_batch): [batch_size, n_step+1, n_class]
        enc_input = enc_input.transpose(0, 1)  # enc_input: [n_step+1, batch_size, n_class]
        dec_input = dec_input.transpose(0, 1)  # dec_input: [n_step+1, batch_size, n_class]

        # h_t : [num_layers(=1) * num_directions(=1), batch_size, n_hidden]
        _, h_t = self.encoder(enc_input, enc_hidden)
        # outputs : [n_step+1, batch_size, num_directions(=1) * n_hidden(=128)]
        outputs, _ = self.decoder(dec_input, h_t)

        model = self.fc(outputs)  # model : [n_step+1, batch_size, n_class]
        return model


model = Seq2Seq().to(device)
criterion = nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(5000):
    for enc_input_batch, dec_input_batch, dec_output_batch in loader:
        # make hidden shape [num_layers * num_directions, batch_size, n_hidden]
        h_0 = torch.zeros(1, enc_input_batch.size(0), n_hidden).to(device)  # use the actual batch size

        (enc_input_batch, dec_input_batch, dec_output_batch) = (
            enc_input_batch.to(device), dec_input_batch.to(device), dec_output_batch.to(device))
        # enc_input_batch : [batch_size, n_step+1, n_class]
        # dec_input_batch : [batch_size, n_step+1, n_class]
        # dec_output_batch : [batch_size, n_step+1], not one-hot
        pred = model(enc_input_batch, h_0, dec_input_batch)
        # pred : [n_step+1, batch_size, n_class]
        pred = pred.transpose(0, 1)  # [batch_size, n_step+1(=6), n_class]
        loss = 0
        for i in range(len(dec_output_batch)):
            # pred[i] : [n_step+1, n_class]
            # dec_output_batch[i] : [n_step+1]
            loss += criterion(pred[i], dec_output_batch[i])
        if (epoch + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


# Test
def translate(word):
    enc_input, dec_input, _ = make_data([[word, '?' * n_step]])
    enc_input, dec_input = enc_input.to(device), dec_input.to(device)
    # make hidden shape [num_layers * num_directions, batch_size, n_hidden]
    hidden = torch.zeros(1, 1, n_hidden).to(device)
    model.eval()
    with torch.no_grad():
        output = model(enc_input, hidden, dec_input)
    # output : [n_step+1, batch_size(=1), n_class]

    predict = output.argmax(dim=2).squeeze(1)  # index of the most likely letter at each step: [n_step+1]
    decoded = [letter[i] for i in predict.tolist()]
    # cut at the first 'E' (end symbol); keep the whole sequence if 'E' is never predicted
    end = decoded.index('E') if 'E' in decoded else len(decoded)
    translated = ''.join(decoded[:end])

    return translated.replace('?', '')


print('test')
print('man ->', translate('man'))
print('mans ->', translate('mans'))
print('king ->', translate('king'))
print('black ->', translate('black'))
print('up ->', translate('up'))

Output:

4. A brief summary of nn.RNNCell and nn.RNN

I prefer reading the official documentation: it gives accurate, first-hand information, and it is good English practice too.

nn.RNNCell

Official PyTorch docs: RNNCell — PyTorch 2.1 documentation

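The documentation is shown below. It first lists the parameters of RNNCell: input_size, hidden_size, bias, nonlinearity (the nonlinear activation function, 'tanh' or 'relu'), device and dtype. device, which I had not used before, specifies where the parameters are created, for example on the CPU or on a CUDA GPU.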

The next part describes the inputs, the hidden state and the outputs, including the shape of each:

Then it describes the Variables, i.e. the weights and biases:
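The docs give the RNNCell update as h' = tanh(W_ih x + b_ih + W_hh h + b_hh), with the parameters stored as weight_ih, weight_hh, bias_ih and bias_hh. Here is a small sketch that recomputes this by hand and compares it with the module's output; the sizes 10 and 20 are arbitrary.

```python
import torch

torch.manual_seed(0)
cell = torch.nn.RNNCell(input_size=10, hidden_size=20)  # nonlinearity='tanh' by default

print(cell.weight_ih.shape)                    # torch.Size([20, 10]): [hidden_size, input_size]
print(cell.weight_hh.shape)                    # torch.Size([20, 20]): [hidden_size, hidden_size]
print(cell.bias_ih.shape, cell.bias_hh.shape)  # torch.Size([20]) torch.Size([20])

x = torch.randn(3, 10)  # [batch, input_size]
h = torch.randn(3, 20)  # [batch, hidden_size]

with torch.no_grad():
    # h' = tanh(x W_ih^T + b_ih + h W_hh^T + b_hh), applied to every row of the batch
    manual = torch.tanh(x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh)
    builtin = cell(x, h)

print(torch.allclose(manual, builtin, atol=1e-6))
```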

The docs also provide example code; after running it, I got the following result:

It printed a very long output. Since output is a Python list, .shape cannot be used on it directly, so I took output[0] and checked the shape of that single element:

Result:

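As shown, output has length 6, one entry per time step. Printing output[0] shows that its shape is [3, 20], i.e. [batch_size, hidden_size], which matches the shapes of the input data and the hidden state.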
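Reproducing that example (with the input tensor renamed to inputs so it does not shadow Python's built-in input), output is a plain Python list of six tensors. torch.stack turns it into a single tensor, on which .shape works directly:

```python
import torch

rnn = torch.nn.RNNCell(10, 20)   # input_size=10, hidden_size=20
inputs = torch.randn(6, 3, 10)   # [seq_len=6, batch=3, input_size=10]
hx = torch.randn(3, 20)          # [batch=3, hidden_size=20]
output = []
for i in range(6):
    hx = rnn(inputs[i], hx)
    output.append(hx)

print(len(output), output[0].shape)  # 6 torch.Size([3, 20])
stacked = torch.stack(output)        # one tensor instead of a list
print(stacked.shape)                 # torch.Size([6, 3, 20])
```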

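In 刘二大人's (Liu Er's) Bilibili video, the RNN Cell is explained as essentially a linear unit (Linear). Taken apart, it applies two linear transformations, one to x and one to h, but by the properties of matrix multiplication these two can be merged into a single linear transformation: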

《PyTorch Deep Learning Practice》 complete series _ Bilibili
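The merge works because W_ih x + W_hh h = [W_ih W_hh] [x; h]: placing the two weight matrices side by side and concatenating x and h gives one linear map. A sketch that checks this against an nn.RNNCell with random weights (the sizes are arbitrary):

```python
import torch

torch.manual_seed(0)
input_size, hidden_size, batch = 3, 4, 2
cell = torch.nn.RNNCell(input_size, hidden_size)

x = torch.randn(batch, input_size)
h = torch.randn(batch, hidden_size)

with torch.no_grad():
    # Two separate linear maps, one for x and one for h
    two_linears = x @ cell.weight_ih.T + h @ cell.weight_hh.T + cell.bias_ih + cell.bias_hh

    # One fused linear map: weights side by side, inputs concatenated
    W = torch.cat([cell.weight_ih, cell.weight_hh], dim=1)  # [hidden_size, input_size + hidden_size]
    xh = torch.cat([x, h], dim=1)                           # [batch, input_size + hidden_size]
    one_linear = xh @ W.T + (cell.bias_ih + cell.bias_hh)

    print(torch.allclose(two_linears, one_linear, atol=1e-6))
    print(torch.allclose(torch.tanh(one_linear), cell(x, h), atol=1e-6))
```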

nn.RNN

Official docs: RNN — PyTorch 2.1 documentation

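As with RNNCell, the docs first introduce the parameters of RNN. RNN accepts more parameters than RNNCell: besides the input size, hidden size, bias, nonlinearity, dtype and device, it adds num_layers (the number of stacked recurrent layers), batch_first (whether input and output tensors are laid out as [batch, seq, feature]), dropout (applied to the outputs of every layer except the last) and bidirectional.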
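A short sketch of how batch_first and bidirectional change the shapes (the sizes are arbitrary). Note that batch_first only affects the input and output tensors, not the hidden state h_n:

```python
import torch

x = torch.randn(3, 5, 10)  # batch_first=True -> [batch=3, seq_len=5, input_size=10]

rnn = torch.nn.RNN(input_size=10, hidden_size=20, num_layers=2,
                   batch_first=True, bidirectional=True)
output, h_n = rnn(x)       # h_0 defaults to zeros when omitted

print(output.shape)  # torch.Size([3, 5, 40]): [batch, seq_len, num_directions * hidden_size]
print(h_n.shape)     # torch.Size([4, 3, 20]): [num_layers * num_directions, batch, hidden_size]
```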

Next come the inputs and the hidden state:

Then the outputs:

Variables:

The docs then give an example, which we try running:

Finally it shows the shapes of output and of the final hidden state hn (h0 is the initial hidden state that is passed in).
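A sketch of that two-layer example (input renamed to inputs). With num_layers=2, output only contains the top layer's hidden states, while hn holds the final hidden state of every layer, so the last step of output equals the last entry of hn:

```python
import torch

rnn = torch.nn.RNN(10, 20, 2)   # input_size=10, hidden_size=20, num_layers=2
inputs = torch.randn(5, 3, 10)  # [seq_len=5, batch=3, input_size=10]
h0 = torch.randn(2, 3, 20)      # [num_layers=2, batch=3, hidden_size=20]
output, hn = rnn(inputs, h0)

print(output.shape)  # torch.Size([5, 3, 20]): top-layer hidden state at every step
print(hn.shape)      # torch.Size([2, 3, 20]): final hidden state of each layer
# The last time step of output is the final hidden state of the top layer
print(torch.allclose(output[-1], hn[-1]))
```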

刘二大人's Bilibili video also shows the shapes of the input data, the hidden state and the output data when using RNN:

《PyTorch Deep Learning Practice》 complete series _ Bilibili

Setting num_layers stacks several RNN layers, and the structure of the recurrent network becomes:
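Stacking means the sequence of hidden states from layer 0 becomes the input sequence of layer 1. A sketch that checks this by copying a two-layer nn.RNN's parameters (the _l0 and _l1 suffixes) into two separate one-layer RNNs; the sizes are arbitrary and the default dropout of 0 is assumed:

```python
import torch

torch.manual_seed(0)
input_size, hidden_size = 4, 3
stacked = torch.nn.RNN(input_size, hidden_size, num_layers=2)

# Two independent single-layer RNNs that receive the stacked RNN's weights
layer0 = torch.nn.RNN(input_size, hidden_size)
layer1 = torch.nn.RNN(hidden_size, hidden_size)  # layer 1's input is layer 0's hidden state
with torch.no_grad():
    for name in ['weight_ih', 'weight_hh', 'bias_ih', 'bias_hh']:
        getattr(layer0, name + '_l0').copy_(getattr(stacked, name + '_l0'))
        getattr(layer1, name + '_l0').copy_(getattr(stacked, name + '_l1'))

x = torch.randn(6, 2, input_size)  # [seq_len, batch, input_size]
with torch.no_grad():
    out_stacked, h_stacked = stacked(x)
    out0, h0 = layer0(x)           # whole sequence through the first layer
    out1, h1 = layer1(out0)        # its outputs are the input sequence of the second layer

print(torch.allclose(out_stacked, out1, atol=1e-6))
print(torch.allclose(h_stacked, torch.cat([h0, h1]), atol=1e-6))
```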

5. On "sequences" and "sequence to sequence"

Reference article: "Sequence-to-sequence models, a quick introduction" - Zhihu (zhihu.com)

The concepts of "sequence" and "sequence to sequence" are commonly used in natural language processing and machine learning.

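A "sequence" usually means an ordered collection of data, where each item is called an element or a symbol. It can be a piece of text, a series of numbers, a DNA sequence, and so on. When processing such data we usually have to take the order into account, because the elements are arranged in a meaningful order.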

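"Sequence to sequence" (Seq2Seq) refers to deep learning models for sequence-to-sequence learning problems, such as machine translation, speech recognition and text summarization, where one sequence (the input sequence) has to be converted into another (the output sequence). Classic Seq2Seq models use recurrent neural networks (RNNs) in an encoder-decoder architecture: the encoder compresses the input sequence into a fixed-length vector, and the decoder turns that vector into the output sequence. This lets the model handle input and output sequences of variable length.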
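The "fixed-length vector" is the encoder's final hidden state, as in the encoder-decoder code in section 3. A sketch with random stand-in inputs instead of real one-hot letters: whatever the input length, the context vector has the same shape.

```python
import torch

torch.manual_seed(0)
n_class, n_hidden = 29, 128  # same sizes as the section 3 encoder (26 letters plus S, E, ?)
encoder = torch.nn.RNN(input_size=n_class, hidden_size=n_hidden)

short_seq = torch.randn(3, 1, n_class)  # length 3, [seq_len, batch, n_class]
long_seq = torch.randn(9, 1, n_class)   # length 9

_, h_short = encoder(short_seq)
_, h_long = encoder(long_seq)

# Both sequences are summarized by a context vector of the same fixed size
print(h_short.shape, h_long.shape)  # torch.Size([1, 1, 128]) torch.Size([1, 1, 128])
```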

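Real-life sequence-to-sequence examples I can think of include WeChat's automatic voice-to-text conversion, conversations with Siri, and AI-narrated audiobooks.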