Deep Learning: Recurrent Neural Networks (RNN)

DL方少

Category: Deep Learning. Tags: neural networks, deep learning, machine learning, convolutional neural networks, rnn

Article link: https://blog.csdn.net/qq_46439619/article/details/112745537

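Contents

Recurrent Neural Networks
Recurrent Kernel
Unrolling the Recurrent Kernel over Time
Recurrent Computation Layers
Describing a Recurrent Layer in TF
Recurrent Computation Example 1
Recurrent Computation Example 2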

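Recurrent Neural Networks

Differences between recurrent and convolutional neural networks

Convolutional neural network: extracts spatial features with convolution kernels, then feeds them into a fully connected network.
Recurrent neural network: extracts temporal features with recurrent kernels, then feeds them into a fully connected network.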

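Recurrent Kernel

Consider "A fish cannot live without ___": we instinctively fill in "water". We can do this because the brain has memory. Our memory holds the preceding words, "a fish cannot live without", and from them we judge "water" to be the most likely continuation. This kind of prediction extracts features from past data held in memory and uses them to predict what is most likely to come next.
The recurrent kernel plays the role of that memory. By sharing its parameters across time steps, it extracts information from a time series.
The memory cells inside the recurrent kernel store the state information of each time step.
Recurrent kernel: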

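$y_t=\mathrm{softmax}(h_tw_{hy}+b_y)$ is equivalent to one fully connected layer.

$h_t=\tanh(x_tw_{xh}+h_{t-1}w_{hh}+b_h)$ extracts information from the current input and inherits the information that the recurrent kernel stored at the previous time step.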

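The memory capacity can be changed by setting the number of memory cells. Once the number of memory cells is chosen and the dimensions of the input x_t and the output y_t are fixed, the dimensions of the surrounding parameters are determined as well.

Forward pass: the state h_t stored in the memory cells is refreshed at every time step, while the three parameter matrices w_xh, w_hh and w_hy stay fixed throughout.
Backward pass: the three parameter matrices w_xh, w_hh and w_hy are updated by gradient descent. (The goal of training is to find suitable values for these three matrices.)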
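
The two formulas and the shape constraint above can be sketched as a single forward step in NumPy. This is only an illustration: the sizes (5 input features, 3 memory cells, 5 output classes), the random initial values, and the helper names `softmax` and `rnn_step` are assumptions, not values from the original figure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_mem, n_out = 5, 3, 5   # assumed sizes: input features, memory cells, output classes

# Once n_in, n_mem and n_out are chosen, every parameter shape is determined.
w_xh = rng.normal(size=(n_in, n_mem))
w_hh = rng.normal(size=(n_mem, n_mem))
w_hy = rng.normal(size=(n_mem, n_out))
b_h = np.zeros(n_mem)
b_y = np.zeros(n_out)

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def rnn_step(x_t, h_prev):
    """One forward step: refresh the state, then apply the output layer."""
    h_t = np.tanh(x_t @ w_xh + h_prev @ w_hh + b_h)
    y_t = softmax(h_t @ w_hy + b_y)
    return h_t, y_t

x_t = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # one input vector
h_t, y_t = rnn_step(x_t, np.zeros(n_mem))  # the initial state is all zeros
print(h_t.shape, y_t.shape)  # (3,) (5,)
```

Only `h_t` changes from call to call here; the parameter arrays are never modified, which mirrors the forward-pass behaviour described above.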

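Unrolling the Recurrent Kernel over Time

The memory in our brain is updated at every moment according to the current input, while the prediction we make at that moment draws on accumulated knowledge, that is, on parameter matrices that have already been fixed.
The recurrent kernel works the same way: the state h_t is refreshed at every time step and depends on the previous state h_{t-1}, while the parameter matrices hold the most effective values found by training.

In the forward pass, the recurrent kernel is unrolled over time: the same kernel, with the same parameters, is applied at each time step, taking x_t and h_{t-1} and producing h_t.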
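
As a rough sketch of this unrolling, the loop below applies one shared set of parameters at every time step and collects each state. The sizes, the random values, and the function name `unroll` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_mem, steps = 5, 3, 4           # assumed sizes for illustration

# One set of parameters, shared by every time step.
w_xh = rng.normal(size=(n_in, n_mem))
w_hh = rng.normal(size=(n_mem, n_mem))
b_h = np.zeros(n_mem)

def unroll(x_seq):
    """Apply the same recurrent kernel at each time step and collect every h_t."""
    h = np.zeros(n_mem)                # h_0: empty memory
    states = []
    for x_t in x_seq:                  # time steps are processed in order
        h = np.tanh(x_t @ w_xh + h @ w_hh + b_h)
        states.append(h)
    return np.stack(states)

x_seq = rng.normal(size=(steps, n_in))
states = unroll(x_seq)
print(states.shape)  # (4, 3)
```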

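Recurrent Computation Layers

Layers are stacked in the direction of the output, and each recurrent kernel forms one recurrent computation layer.
A recurrent kernel can contain multiple memory cells.
Backpropagation with gradient descent: because the RNN node at each time step may produce an output, the total loss of the RNN is the sum of the losses over all time steps.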
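
Written out, with $L_t$ denoting the loss at time step $t$ and $T$ the number of time steps (both symbols are introduced here for illustration), the total loss and its gradient with respect to a shared matrix such as $w_{hh}$ are:

```latex
L = \sum_{t=1}^{T} L_t, \qquad
\frac{\partial L}{\partial w_{hh}} = \sum_{t=1}^{T} \frac{\partial L_t}{\partial w_{hh}}
```

Because $w_{hh}$ is shared across time steps, each term $\partial L_t / \partial w_{hh}$ itself gathers contributions from every step up to $t$.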

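Describing a Recurrent Layer in TF

tf.keras.layers.SimpleRNN(number of memory cells, activation='activation function', return_sequences=whether to output h_t at every time step to the next layer)
activation='activation function' (defaults to tanh if omitted)

return_sequences=True: h_t is output at every time step (typically used when the next layer is also an RNN layer)

return_sequences=False: h_t is output only at the last time step (the default; typically used when the next layer is a fully connected layer)

When feeding data into an RNN, x_train must have the shape:
[number of samples, number of time steps the recurrent kernel is unrolled over, number of input features per time step]
Example: the RNN layer expects shape [2,1,3] (2 samples, 1 time step, 3 features per time step)

Example: the RNN layer expects shape [1,4,2] (1 sample, 4 time steps, 2 features per time step)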
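
The effect of return_sequences on the output shape can be checked with a short sketch. It assumes TensorFlow 2.x is installed; the input shape (2 samples, 4 time steps, 5 features) and the variable names are chosen only for illustration.

```python
import tensorflow as tf

# Input shape: [samples, time steps, features per step] (assumed values)
x = tf.zeros((2, 4, 5))

last_only = tf.keras.layers.SimpleRNN(3)                         # return_sequences=False by default
all_steps = tf.keras.layers.SimpleRNN(3, return_sequences=True)  # output h_t at every time step

shape_last = tuple(last_only(x).shape)
shape_all = tuple(all_steps(x).shape)
print(shape_last)  # (2, 3)
print(shape_all)   # (2, 4, 3)
```

With return_sequences=True the time axis is kept, which is what a following RNN layer needs; with the default, only the final state reaches the next layer.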

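Recurrent Computation Example 1

Letter prediction: input a predicts b, input b predicts c, input c predicts d, input d predicts e, and input e predicts a (one letter in, the next letter out).
One-hot code of each letter: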

10000	a
01000	b
00100	c
00010	d
00001	e
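
The same table can be generated rather than typed by hand. A small sketch, assuming NumPy; `letters`, `w_to_id` and `one_hot` are illustrative names:

```python
import numpy as np

letters = "abcde"
w_to_id = {ch: i for i, ch in enumerate(letters)}  # letter -> integer id
one_hot = np.eye(len(letters), dtype=int)          # row i is the code for id i

code_c = one_hot[w_to_id["c"]]
print(code_c)  # [0 0 1 0 0]
```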

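Randomly generate the three parameter matrices w_hy, w_hh and w_xh, and use 3 memory cells.

The previous state h_{t-1} is 0 here, so the state h_t of the memory cells is:
$h_t=\tanh(x_tW_{xh}+h_{t-1}W_{hh}+b_h)\\=\tanh([-2.3\quad0.8\quad1.1]+0+[0.5\quad0.3\quad-0.2])\\=\tanh([-1.8\quad1.1\quad0.9])\\=[-0.9\quad0.8\quad0.7]$
The output y_t is:
$y_t=\mathrm{softmax}(h_tW_{hy}+b_y)\\=\mathrm{softmax}([-0.7\quad-0.6\quad2.9\quad0.7\quad-0.8]+[0.0\quad0.1\quad0.4\quad-0.7\quad0.1])\\=\mathrm{softmax}([-0.7\quad-0.5\quad3.3\quad0.0\quad-0.7])\\=[0.02\quad0.02\quad0.91\quad0.03\quad0.02]$
The parameters W_hy, W_hh, W_xh, b_y and b_h are not updated during the forward pass.

The model therefore assigns a 91% probability to the letter c, so the recurrent network outputs c.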
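
The arithmetic above can be reproduced with a few lines of NumPy. The sketch takes the intermediate vectors exactly as they appear in the worked example (the parameter matrices themselves are not available in the text), and the helper `softmax` is defined here for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

# Intermediate vectors copied from the worked example.
pre_h = np.array([-2.3, 0.8, 1.1]) + 0 + np.array([0.5, 0.3, -0.2])
h_t = np.tanh(pre_h)

pre_y = np.array([-0.7, -0.6, 2.9, 0.7, -0.8]) + np.array([0.0, 0.1, 0.4, -0.7, 0.1])
y_t = softmax(pre_y)

predicted = "abcde"[int(np.argmax(y_t))]
print(np.round(h_t, 1))   # roughly -0.9, 0.8, 0.7
print(np.round(y_t, 2))   # roughly 0.02, 0.02, 0.91, 0.03, 0.02
print(predicted)          # c
```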
Code implementation of letter prediction

import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os
input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # map each letter to an integer id
id_to_onehot = {0: [1, 0, 0, 0, 0], 1: [0, 1, 0, 0, 0], 2: [0, 0, 1, 0, 0],
                3: [0, 0, 0, 1, 0], 4: [0, 0, 0, 0, 1]}  # map each id to its one-hot code
x_train = [id_to_onehot[w_to_id[ch]] for ch in 'abcde']  # training inputs
y_train = [w_to_id[ch] for ch in 'bcdea']  # training labels: the letter that follows each input
# Shuffle inputs and labels with the same seed so that they stay paired
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
# Reshape to [samples, time steps, features per step]: len(x_train) samples, 1 time step, 5 features
x_train = np.reshape(x_train, (len(x_train), 1, 5))
y_train = np.array(y_train)  # convert to a NumPy array
model = tf.keras.Sequential([SimpleRNN(3), Dense(5, activation='softmax')])  # recurrent layer with 3 memory cells
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])
checkpoint_save_path = "./checkpoint/zimuyuce.ckpt"
if os.path.exists(checkpoint_save_path + '.index'):
    print('-------load the model--------')
    model.load_weights(checkpoint_save_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path, save_weights_only=True, save_best_only=True,
                                                 monitor='loss')  # no validation set, so keep the weights with the lowest training loss
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])
model.summary()
with open('./weight.txt', 'w') as file:  # write the trained parameters to a text file
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')
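# Plot the training accuracy and loss curves
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']
plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()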
preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[alphabet1]]]
    # Reshape to [1 sample, 1 time step, 5 features] to match the SimpleRNN input
    alphabet = np.reshape(alphabet, (1, 1, 5))
    result = model.predict(alphabet)
    pred = int(tf.argmax(result, axis=1)[0])  # index of the most probable letter
    tf.print(alphabet1 + '->' + input_word[pred])

Output

input the number of test alphabet:3

input test alphabet:b
b->c

input test alphabet:a
a->b

input test alphabet:e
e->a

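Recurrent Computation Example 2

Feed in several letters in sequence and predict the next letter.
For example, use 3 memory cells with the memory initialized to 0, and follow the forward pass of the recurrent computation with a set of already trained parameter matrices: the parameter matrices are fixed at every time step, while the memory cells are updated at every time step.
Take the input bcde, which should predict a, as an example: after the four time steps, the output assigns a 71% probability to the letter a.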
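
Four-letter inputs like bcde, together with the letter that follows each of them, can be generated with a sliding window over the cyclic sequence abcde. This is only a sketch of one way to build such data; the names `window`, `samples` and `labels` are illustrative.

```python
import numpy as np

letters = "abcde"
window = 4                                   # number of time steps per sample
one_hot = np.eye(len(letters), dtype=int)

samples, labels = [], []
for start in range(len(letters)):
    # Indices wrap around, so the sequence is treated as cyclic: ...d, e, a, b...
    ids = [(start + k) % len(letters) for k in range(window + 1)]
    samples.append(one_hot[ids[:window]])    # four input letters
    labels.append(ids[window])               # the letter that follows them

x = np.stack(samples)
y = np.array(labels)
print(x.shape, y.tolist())  # (5, 4, 5) [4, 0, 1, 2, 3]
```

The resulting shape follows the [samples, time steps, features per step] layout that the RNN layer expects.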

Code implementation

import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os
input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # map each letter to an integer id
id_to_onehot = {0: [1, 0, 0, 0, 0], 1: [0, 1, 0, 0, 0], 2: [0, 0, 1, 0, 0],
                3: [0, 0, 0, 1, 0], 4: [0, 0, 0, 0, 1]}  # map each id to its one-hot code
# Training inputs: each sample is four consecutive letters, one-hot encoded
x_train = [[id_to_onehot[w_to_id[ch]] for ch in seq]
           for seq in ('abcd', 'bcde', 'cdea', 'deab', 'eabc')]
y_train = [w_to_id[ch] for ch in 'eabcd']  # training labels: the letter that follows each sample
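# Shuffle inputs and labels with the same seed so that they stay paired
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
# Reshape to [samples, time steps, features per step]: len(x_train) samples, 4 time steps, 5 features
x_train = np.reshape(x_train, (len(x_train), 4, 5))
y_train = np.array(y_train)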
model = tf.keras.Sequential([
    SimpleRNN(3),  # 3 memory cells
    Dense(5, activation='softmax')  # output layer with 5 neurons
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])
checkpoint_save_path = "./checkpoint/zimuyuce"
if os.path.exists(checkpoint_save_path + '.index'):
    print('----------load the model-----------')
    model.load_weights(checkpoint_save_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path, save_weights_only=True, save_best_only=True,
                                                 monitor='loss')  # keep the weights with the lowest training loss
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])
model.summary()
with open('./weights.txt', 'w') as file:  # write the trained parameters to a text file
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')
# Plot the training accuracy and loss curves
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']
plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()
preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")  # expects four letters, e.g. abcd
    alphabet = [id_to_onehot[w_to_id[a]] for a in alphabet1]
    # Reshape to [1 sample, 4 time steps, 5 features] to match the SimpleRNN input
    alphabet = np.reshape(alphabet, (1, 4, 5))
    result = model.predict(alphabet)
    pred = int(tf.argmax(result, axis=1)[0])  # index with the highest softmax probability
    tf.print(alphabet1 + '->' + input_word[pred])

Output

input the number of test alphabet:5

input test alphabet:abcd
abcd->e

input test alphabet:eabc
eabc->d
