Introduction to RNNs (Recurrent Neural Networks)

生产队的驴儿

Columns: Convolutional Neural Networks, Deep Learning, Time Series Forecasting. Tags: deep learning, machine learning, data mining

Article link: https://blog.csdn.net/weixin_46969441/article/details/121584330


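RNN: recurrent neural networks

Definition: a recurrent kernel (cell) extracts temporal features, which are then fed into a fully connected network to make predictions on sequential data.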

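Comparison with convolutional neural networks

Convolutional neural networks

A CNN is built from the five CBAPD modules (Convolution, Batch normalization, Activation, Pooling, Dropout).
Convolution kernels extract spatial information, which is then fed into a fully connected network.

For example, convolution kernels extract image features, which are fed into the network for classification.

Time-series prediction, however, calls for a recurrent neural network.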

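Recurrent neural networks

Recurrent kernel (cell): its parameters are shared across time steps, and the recurrent layer extracts temporal information.

The figure below shows a memory, which stores the state information at each time step.
Setting the number of memory units
changes the memory capacity.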

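Once the number of memory units is set and the dimensions of the input x_t and output y_t are fixed, the dimensions of the trainable parameters around the cell are fixed as well.
[Figure: the recurrent cell and its memory]
The information the memory stores at the current time step, h_t, is

$$h_t = \tanh(x_t W_{xh} + h_{t-1} W_{hh} + b_h)$$

where x_t is the input feature at the current time step, h_{t-1} is the state the memory stored at the previous time step, and b_h is a bias.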

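y_t is the output feature at the current time step:

$$y_t = \mathrm{softmax}(h_t W_{hy} + b_y)$$

This can be viewed as a fully connected layer that produces the final result.

Three weight matrices, W_xh, W_hh and W_hy, need to be trained, together with the biases b_h and b_y.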

Unrolling the recurrent kernel

The kernel is unrolled along the time axis.

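In forward propagation, the memory state is updated (the state h_t stored in the memory is refreshed at every time step), while the three weight matrices W_xh, W_hh, W_hy and the two biases b_h and b_y stay fixed throughout.
In backpropagation, the three weight matrices W_hy, W_hh, W_xh and the biases are updated by gradient descent.

An RNN may produce an output at every time step, so its total loss is the sum of the losses over all (or some of the) time steps.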
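To make the unrolled computation concrete, here is a minimal NumPy sketch of the forward pass over several time steps. It assumes the row-vector convention of the formulas above (x_t W_xh), reuses the same W_xh, W_hh and W_hy at every step, and sums a cross-entropy loss over all steps. The sizes, the random weights and the names `rnn_forward` and `total_loss` are hypothetical, and the backward pass is not shown.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Unroll a simple RNN along the time axis.

    xs has shape (T, n_in): one input vector per time step.
    The same weights are reused at every step (parameters shared across time).
    Returns the memory states (T, n_h) and the softmax outputs (T, n_out).
    """
    h = np.zeros(W_hh.shape[0])                    # memory starts at 0
    hs, ys = [], []
    for x_t in xs:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)   # refresh the memory state
        logits = h @ W_hy + b_y
        e = np.exp(logits - logits.max())          # numerically stable softmax
        hs.append(h)
        ys.append(e / e.sum())
    return np.array(hs), np.array(ys)

def total_loss(ys, targets):
    """Sum of the cross-entropy losses over all time steps."""
    return float(-sum(np.log(ys[t, k]) for t, k in enumerate(targets)))

rng = np.random.default_rng(0)
n_in, n_h, n_out = 5, 3, 5
W_xh = rng.normal(size=(n_in, n_h))
W_hh = rng.normal(size=(n_h, n_h))
W_hy = rng.normal(size=(n_h, n_out))
b_h, b_y = np.zeros(n_h), np.zeros(n_out)

xs = np.eye(n_in)[[0, 1, 2, 3]]                    # one-hot a, b, c, d over 4 time steps
hs, ys = rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y)
print(hs.shape, ys.shape)                          # (4, 3) (4, 5)
print(total_loss(ys, [1, 2, 3, 4]))                # targets b, c, d, e
```

Only h changes from step to step; the weight matrices are the same objects throughout the loop, which is exactly the parameter sharing described above.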

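Number of recurrent layers

One recurrent kernel is one layer,
and layers are stacked in the output direction.

As shown in the figure below,
one layer has one memory,
two layers have two memories,
three layers have three memories.
[Figure: one, two and three stacked recurrent layers]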

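Implementing recurrent layers in code

Set return_sequences=False on the last recurrent layer.
Set return_sequences=True on the intermediate recurrent layers, so that every time step passes h_t to the next layer.

return_sequences=True: every time step outputs h_t.
return_sequences=False: only the last time step outputs h_t.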
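A minimal sketch of stacking two `tf.keras.layers.SimpleRNN` layers, assuming TensorFlow 2.x. The input is random and only the output shapes matter; note that the first argument of SimpleRNN is the number of memory units (the size of h_t), not the number of layers.

```python
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN

# Hypothetical input: 2 samples, 4 time steps, 5 features per step
x = tf.random.uniform((2, 4, 5))

# Intermediate layer: return_sequences=True hands h_t of every time step to the next layer
hidden_seq = SimpleRNN(3, return_sequences=True)(x)
# Last layer: return_sequences=False (the default) keeps only the final h_t
last_h = SimpleRNN(3)(hidden_seq)

print(hidden_seq.shape)  # (2, 4, 3)
print(last_h.shape)      # (2, 3)
```

Inside a `Sequential` model the same pattern is `SimpleRNN(3, return_sequences=True)` followed by `SimpleRNN(3)`; if the intermediate layer dropped the time axis, the next recurrent layer would no longer receive the three-dimensional input it needs.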

The figure below illustrates return_sequences=False.

[Figure: with return_sequences=False, only the last time step outputs h_t]

An example to make this concrete:

SimpleRNN(3, return_sequences=True)
means: a recurrent layer with 3 memory units that outputs h_t at every time step.

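The input to a recurrent layer must be three-dimensional.

The three dimensions are:
First dimension: the number of samples fed in.
Second dimension: the number of unrolled time steps of the recurrent kernel.
Third dimension: the number of input features per time step.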

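The figure below shows an example:
two samples are fed in,
each sample passes through one time step to produce an output,
and each time step has 3 input features,
so the input shape of the recurrent layer is [2, 1, 3].

[Figure: an input of shape [2, 1, 3]]
A second example:
1 sample,
four time steps,
2 features per time step,
so the input shape of the recurrent layer is [1, 4, 2].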
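Both examples can be reproduced with `np.reshape`. The feature values below are hypothetical; the point is that the target shape is always [number of samples, number of time steps, features per step].

```python
import numpy as np

# Example 1: two samples, each a single time step with 3 features.
samples = [[0.4, 1.7, 0.6],
           [0.7, 0.9, 1.6]]                       # hypothetical feature values
x1 = np.reshape(samples, (len(samples), 1, 3))    # [2, 1, 3]

# Example 2: one sample made of 4 time steps, 2 features per step.
steps = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # hypothetical
x2 = np.reshape(steps, (1, 4, 2))                 # [1, 4, 2]

print(x1.shape)  # (2, 1, 3)
print(x2.shape)  # (1, 4, 2)
```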

Walking through the recurrent computation with an example

Character prediction

The rule:
a->b
b->c
c->d
d->e
e->a

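Represent the five letters as numbers using one-hot encoding.

Randomly generate the three weight matrices W_xh, W_hh and W_hy.

The memory state is h_t.

Network structure diagram:
[Figure: network structure]
At the start, the memory state is 0.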

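Start the computation.
Just as the memories in our head are refreshed by what we are currently seeing, the memory is updated by the current input, and a new memory state comes out.

Then the state enters the output network to produce y_t:
the extracted temporal information is passed through a fully connected layer for recognition and prediction.
The output layer of the whole network computes y_t = softmax(h_t W_hy + b_y).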
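The single step described above can be traced numerically. This sketch feeds the letter 'b' into a cell whose memory starts at 0. The weights are drawn at random (hypothetical values, not the ones in the original figures), so the predicted letter is arbitrary until the network is trained.

```python
import numpy as np

letters = "abcde"
onehot = np.eye(5)                        # row i is the one-hot code of letters[i]

rng = np.random.default_rng(7)            # hypothetical random weights
W_xh = rng.normal(size=(5, 3))
W_hh = rng.normal(size=(3, 3))
W_hy = rng.normal(size=(3, 5))
b_h = np.zeros(3)
b_y = np.zeros(5)

h_prev = np.zeros(3)                      # the memory starts at 0
x_t = onehot[letters.index("b")]          # input letter 'b'

h_t = np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)   # updated memory state
logits = h_t @ W_hy + b_y
e = np.exp(logits - logits.max())
y_t = e / e.sum()                                 # softmax over the 5 letters

pred = letters[int(np.argmax(y_t))]
print("h_t =", h_t)
print("y_t =", y_t)
print("b ->", pred)   # untrained weights, so the prediction is arbitrary
```

Because x_t is one-hot and h_prev is zero, x_t @ W_xh simply picks row 1 of W_xh here; training adjusts the weights until the largest entry of y_t falls on 'c'.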

Code (predicting letters with an RNN)

Imports

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os

Input characters

# The five letters abcde
input_word = "abcde"

Represent the letters as numbers

# Replace each letter with a number
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping words to numeric ids
# Convert the ids to one-hot codes
id_to_onehot = {0: [1., 0., 0., 0., 0.],
                1: [0., 1., 0., 0., 0.],
                2: [0., 0., 1., 0., 0.],
                3: [0., 0., 0., 1., 0.],
                4: [0., 0., 0., 0., 1.]}  # ids encoded as one-hot
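Writing the dictionaries by hand is fine for five letters, but for a larger alphabet they can be generated. A small sketch, assuming NumPy's identity matrix `np.eye` as the source of the one-hot rows:

```python
import numpy as np

input_word = "abcde"
# Build the same two dictionaries programmatically instead of by hand.
w_to_id = {w: i for i, w in enumerate(input_word)}
id_to_onehot = {i: row.tolist() for i, row in enumerate(np.eye(len(input_word)))}

print(w_to_id)          # {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
print(id_to_onehot[2])  # [0.0, 0.0, 1.0, 0.0, 0.0]
```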

Build the training set

x_train = [id_to_onehot[w_to_id['a']],
           id_to_onehot[w_to_id['b']],
           id_to_onehot[w_to_id['c']],
           id_to_onehot[w_to_id['d']],
           id_to_onehot[w_to_id['e']]]

y_train = [w_to_id['b'],
           w_to_id['c'],
           w_to_id['d'],
           w_to_id['e'],
           w_to_id['a']]

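Note:
input feature a corresponds to label b
input feature b corresponds to label c
input feature c corresponds to label d
input feature d corresponds to label e
input feature e corresponds to label a

Shuffle the order (re-seeding with the same seed before each shuffle applies the same permutation to x_train and y_train, so every feature stays paired with its label)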

np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
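A quick check that resetting the seed really keeps features and labels aligned, using letters as stand-ins for the one-hot vectors. The second half shows an equivalent alternative (an assumption, not the original code): draw one permutation and index both arrays with it.

```python
import numpy as np

x = ['a', 'b', 'c', 'd', 'e']   # stand-ins for the one-hot features
y = ['b', 'c', 'd', 'e', 'a']   # their labels

# Re-seeding before each shuffle repeats the same permutation
# for two lists of the same length.
np.random.seed(7)
np.random.shuffle(x)
np.random.seed(7)
np.random.shuffle(y)
print(dict(zip(x, y)))  # still a->b, b->c, c->d, d->e, e->a, in shuffled order

# Equivalent alternative: one permutation, applied to both arrays.
idx = np.random.default_rng(7).permutation(5)
x2 = np.array(['a', 'b', 'c', 'd', 'e'])[idx]
y2 = np.array(['b', 'c', 'd', 'e', 'a'])[idx]
```

The permutation-index version avoids relying on the global random state being reset correctly twice.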

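Reshape the input data into the shape an RNN layer expects:
[number of samples, number of unrolled time steps of the recurrent kernel, number of input features per time step].
Number of samples: len(x_train).
Number of unrolled time steps: one letter is fed in and the result comes straight out, so this is 1.
Number of input features: one-hot encoding is used, so there are 5.

# Make x_train match the SimpleRNN input requirement: [number of samples, number of unrolled time steps, number of input features per time step].
# The whole dataset is fed in, so the number of samples is len(x_train);
# one letter in gives one result, so the number of unrolled time steps is 1; the one-hot code has 5 features, so each time step has 5 input features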
x_train = np.reshape(x_train, (len(x_train), 1, 5))
y_train = np.array(y_train)

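The first dimension is the number of input sentences.
The second dimension is the number of word vectors in each sentence (must be the same for every sentence).
The third dimension is the size of each word vector (also the same for all of them).

Build the model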

model = tf.keras.Sequential([
    SimpleRNN(3),     # number of memory units: more units use more resources but give a better memory
    Dense(5, activation='softmax')  # fully connected layer: 5 letters in the one-hot code, so 5 outputs
])
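Given the formulas for h_t and y_t, the number of trainable parameters in this model can be worked out by hand. This is a small arithmetic sketch; under TensorFlow 2.x the two subtotals should agree with what `model.summary()` reports for the SimpleRNN and Dense layers.

```python
# Parameter count of SimpleRNN(3) + Dense(5) on one-hot input with 5 features
n_features, units, n_classes = 5, 3, 5

w_xh = n_features * units      # W_xh: 5 x 3 = 15
w_hh = units * units           # W_hh: 3 x 3 = 9
b_h = units                    # b_h: 3
rnn_params = w_xh + w_hh + b_h             # 27

w_hy = units * n_classes       # W_hy: 3 x 5 = 15
b_y = n_classes                # b_y: 5
dense_params = w_hy + b_y                  # 20

print(rnn_params, dense_params, rnn_params + dense_params)  # 27 20 47
```

The number of memory units enters W_hh squared, which is one concrete reason more memory units cost more resources.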

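Distinguishing the number of unrolled time steps from the number of memory units

An example of the number of unrolled time steps:
if the task needs four input letters before it can predict, we need four time steps.
For example: input abcd, predict e; input bcde, predict a; input cdea, predict b; input deab, predict c; input eabc, predict d.
By contrast, the number of memory units (the 3 in SimpleRNN(3)) is the size of the memory state h_t and does not depend on how many time steps are unrolled.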
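The four-step examples above can be generated rather than typed out. A minimal sketch, assuming the five-letter alphabet wraps around; `make_windows` is a hypothetical helper name.

```python
import numpy as np

input_word = "abcde"
w_to_id = {w: i for i, w in enumerate(input_word)}
onehot = np.eye(len(input_word))

def make_windows(word, steps=4):
    """Hypothetical helper: each cyclic window of `steps` letters -> the letter after it."""
    n = len(word)
    x, y = [], []
    for start in range(n):
        window = [word[(start + k) % n] for k in range(steps)]
        x.append([onehot[w_to_id[c]] for c in window])
        y.append(w_to_id[word[(start + steps) % n]])
    return np.array(x), np.array(y)

x_train, y_train = make_windows(input_word)
print(x_train.shape)  # (5, 4, 5): 5 samples, 4 time steps, 5 features
print(y_train)        # [4 0 1 2 3], i.e. e, a, b, c, d
```

The 4 in the middle of the shape is the number of unrolled time steps; the 3 memory units of SimpleRNN(3) appear nowhere in the data.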

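In the figure below, the red part is the memory.
[Figure: the memory (in red) inside the unrolled recurrent layer]

Configure the model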

model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

Set up model saving, and load the previously saved weights so training can resume from a checkpoint

checkpoint_save_path = "./checkpoint/rnn_onehot_1pre1.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # No test set is passed to fit, so no test accuracy is computed; save the best model based on the training loss

Train the model

history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

Inspect the network structure

model.summary()

Save the parameters

# print(model.trainable_variables)
file = open('./weights.txt', 'w')  # extract the parameters
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

Plot loss and acc

# Plot the accuracy and loss curves on the training set
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

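Prediction

Enter how many predictions to run.
Enter a letter.
Convert the letter to its one-hot code.
Reshape it into the RNN input shape.


preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[alphabet1]]]
    # Make alphabet match the SimpleRNN input requirement: [number of samples, number of unrolled time steps, number of input features per time step]. One sample is fed in for this check, so the number of samples is 1; one letter in gives one result, so the number of unrolled time steps is 1; the one-hot code has 5 features, so each time step has 5 input features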
    alphabet = np.reshape(alphabet, (1, 1, 5))
    result = model.predict([alphabet])
    pred = tf.argmax(result, axis=1)
    pred = int(pred)
    tf.print(alphabet1 + '->' + input_word[pred])

Full code

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os

input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping words to numeric ids
id_to_onehot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.], 3: [0., 0., 0., 1., 0.],
                4: [0., 0., 0., 0., 1.]}  # ids encoded as one-hot

x_train = [id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']],
           id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']]]
y_train = [w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e'], w_to_id['a']]

np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

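# Make x_train match the SimpleRNN input requirement: [number of samples, number of unrolled time steps, number of input features per time step].
# The whole dataset is fed in, so the number of samples is len(x_train); one letter in gives one result, so the number of unrolled time steps is 1; the one-hot code has 5 features, so each time step has 5 input features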
x_train = np.reshape(x_train, (len(x_train), 1, 5))
y_train = np.array(y_train)

model = tf.keras.Sequential([
    SimpleRNN(3),
    Dense(5, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

checkpoint_save_path = "./checkpoint/rnn_onehot_1pre1.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # No test set is passed to fit, so no test accuracy is computed; save the best model based on the training loss

history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

model.summary()

# print(model.trainable_variables)
file = open('./weights.txt', 'w')  # extract the parameters
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

###############################################    show   ###############################################

# Plot the accuracy and loss curves on the training set
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

############### predict #############

preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[alphabet1]]]
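    # Make alphabet match the SimpleRNN input requirement: [number of samples, number of unrolled time steps, number of input features per time step]. One sample is fed in for this check, so the number of samples is 1; one letter in gives one result, so the number of unrolled time steps is 1; the one-hot code has 5 features, so each time step has 5 input features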
    alphabet = np.reshape(alphabet, (1, 1, 5))
    result = model.predict([alphabet])
    pred = tf.argmax(result, axis=1)
    pred = int(pred)
    tf.print(alphabet1 + '->' + input_word[pred])

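Improving the example:

Feed in four consecutive letters and predict the probabilities of the next letter.

[Figure: four letters in, one letter out]

Code:
Imports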

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os

Load the data: represent the letters as numeric ids (used for y_train)
and as one-hot codes (used for x_train)

input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping words to numeric ids
id_to_onehot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.], 3: [0., 0., 0., 1., 0.],
                4: [0., 0., 0., 0., 1.]}  # ids encoded as one-hot

Create the training set

x_train = [
    [id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']], id_to_onehot[w_to_id['d']]],
    [id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']], id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']]],
    [id_to_onehot[w_to_id['c']], id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']], id_to_onehot[w_to_id['a']]],
    [id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']], id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']]],
    [id_to_onehot[w_to_id['e']], id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']]],
]
y_train = [w_to_id['e'], w_to_id['a'], w_to_id['b'], w_to_id['c'], w_to_id['d']]

Shuffle the training set

np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

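Reshape the training set into the RNN input shape

# Make x_train match the SimpleRNN input requirement: [number of samples, number of unrolled time steps, number of input features per time step].
# The whole dataset is fed in, so the number of samples is len(x_train); four letters in give one result, so the number of unrolled time steps is 4; the one-hot code has 5 features, so each time step has 5 input features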
x_train = np.reshape(x_train, (len(x_train), 4, 5))
y_train = np.array(y_train)

Build the network

model = tf.keras.Sequential([
    SimpleRNN(3),
    Dense(5, activation='softmax')
])

Configure the network

model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

Set up model saving

checkpoint_save_path = "./checkpoint/rnn_onehot_4pre1.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # No test set is passed to fit, so no test accuracy is computed; save the best model based on the training loss

Train the model

history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

Inspect the network structure

model.summary()

Create a txt file to store the trained parameters

file = open('./weights.txt', 'w')  # extract the parameters
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

Plot the model's loss and acc curves

# Plot the accuracy and loss curves on the training set
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

Prediction

preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[a]] for a in alphabet1]
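    # Make alphabet match the SimpleRNN input requirement: [number of samples, number of unrolled time steps, number of input features per time step]. One sample is fed in for this check, so the number of samples is 1; four letters in give one result, so the number of unrolled time steps is 4; the one-hot code has 5 features, so each time step has 5 input features
    alphabet = np.reshape(alphabet, (1, 4, 5))  # 1 sample, 4 time steps, 5 features
    result = model.predict([alphabet])
    pred = tf.argmax(result, axis=1)  # pick the most probable letter as the output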
    pred = int(pred)
    tf.print(alphabet1 + '->' + input_word[pred])

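Embedding: a different encoding method (as opposed to one-hot)

Motivation: the width of a one-hot code must equal the vocabulary size.
With a large vocabulary, this wastes resources.

tf.keras.layers.Embedding(vocabulary size, encoding dimension)

Vocabulary size: how many words the encoding covers.
Encoding dimension: how many numbers represent one word.

tf.keras.layers.Embedding(100, 3)
encodes 100 words, representing each word with 3 numbers.

Data fed into Embedding must be two-dimensional:
[number of samples, number of unrolled time steps]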
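A minimal sketch of the shapes involved, assuming TensorFlow 2.x: a two-dimensional batch of integer ids goes in, and a three-dimensional tensor (ready for SimpleRNN) comes out. The ids are hypothetical; each must lie in [0, 100) for `Embedding(100, 3)`.

```python
import numpy as np
import tensorflow as tf

emb = tf.keras.layers.Embedding(100, 3)   # 100-word vocabulary, 3 numbers per word

# Two-dimensional input: [1 sample, 4 time steps], each entry an integer word id
ids = tf.constant([[4, 17, 99, 0]])
vectors = emb(ids)                        # three-dimensional output, ready for SimpleRNN
table = emb.get_weights()[0]              # the trainable 100 x 3 lookup table

print(vectors.shape)                      # (1, 4, 3)
print(table.shape)                        # (100, 3)
print(np.allclose(vectors.numpy()[0, 1], table[17]))  # True: id 17 maps to row 17
```

This is why the Embedding examples reshape x_train to two dimensions: the layer itself adds the third (feature) dimension.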

The parts that do not change:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN, Embedding
import matplotlib.pyplot as plt
import os

input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping words to numeric ids

x_train = [w_to_id['a'], w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e']]
y_train = [w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e'], w_to_id['a']]

np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

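Change the one-hot part of the letter-prediction example above to Embedding:

# Make x_train match the Embedding input requirement: [number of samples, number of unrolled time steps].
# The whole dataset is fed in, so the number of samples is len(x_train); one letter in gives one result, so the number of unrolled time steps is 1.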
x_train = np.reshape(x_train, (len(x_train), 1))
y_train = np.array(y_train)

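When reshaping the training set x_train:
the first argument, len(x_train), is the number of samples fed in; here it is the size of the current dataset, 5.
The second argument, 1, is the number of unrolled time steps: one input produces one output.

model = tf.keras.Sequential([
    Embedding(5, 2),  # Embedding layer that encodes the input; it creates a trainable 5 x 2 parameter matrix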
    SimpleRNN(3),
    Dense(5, activation='softmax')
])

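Note: the Embedding layer is used as the first layer of the model, because it takes the raw integer word ids as input.

An intuitive explanation of embedding

First, we need to understand one-hot encoding:

For example, suppose a sentence has 10 characters, all different from each other; replace them with the digits 0-9,
as follows:

我从哪里来要到何处去 ("Where do I come from, where am I going")

0 1 2 3 4 5 6 7 8 9

Converting it to one-hot encoding gives: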


# 我从哪里来，要到何处去 (one row per character)
[
[1 0 0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 0 1]
]

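But consider an article with, say, 1 million sentences containing 100,000 distinct characters. Every character would then need a 100,000-dimensional one-hot vector, and encoding the whole article would give a huge matrix with 100,000 columns and one row per character, almost all of it zeros.
That wastes far too much space.

So instead we transform the matrix, that is, change its dimensionality.
[Figure: changing dimensions by matrix multiplication]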

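For example, multiplying a 2 x 6 matrix by a 6 x 3 matrix gives a 2 x 3 matrix.

This is dimensionality reduction: put simply, the features are combined, much like the dimensionality reduction a 1x1 convolution performs in a convolutional network.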
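The 2 x 6 times 6 x 3 example can be checked directly. In this sketch (hypothetical sizes: a 6-word vocabulary, 3 numbers per word), the rows of the 2 x 6 matrix are one-hot codes, and the product turns out to be nothing more than selecting rows of the 6 x 3 weight matrix, which is what an embedding layer does without ever building the one-hot matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 3))     # hypothetical 6-word vocabulary, 3 numbers per word

ids = [1, 4]                    # a "sentence" of two words
onehot = np.eye(6)[ids]         # shape (2, 6): one-hot rows
dense = onehot @ W              # (2, 6) @ (6, 3) -> (2, 3)

# Multiplying by a one-hot row just selects a row of W, so an embedding layer
# can skip the multiplication and look the row up directly.
lookup = W[ids]
print(dense.shape)                  # (2, 3)
print(np.allclose(dense, lookup))   # True
```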

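So why change the dimensionality?
The referenced author explains this beautifully:
[Figure: two rabbit pictures for a spot-the-difference game]
These are two pictures of a rabbit; compare them and find the differences.

From 1 meter away, it is easiest to see at a glance that the red heart in the middle is one of the differences.

At 0.5 meters, we notice that the ellipsis in the upper-right corner is different.

At 25 cm, we notice that one of the ears is different.

A little closer, and we find some differences on the rabbit's face.

Closer still, and we find that the white clouds in the sky on the right are different.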

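Summary:
How far away we stand affects what we can observe.
The same holds for data: low-dimensional data may contain only very coarse features,
and by repeatedly zooming in and out, i.e. changing the receptive field, we get different views of the picture and find the differences.

Embedding does not only reduce the dimensionality of data; it can also increase it.
Raising the dimension of low-dimensional data may amplify certain features, or separate features that were lumped together.

For example, by moving back and forth in front of the screen, we might find that 45 cm is the best viewing distance, from which all 5 differences can be found within 10 seconds.

This is also why deeper CNNs tend to be more accurate: convolutional layers convolve again and again, pooling layers shrink the feature maps, upsampling enlarges them again, and fully connected layers are stacked one after another.
We cannot tell in advance when the network will suddenly pick up a useful feature.
Either way, learning is a good thing, so let the machine convolve and connect a bit more: cross-entropy tells it how wrong it is, gradient descent tells it how to get it right, and given enough time it will learn.
This is why, in theory, a network with enough depth and parameters can fit almost any feature.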

The remaining, unchanged parts:

model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

checkpoint_save_path = "./checkpoint/run_embedding_1pre1.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # No test set is passed to fit, so no test accuracy is computed; save the best model based on the training loss

history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

model.summary()

# print(model.trainable_variables)
file = open('./weights.txt', 'w')  # extract the parameters
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

When predicting, remember to reshape the data to be predicted:
both the number of samples and the number of unrolled time steps must be specified.

preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [w_to_id[alphabet1]]
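    # Make alphabet match the Embedding input requirement: [number of samples, number of unrolled time steps].
    # One sample is fed in for this check, so the number of samples is 1; one letter in gives one result, so the number of unrolled time steps is 1.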
    alphabet = np.reshape(alphabet, (1, 1))
    result = model.predict(alphabet)
    pred = tf.argmax(result, axis=1)
    pred = int(pred)
    tf.print(alphabet1 + '->' + input_word[pred])

Example: using Embedding to take several letters as input and predict one letter

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN, Embedding
import matplotlib.pyplot as plt
import os

Set up the inputs
and map the letters to numbers

input_word = "abcdefghijklmnopqrstuvwxyz"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4,
           'f': 5, 'g': 6, 'h': 7, 'i': 8, 'j': 9,
           'k': 10, 'l': 11, 'm': 12, 'n': 13, 'o': 14,
           'p': 15, 'q': 16, 'r': 17, 's': 18, 't': 19,
           'u': 20, 'v': 21, 'w': 22, 'x': 23, 'y': 24, 'z': 25}  # dictionary mapping words to numeric ids

training_set_scaled = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                       11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                       21, 22, 23, 24, 25]

Create two lists to hold the training data

x_train = []
y_train = []

Using a for loop, take every four consecutive numbers as input features and append them to x_train;
the fifth number is the label and is appended to y_train.

for i in range(4, 26):
    x_train.append(training_set_scaled[i - 4:i])
    y_train.append(training_set_scaled[i])
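A quick check of what the loop above produces: 26 - 4 = 22 windows. Unlike the five-letter example, the alphabet does not wrap around here, so an input such as 'xyza' never appears in training. This sketch simply reruns the same loop in isolation.

```python
# Rerun the window loop on its own and inspect the result.
training_set_scaled = list(range(26))   # ids of a..z
x_train, y_train = [], []
for i in range(4, 26):
    x_train.append(training_set_scaled[i - 4:i])
    y_train.append(training_set_scaled[i])

print(len(x_train))              # 22
print(x_train[0], y_train[0])    # [0, 1, 2, 3] 4   (abcd -> e)
print(x_train[-1], y_train[-1])  # [21, 22, 23, 24] 25   (vwxy -> z)
```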

Shuffle the features and labels of the training set in the same order

np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

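Reshape the input features into the shape Embedding expects:
First dimension: the number of samples fed in; here it is the whole dataset, 22 samples, i.e. len(x_train).
Second dimension: the number of unrolled time steps; four consecutive inputs produce one output (4 -> 1), so it is 4.

# Make x_train match the Embedding input requirement: [number of samples, number of unrolled time steps].
# The whole dataset is fed in, so the number of samples is len(x_train); four letters in give one result, so the number of unrolled time steps is 4.
x_train = np.reshape(x_train, (len(x_train), 4))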
y_train = np.array(y_train)

Build the model

model = tf.keras.Sequential([ 
    Embedding(26, 2),  # vocabulary of 26; each word is encoded with two numbers
    SimpleRNN(10),  # recurrent layer with 10 memory units
    Dense(26, activation='softmax')  # fully connected layer that computes the output
])

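Configure the model and set up model saving

model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

checkpoint_save_path = "./checkpoint/rnn_embedding_4pre1.ckpt"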

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # No test set is passed to fit, so no test accuracy is computed; save the best model based on the training loss

Train the model

history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

Inspect the network structure

model.summary()

Save the trained parameters to a txt file

file = open('./weights.txt', 'w')  # extract the parameters
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

Plot the loss and acc curves

# Plot the accuracy and loss curves on the training set
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

Enter data to predict

preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [w_to_id[a] for a in alphabet1]
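    # Make alphabet match the Embedding input requirement: [number of samples, number of unrolled time steps].
    # One sample is fed in for this check, so the number of samples is 1; four letters in give one result, so the number of unrolled time steps is 4.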
    alphabet = np.reshape(alphabet, (1, 4))
    result = model.predict([alphabet])
    pred = tf.argmax(result, axis=1)
    pred = int(pred)
    tf.print(alphabet1 + '->' + input_word[pred])

references:

https://blog.csdn.net/weixin_42078618/article/details/82999906?spm=1001.2101.3001.6650.1&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-1.no_search_link&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-1.no_search_link
