[TensorFlow 2 Notes 11] Recurrent Networks

Article link: https://blog.csdn.net/qq_38276972/article/details/114534580

Using an RNN to predict sequential data.


1. Recurrent kernel

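Recurrent kernel: a unit with memory. It shares the same parameters across time steps, which lets it extract information from a time series. It consists of the memory units and three trainable parameter matrices.
The number of memory units can be set to change the memory capacity. Once the number of memory units and the dimensions of the input x_t and the output y_t are fixed, the shapes of the surrounding trainable parameters are fixed too: w_xh is [input features, memory units], w_hh is [memory units, memory units], and w_hy is [memory units, output features].
h_t = tanh(x_t * w_xh + h_{t-1} * w_hh + b_h),  y_t = softmax(h_t * w_hy + b_y)
Here h_t is the state stored in the memory units at time t, x_t is the input feature, h_{t-1} is the state the memory units stored at the previous time step, and b_h is a bias term; y_t is the output feature of the recurrent kernel and b_y is its bias term.
During forward propagation, the state h_t stored in the memory units is refreshed at every time step, while the three parameter matrices w_xh, w_hh, and w_hy stay fixed throughout. During backpropagation, the three parameter matrices are updated by gradient descent.
A recurrent neural network uses the recurrent kernel to extract features along the time axis and feeds them into a fully connected network to predict sequential data.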
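
The two formulas above can be traced step by step with plain NumPy. The following is a minimal sketch, not the Keras implementation: the sizes (5 input features, 3 memory units, 5 outputs), the random weights, and the helper names `softmax` and `rnn_forward` are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(xs, wxh, whh, why, bh, by):
    """Apply one recurrent kernel to a sequence, reusing the same weights at every step."""
    h = np.zeros(whh.shape[0])              # initial state: empty memory
    hs, ys = [], []
    for x in xs:
        h = np.tanh(x @ wxh + h @ whh + bh)  # refresh the stored state
        ys.append(softmax(h @ why + by))     # output for this time step
        hs.append(h)
    return np.array(hs), np.array(ys)

n_in, n_mem, n_out = 5, 3, 5                # illustrative sizes (assumption)
rng = np.random.default_rng(0)
wxh = rng.normal(size=(n_in, n_mem))        # input  -> memory
whh = rng.normal(size=(n_mem, n_mem))       # memory -> memory
why = rng.normal(size=(n_mem, n_out))       # memory -> output
bh = np.zeros(n_mem)
by = np.zeros(n_out)

xs = np.eye(n_in)[:4]                       # 4 time steps of one-hot inputs
hs, ys = rnn_forward(xs, wxh, whh, why, bh, by)
print(hs.shape, ys.shape)                   # (4, 3) (4, 5)
```

Only `h` changes inside the loop; the weight matrices are read but never modified, which is the parameter sharing described above.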

2. Recurrent layer

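Each recurrent kernel forms one recurrent layer. Kernels are stacked vertically, so the number of recurrent layers grows toward the output.
The number of memory units in each recurrent kernel can be chosen freely to suit the task.
TensorFlow requires the data fed into a recurrent layer to be three-dimensional: [total number of samples, number of time steps the kernel is unrolled over, number of input features per time step].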
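
As a small illustration of the three-dimensional layout, the sketch below reshapes flat data into [samples, time steps, features]. The array `flat` and its sizes are made up for the example.

```python
import numpy as np

# Hypothetical raw data: 6 samples, each stored as a flat row of
# 4 time steps x 2 features = 8 values.
flat = np.arange(6 * 4 * 2, dtype=np.float32).reshape(6, 8)

samples, time_steps, features = 6, 4, 2
x = flat.reshape(samples, time_steps, features)

print(x.shape)    # (6, 4, 2)
print(x[0, 1])    # features of sample 0 at time step 1: [2. 3.]
```

The reshape only works as intended when each flat row already lists the time steps in order, with the features of one step stored next to each other.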

tf.keras.layers.SimpleRNN(number_of_memory_units,
                          activation='activation_function',
                          return_sequences=whether_to_output_h_t_at_every_time_step)

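# Notes
# activation defaults to tanh
# return_sequences: True outputs h_t at every time step; False outputs h_t only at the last time step (default)
# Typically the last recurrent layer uses False and intermediate recurrent layers use True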
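
The effect of `return_sequences` is easiest to see in the output shapes. The sketch below assumes a working TensorFlow 2 installation and uses an all-zero dummy input; the layer sizes are arbitrary.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN

# Dummy input: 2 samples, 4 time steps, 5 features per step
x = np.zeros((2, 4, 5), dtype=np.float32)

all_steps = SimpleRNN(3, return_sequences=True)(x)  # h_t at every time step
last_step = SimpleRNN(3)(x)                         # h_t at the last step only

print(all_steps.shape)   # (2, 4, 3)
print(last_step.shape)   # (2, 3)

# Stacking: the intermediate layer must keep the time axis so that
# the next recurrent layer still receives three-dimensional input.
stacked = tf.keras.Sequential([
    SimpleRNN(6, return_sequences=True),
    SimpleRNN(3),
])
print(stacked(x).shape)  # (2, 3)
```

With `return_sequences=False` on the intermediate layer, the second `SimpleRNN` would receive two-dimensional data and fail the input requirement described above.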

3. Recurrent computation walkthrough

(1) Using an RNN to take one letter as input and predict the next letter (one-hot encoding)

# 1. Import modules
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import numpy as np
import os

# 2. Prepare the data
input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}                     # character -> id
id_to_onehot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.],      # id -> one-hot vector
                2: [0., 0., 1., 0., 0.], 3: [0., 0., 0., 1., 0.],
                4: [0., 0., 0., 0., 1.]}

x_train = [id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']],
           id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']]]

y_train = [w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e'], w_to_id['a']]

# Shuffle the dataset; reusing the same seed keeps x_train and y_train aligned
np.random.seed(2)
np.random.shuffle(x_train)
np.random.seed(2)
np.random.shuffle(y_train)
tf.random.set_seed(2)

# Reshape to the layout the recurrent layer expects: [samples, time steps, features per step]
x_train = np.reshape(x_train, (len(x_train), 1, 5))
y_train = np.array(y_train)

# 3. Build the network
model = tf.keras.Sequential([
    SimpleRNN(3),       # 3 memory units; more units give more memory capacity but use more resources
    Dense(5, activation='softmax')
])

# 4. Configure training
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])
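# TF 2.x (Keras 2) checkpoint format; Keras 3 instead expects a weights path ending in ".weights.h5"
checkpoint_save_path = "./checkpoint/rnn.ckpt"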
if os.path.exists(checkpoint_save_path + '.index'):
    print("----------------load the model-------------------")
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss') # no test set, so keep the best model by training loss
# 5. Train
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])
model.summary()

# 6. Predict
inputs = input("Enter a letter (a-e): ")
data = [id_to_onehot[w_to_id[inputs]]]
data = np.reshape(data, (1, 1, 5))
result = model(data)
pred = tf.argmax(result, axis=1)
pred = int(pred[0])  # take the single prediction out of the batch of size 1
print(inputs + '->' + input_word[pred])
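
The parameter counts reported by `model.summary()` for this network can be checked by hand from the shapes of the weight matrices. The helper functions below are hypothetical names introduced for this sketch; in the Keras model, w_hy and b_y belong to the `Dense` layer rather than to `SimpleRNN`.

```python
def simple_rnn_params(n_features, n_units):
    # w_xh: n_features x n_units, w_hh: n_units x n_units, b_h: n_units
    return n_features * n_units + n_units * n_units + n_units

def dense_params(n_inputs, n_outputs):
    # weight matrix: n_inputs x n_outputs, bias: n_outputs
    return n_inputs * n_outputs + n_outputs

rnn = simple_rnn_params(5, 3)   # 5 one-hot features, 3 memory units
dense = dense_params(3, 5)      # 3 memory units -> 5 letter classes
print(rnn, dense, rnn + dense)  # 27 20 47
```

Changing the number of memory units changes every term in `simple_rnn_params`, which is why fixing the unit count together with the input and output sizes fixes the shapes of all trainable parameters.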