Recurrent Neural Networks: Introduction and Worked Examples

Latest recommended article published on 2024-09-13 19:22:25

猿心不灭

Category: Computer View. Tags: python, deep learning, neural networks, tensorflow, rnn

Article link: https://blog.csdn.net/weixin_46297209/article/details/114994033

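Recurrent Neural Networks

1: The Recurrent Cell

Recurrent cell: its parameters are shared across time steps, and the recurrent layer extracts temporal information.

The recurrent cell has memory. By sharing the same parameters at every time step, it extracts information from a time series.

Its structure is shown below:
[Figure: structure of the recurrent cell]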

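The memory capacity can be changed by setting the number of memory units. Once the number of memory units is fixed, along with the dimensions of the input x_t and the output y_t, the dimensions of the trainable parameters are fixed as well. The memory units store the state h_t at every time step. The state stored at the current time step is:
$h_t = \tanh(x_t w_{xh} + h_{t-1} w_{hh} + b_h)$
where h_{t-1} is the state stored in the memory units at the previous time step and b_h is the bias term.

The output feature y_t of the recurrent cell at the current time step is:
$y_t = \mathrm{softmax}(h_t w_{hy} + b_y)$
where b_y is the bias term.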

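Forward propagation: the state h_t stored in the memory units is refreshed at every time step, while the three parameter matrices w_xh, w_hh and w_hy stay fixed throughout.

Backpropagation: the three parameter matrices w_xh, w_hh and w_hy are updated by gradient descent.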
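
The two formulas can be checked with a few lines of NumPy. This is only a minimal sketch of one forward step, not the Keras implementation. The sizes (5 input features, 3 memory units, 5 output classes) and the random weights are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """State update: h_t = tanh(x_t w_xh + h_{t-1} w_hh + b_h)."""
    return np.tanh(x_t @ w_xh + h_prev @ w_hh + b_h)

def rnn_output(h_t, w_hy, b_y):
    """Output feature: y_t = softmax(h_t w_hy + b_y)."""
    return softmax(h_t @ w_hy + b_y)

# Hypothetical sizes: 5 input features, 3 memory units, 5 output classes.
rng = np.random.default_rng(0)
w_xh = rng.normal(size=(5, 3))
w_hh = rng.normal(size=(3, 3))
w_hy = rng.normal(size=(3, 5))
b_h = np.zeros(3)
b_y = np.zeros(5)

x_t = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # one-hot input at time t
h_prev = np.zeros(3)                        # state before the first step

h_t = rnn_step(x_t, h_prev, w_xh, w_hh, b_h)
y_t = rnn_output(h_t, w_hy, b_y)
print(h_t.shape, y_t.shape)
```

The shapes of w_xh, w_hh and w_hy follow directly from the three chosen sizes, which is what the text means when it says the parameter dimensions are fixed once the number of memory units is set.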

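2: Unrolling the Recurrent Cell over Time Steps

Unrolled over time steps, the recurrent cell in the figure above looks like this:
[Figure: the recurrent cell unrolled over time steps]

A recurrent neural network uses recurrent cells to extract temporal features and then feeds the extracted information into a fully connected network to predict sequential data. y_t is the last layer of the whole recurrent network. It is simply a fully connected layer that turns the extracted features into the prediction.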
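
Unrolling amounts to a loop over time steps that reuses the same weights and only refreshes the state. The sketch below assumes a single sequence and a softmax output layer applied to the last state. The function name `run_rnn` and all sizes are hypothetical.

```python
import numpy as np

def run_rnn(xs, w_xh, w_hh, b_h, w_hy, b_y):
    """Unroll one recurrent cell over every time step of a single sequence.

    xs has shape (time_steps, input_features). The same weights are reused
    at every step; only the state h changes.
    """
    h = np.zeros(w_hh.shape[0])          # initial state
    states = []
    for x_t in xs:                       # one iteration per time step
        h = np.tanh(x_t @ w_xh + h @ w_hh + b_h)
        states.append(h)
    logits = h @ w_hy + b_y              # fully connected layer on the last state
    e = np.exp(logits - logits.max())
    y = e / e.sum()
    return np.stack(states), y

rng = np.random.default_rng(1)
steps, features, units, classes = 4, 5, 3, 5   # hypothetical sizes
xs = rng.normal(size=(steps, features))
params = (
    rng.normal(size=(features, units)),  # w_xh
    rng.normal(size=(units, units)),     # w_hh
    np.zeros(units),                     # b_h
    rng.normal(size=(units, classes)),   # w_hy
    np.zeros(classes),                   # b_y
)
states, y = run_rnn(xs, *params)
print(states.shape, y.shape)
```

`states` holds one state per time step, while `y` is computed only once, from the final state.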

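3: Describing a Recurrent Layer in TensorFlow

Each recurrent cell forms one recurrent layer, and recurrent layers are stacked in the direction of the output. The number of memory units in each recurrent cell can be chosen freely to suit your needs. In TensorFlow a recurrent layer is defined as follows:

tf.keras.layers.SimpleRNN(
    number_of_memory_units,
    activation="activation_function",  # defaults to tanh
    return_sequences=whether_to_output_h_t_at_every_time_step,  # True outputs h_t at every time step, False outputs h_t only at the last time step; defaults to False
)

Typically the last recurrent layer uses False and outputs h_t only at the last time step, while intermediate recurrent layers use True and pass h_t to the next layer at every time step.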

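The API places a requirement on the shape of the data fed into a recurrent layer: it must be three-dimensional. The first dimension is the total number of samples, the second is the number of time steps the recurrent cell is unrolled over, and the third is the number of input features per time step.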
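
To make the shape rules concrete, here is a NumPy stand-in for a recurrent layer. It is a sketch that mimics the input and output shapes described above, not the Keras code itself. The function `simple_rnn`, the layer sizes, and the random weights are all assumptions for illustration.

```python
import numpy as np

def simple_rnn(x, w_xh, w_hh, b_h, return_sequences=False):
    """NumPy stand-in for a recurrent layer; x has shape (samples, time_steps, features)."""
    if x.ndim != 3:
        raise ValueError("expected input of shape (samples, time_steps, features)")
    samples, time_steps, _ = x.shape
    h = np.zeros((samples, w_hh.shape[0]))
    outputs = []
    for t in range(time_steps):
        h = np.tanh(x[:, t, :] @ w_xh + h @ w_hh + b_h)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)   # (samples, time_steps, units)
    return h                               # (samples, units)

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 3, 5))             # 8 samples, 3 time steps, 5 features

# Layer 1 (4 units) feeds layer 2 (6 units), so it must emit every time step.
l1 = (rng.normal(size=(5, 4)), rng.normal(size=(4, 4)), np.zeros(4))
l2 = (rng.normal(size=(4, 6)), rng.normal(size=(6, 6)), np.zeros(6))

seq = simple_rnn(x, *l1, return_sequences=True)
last = simple_rnn(seq, *l2, return_sequences=False)
print(seq.shape, last.shape)
```

The first layer returns a three-dimensional array, so it can be fed straight into the second layer. Had it returned only the last state, the second layer would receive two-dimensional input and reject it.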

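4: Letter Prediction in Code

Use an RNN to take one letter as input and predict the next letter.

Requirement: input a, output b; input b, output c; input c, output d; input d, output e; input e, output a.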
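
The requirement can be turned into training data programmatically. The following sketch builds the five (input, target) pairs, one-hot encodes the inputs with `np.eye`, and reshapes them to one time step per sample. It is an assumed, compact alternative to writing the dictionaries out by hand, and the variable names are illustrative.

```python
import numpy as np

letters = "abcde"
w_to_id = {ch: i for i, ch in enumerate(letters)}

# Each letter's target is the next letter, wrapping from "e" back to "a".
pairs = [(ch, letters[(i + 1) % len(letters)]) for i, ch in enumerate(letters)]

one_hot = np.eye(len(letters), dtype=np.float32)           # row i encodes id i
x = np.stack([one_hot[w_to_id[src]] for src, _ in pairs])  # shape (5, 5)
y = np.array([w_to_id[dst] for _, dst in pairs])           # shape (5,)

# One time step per sample, five features per step.
x = x.reshape(len(x), 1, len(letters))
print(pairs)
print(x.shape, y.tolist())
```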

Training code

import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os

# Letter prediction: input a -> b, b -> c, c -> d, d -> e, e -> a
input_words = "abcde"
w_to_id = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}  # maps each letter to a numeric id
id_to_onhot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.], 3: [0., 0., 0., 1., 0.], 4: [0., 0., 0., 0., 1.]}  # one-hot encoding of each id

x_train = [id_to_onhot[w_to_id["a"]], id_to_onhot[w_to_id["b"]], id_to_onhot[w_to_id["c"]], id_to_onhot[w_to_id["d"]], id_to_onhot[w_to_id["e"]]]
y_train = [w_to_id["b"], w_to_id["c"], w_to_id["d"], w_to_id["e"], w_to_id["a"]]

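# Shuffle the dataset (setting the same seed before each shuffle keeps x and y aligned)
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
np.random.seed(7)

# Reshape x_train to the SimpleRNN input format: (number of samples, unrolled time steps, input features per time step)
# The whole dataset is fed in at once, so the number of samples is len(x_train); one letter in gives one result, so there is 1 time step; one-hot encoding gives 5 input features per time step
x_train = np.reshape(x_train, (len(x_train), 1, 5))
y_train = np.array(y_train)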

# Build the RNN
model = tf.keras.Sequential([
    SimpleRNN(3),
    Dense(5, activation="softmax")
])

# Configure training
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.01),
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=["sparse_categorical_accuracy"]
)

# Where the weights are saved (TF checkpoint format, which writes a .index file next to the data)
checkpoint_save_path = "./checkpoint1/word_pre.ckpt"

if os.path.exists(checkpoint_save_path + ".index"):
    print("-------------load model-------------")
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_save_path,
    save_weights_only=True,
    save_best_only=True,
    monitor="loss"  # fit is given no validation data, so there is no validation metric; keep the best model by training loss
)

history = model.fit(x_train, y_train, epochs=100, batch_size=32, callbacks=[cp_callback])

model.summary()

# Write the trainable parameters to a text file
with open("./weights.txt", "w") as f:
    for v in model.trainable_variables:
        f.write(str(v.name) + "\n")
        f.write(str(v.shape) + "\n")
        f.write(str(v.numpy()) + "\n")

# Plot training accuracy and loss
acc = history.history["sparse_categorical_accuracy"]
loss = history.history["loss"]

plt.subplot(1,2,1)
plt.plot(acc, label="Training Accuracy")
plt.title("Training Accuracy")
plt.legend()

plt.subplot(1,2,2)
plt.plot(loss, label="Training Loss")
plt.title("Training Loss")
plt.legend()

plt.show()
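
The script above keeps inputs and labels aligned by resetting NumPy's seed to the same value before each shuffle. An alternative sketch, which is not the author's method, is to draw a single permutation and index both arrays with it. The helper name `shuffle_together` is hypothetical.

```python
import numpy as np

def shuffle_together(x, y, seed=7):
    """Return x and y reordered by the same random permutation."""
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    order = np.random.default_rng(seed).permutation(len(x))
    return x[order], y[order]

x = np.eye(5)                      # one-hot rows for a, b, c, d, e
y = np.array([1, 2, 3, 4, 0])      # ids of the letters that follow
x_shuffled, y_shuffled = shuffle_together(x, y)

# Every input is still paired with the id of the letter that follows it.
assert all((np.argmax(row) + 1) % 5 == label
           for row, label in zip(x_shuffled, y_shuffled))
print(y_shuffled)
```

Indexing with one permutation makes the pairing explicit and does not depend on two separate shuffles consuming random numbers in the same way.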

Test code

import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Dense, SimpleRNN

# Path the weights were saved to
model_path = "./checkpoint1/word_pre.ckpt"

# Rebuild the same model structure
model = tf.keras.Sequential([
    SimpleRNN(3),
    Dense(5, activation="softmax")
])

# Load the trained weights
model.load_weights(model_path)

# Predict
input_word = "abcde"
w_to_id = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}  # maps each letter to a numeric id
id_to_onhot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.], 3: [0., 0., 0., 1., 0.],
                   4: [0., 0., 0., 0., 1.]}
nums = input("Number of predictions: ")
for i in range(int(nums)):
    pre_word = input("Letter to predict from: ")

    x_input = id_to_onhot[w_to_id[pre_word]]
    x_input = np.reshape(x_input, (1, 1, 5))  # (samples, time steps, features)
    result = model.predict(x_input)
    pred = int(tf.argmax(result, axis=1)[0])  # index of the most probable letter, as a plain int
    tf.print(pre_word + "---->" + input_word[pred])

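5: Embedding Encoding

Introduction to Embedding encoding

One-hot encoding: the vector width equals the vocabulary size, so it wastes a lot of resources as the vocabulary grows. The codes are also independent of one another and express no relationship between words.

Embedding: a word-encoding method that represents each word with a low-dimensional vector. The encoding is optimized through neural network training and can express the relationships between words.

It is called in TensorFlow as follows:

tf.keras.layers.Embedding(vocabulary_size, embedding_dim)

vocabulary_size is the total number of words the encoding has to represent, and embedding_dim is how many numbers are used to represent each word. For example, to encode 100 integer ids (0 to 99) with three numbers each, so that id 4 might be encoded as [0.25, 0.1, 0.11], the layer is written as:

tf.keras.layers.Embedding(100,3)

When data is fed into the Embedding layer, x_train must have shape [number of samples, unrolled time steps].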
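
Conceptually, an embedding is a lookup into a table with one row per id. The NumPy sketch below uses a random table as a stand-in for the trainable weights of `Embedding(100, 3)` (an assumption, since real weights are learned) and shows that the lookup gives the same result as multiplying a one-hot vector by the table.

```python
import numpy as np

vocab_size, embed_dim = 100, 3                    # same sizes as Embedding(100, 3)
rng = np.random.default_rng(0)
table = rng.normal(size=(vocab_size, embed_dim))  # stand-in for the trainable weights

ids = np.array([[4], [7]])                        # shape (samples, time_steps)
embedded = table[ids]                             # shape (samples, time_steps, embed_dim)

# The same result via one-hot vectors: a one-hot row times the table selects one row.
one_hot = np.eye(vocab_size)[ids]                 # shape (samples, time_steps, vocab_size)
via_matmul = one_hot @ table

print(embedded.shape)
print(np.allclose(embedded, via_matmul))
```

The input is two-dimensional integer ids and the output is three-dimensional, which is why an Embedding layer can sit directly in front of a recurrent layer.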

The single-letter prediction model with Embedding encoding is as follows:

import tensorflow as tf
import numpy as np
import os

import matplotlib.pyplot as plt
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

input_word = "abcde"
w_to_id = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
x_train = [w_to_id["a"], w_to_id["b"], w_to_id["c"], w_to_id["d"], w_to_id["e"]]
y_train = [w_to_id["b"], w_to_id["c"], w_to_id["d"], w_to_id["e"], w_to_id["a"]]

# Shuffle (setting the same seed before each shuffle keeps x and y aligned)
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
np.random.seed(7)

# Reshape x_train to the Embedding input format: [number of samples, unrolled time steps]
x_train = np.reshape(x_train, (len(x_train), 1))
y_train = np.array(y_train)

# Build the model
model = tf.keras.Sequential([
    Embedding(5, 2),
    SimpleRNN(3),
    Dense(5, activation="softmax")
])

# Configure training
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.01),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=["sparse_categorical_accuracy"]
)

# Where the weights are saved
checkpoint_save_path = "./checkpoint3/embedd_encode.ckpt"
if os.path.exists(checkpoint_save_path + ".index"):  # the TF checkpoint format writes a .index file, not a file at the bare path
    print("------------load model---------------")
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_save_path,
    save_best_only=True,
    save_weights_only=True,
    monitor="loss"
)

# Train the model
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

# Print the model summary
model.summary()

# Write the trainable parameters to a text file
with open("./weights.txt", "w") as f:
    for v in model.trainable_variables:
        f.write(str(v.name) + "\n")
        f.write(str(v.shape) + "\n")
        f.write((str(v.numpy())) + "\n")

# Plot training accuracy and loss
acc = history.history["sparse_categorical_accuracy"]
loss = history.history["loss"]

plt.subplot(1, 2, 1)
plt.plot(acc, label="Training Accuracy")
plt.title("Training Accuracy")
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label="Training Loss")
plt.title("Training Loss")
plt.legend()

plt.show()

The multi-letter (consecutive letters) prediction model with Embedding encoding is as follows:

import tensorflow as tf
import numpy as np
import os

from tensorflow.keras.layers import SimpleRNN, Dense, Embedding

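input_word = "abcdefghijkl"
id_to_w = {0: "a", 1: "b", 2: "c", 3: "d", 4: "e", 5: "f", 6: "g", 7: "h", 8: "i", 9: "j", 10: "k", 11: "l"}  # maps each numeric id back to its letter
training_set_scaled = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # ids of the letters a to k

x_train = []
y_train = []

# Three consecutive letters are the input; the letter that follows is the label
for i in range(3, 11):
    x_train.append(training_set_scaled[i-3:i])
    y_train.append(training_set_scaled[i])

# Shuffle (setting the same seed before each shuffle keeps x and y aligned)
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
np.random.seed(7)

# Reshape x_train to the Embedding input format: [number of samples, unrolled time steps]
x_train = np.reshape(x_train, (len(x_train), 3))
y_train = np.array(y_train)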

# Build the model
model = tf.keras.Sequential([
    Embedding(11, 3),
    SimpleRNN(10),
    Dense(11, activation="softmax")
])

# Configure training
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.01),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=["sparse_categorical_accuracy"]
)

# Where the weights are saved
model_save_path = "./checkpoint4/en_many.ckpt"
if os.path.exists(model_save_path + ".index"):  # the TF checkpoint format writes a .index file, not a file at the bare path
    print("---------------load model--------------")
    model.load_weights(model_save_path)  # load_weights needs the checkpoint path

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=model_save_path,
    save_weights_only=True,
    save_best_only=True,
    monitor="loss"
)

# Train the model
model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

# Print the model summary
model.summary()

# Write the trainable parameters to a text file
with open(r"./weights1.txt", "w") as f:
    for v in model.trainable_variables:
        f.write(str(v.name) + "\n")
        f.write(str(v.shape) + "\n")
        f.write(str(v.numpy()) + "\n")
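
The loop that builds x_train and y_train in the script above is a sliding window over the id sequence. A small helper makes the pattern reusable. This is an illustrative sketch, and the name `make_windows` is hypothetical.

```python
import numpy as np

def make_windows(sequence, window):
    """Split a sequence into (previous `window` items, next item) pairs."""
    if window >= len(sequence):
        raise ValueError("window must be shorter than the sequence")
    x = [sequence[i - window:i] for i in range(window, len(sequence))]
    y = [sequence[i] for i in range(window, len(sequence))]
    return np.array(x), np.array(y)

ids = list(range(11))              # ids 0..10, as in the script above
x, y = make_windows(ids, window=3)
print(x.shape, y.shape)
print(x[0].tolist(), "->", int(y[0]))
```

With 11 ids and a window of 3 this yields 8 samples, the same count as `range(3, 11)` produces, and `x` already has the [number of samples, time steps] shape the Embedding layer expects.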

6: Stock Price Prediction Example

from tensorflow.keras.layers import SimpleRNN, Dense, Dropout, LSTM
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd
import numpy as np
import os
import tushare as ts


# Download the data
df1 = ts.get_k_data("600519", ktype="D", start="2010-04-26", end="2021-03-17")
datapath1 = "./SH600519.csv"
df1.to_csv(datapath1)

# Read the stock file
maotai = pd.read_csv(datapath1)

# Split into training and test sets; the last 300 days are the test set
# Column 2 is the opening price (column 0 is the saved index, column 1 the date)
training_set = maotai.iloc[:-300, 2:3].values
test_set = maotai.iloc[-300:, 2:3].values

# Normalize
sc = MinMaxScaler(feature_range=(0, 1))  # scale to the range (0, 1)
training_set_scaled = sc.fit_transform(training_set)  # compute the training set's minimum and maximum and normalize the training set with them
test_set = sc.transform(test_set)  # normalize the test set with the training set's statistics

x_train = []
y_train = []

x_test = []
y_test = []

# Loop over the training set: the opening prices of 60 consecutive days are the input features x_train, and the price on day 61 is the label y_train
for i in range(60, len(training_set_scaled)):
    x_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, 0])

# Shuffle (setting the same seed before each shuffle keeps x and y aligned)
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
np.random.seed(7)

# Convert the training set
x_train = np.array(x_train)
x_train = np.reshape(x_train, (x_train.shape[0], 60, 1))  # reshape x_train to the RNN input format: (samples, time steps, features)
y_train = np.array(y_train)

# Build the test set
# Loop over the test set: 60 consecutive days are the input features x_test, and day 61 is the label y_test
for i in range(60, len(test_set)):
    x_test.append(test_set[i-60:i, 0])
    y_test.append(test_set[i, 0])

# Convert the test set
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], 60, 1))
y_test = np.array(y_test)

# Build the network
model = tf.keras.Sequential([
    SimpleRNN(80, return_sequences=True),
    # LSTM(80, return_sequences=True),
    Dropout(0.2),
    SimpleRNN(100),
    # LSTM(100),
    Dropout(0.2),
    Dense(1)
])

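# Configure training
# Only the loss is tracked here, not accuracy, so the metrics option is omitted and each epoch reports only the loss
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss="mean_squared_error"  # mean squared error as the loss function
)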

checkpoint_save_path = "./checkpoint/rnn_stock.ckpt"

if os.path.exists(checkpoint_save_path + ".index"):
    print("------------load model-----------")
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_save_path,
    save_best_only=True,
    save_weights_only=True,
    monitor="val_loss"
)

# Train the model
history = model.fit(x_train, y_train, batch_size=64, epochs=10, validation_data=(x_test, y_test), validation_freq=1, callbacks=[cp_callback])

# Print the model summary
model.summary()

# Write the trainable parameters to a text file
with open("weights.txt", "w") as f:
    for v in model.trainable_variables:
        f.write(str(v.name) + "\n")
        f.write(str(v.shape) + "\n")
        f.write(str(v.numpy()) + "\n")

loss = history.history["loss"]
val_loss = history.history["val_loss"]

# Plot training and validation loss
plt.plot(loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.title("Training and Validation Loss")
plt.legend()
plt.show()

# Run the test set through the model
predicted_stock_price = model.predict(x_test)
# Map the predictions back from (0, 1) to the original price range
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
# Map the true values back from (0, 1) to the original price range
real_stock_price = sc.inverse_transform(test_set[60:])
# Plot the true prices against the predicted prices
plt.plot(predicted_stock_price, color="red", label="Predicted Price")
plt.plot(real_stock_price, color="blue", label="Real Price")
plt.title("Price Prediction")
plt.xlabel("Time")
plt.ylabel("Price")
plt.legend()
plt.show()
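
The script fits the scaler on the training set only and then reuses those statistics for the test set and for mapping predictions back to prices. The NumPy sketch below illustrates that idea with a simplified stand-in for MinMaxScaler. The class `MinMax` is hypothetical, it does not guard against a zero range, and the prices are made up.

```python
import numpy as np

class MinMax:
    """Minimal min-max scaler to [0, 1]; a simplified stand-in for MinMaxScaler."""

    def fit(self, data):
        self.low = data.min(axis=0)
        self.span = data.max(axis=0) - self.low
        return self

    def transform(self, data):
        return (data - self.low) / self.span

    def inverse_transform(self, scaled):
        return scaled * self.span + self.low

# Made-up prices, for illustration only.
train = np.array([[10.0], [12.0], [20.0]])
test = np.array([[15.0], [25.0]])

sc = MinMax().fit(train)            # statistics come from the training set only
train_scaled = sc.transform(train)
test_scaled = sc.transform(test)    # can leave [0, 1] when a test value exceeds the training range

print(train_scaled.ravel().tolist())
print(test_scaled.ravel().tolist())
print(sc.inverse_transform(test_scaled).ravel().tolist())
```

Here the test price 25 lies above the training maximum of 20, so its scaled value is greater than 1. The same can happen with real stock data when the test period reaches prices the training period never saw.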