44. Long Short-Term Memory (LSTM) Networks: How They Work and a Keras Implementation for Stock Prediction

stackooooover

Categories: Machine Learning in Practice, Machine Learning Theory Basics. Tags: lstm, keras, deep learning

Article link: https://blog.csdn.net/weixin_36128607/article/details/120042706

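1. Overview

A traditional recurrent neural network (RNN) uses its memory cell (the hidden state) as a short-term memory, which lets it make predictions on sequential data.
However, as the sequence gets longer, the network has to be unrolled over more time steps. During backpropagation the gradient is multiplied once per time step, so it can vanish or explode.
LSTM is a variant of the RNN whose gate structure effectively alleviates the vanishing-gradient problem (exploding gradients can still occur and are commonly handled with gradient clipping).
On top of the RNN, LSTM introduces three gates, a cell state that records long-term memory, and a candidate state that summarizes newly learned information.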
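
To make the "multiplied once per time step" argument concrete, here is a minimal sketch. It assumes, as a deliberate simplification, that each time step contributes the same scalar factor; in a real RNN the factor is a Jacobian matrix that changes from step to step. The function name `repeated_product` is illustrative only.

```python
def repeated_product(factor, steps):
    """Multiply `factor` by itself `steps` times, mimicking a gradient
    that picks up one multiplicative term per unrolled time step."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad


# A factor slightly below 1 shrinks toward zero (vanishing gradient);
# a factor slightly above 1 grows without bound (exploding gradient).
for factor in (0.9, 1.1):
    for steps in (10, 60, 200):
        value = repeated_product(factor, steps)
        print(f"factor={factor}, steps={steps}: {value:.3e}")
```

Even a per-step factor of 0.9 leaves very little gradient after 60 steps, which is the window length used in the stock example later in this article.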

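2. LSTM structure

(1) Short-term memory

The short-term memory $h^{t}$ is the memory cell (hidden state) of the RNN. In an LSTM it is the Hadamard (element-wise) product of the output gate $o^{t}$ and the long-term memory $c^{t}$ passed through tanh:
$h^{t}=o^{t}\odot \tanh (c^{t})$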

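(2) Long-term memory (cell state) and candidate state

The long-term memory records the history accumulated up to the current time step:
$c^{t}=f^{t}\odot c^{t-1}+i^{t}\odot \widetilde{c}^{t}$
Here $f^{t}$ is the forget gate, $c^{t-1}$ is the long-term memory from the previous time step, $i^{t}$ is the input gate, and $\widetilde{c}^{t}$ is the candidate state, which represents the new knowledge summarized at the current time step:
$\widetilde{c}^{t}=\tanh (\mathbf{W}_{c}x^{t}+\mathbf{U}_{c}h^{t-1}+\mathbf{b}_{c})$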

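(3) Input gate, forget gate, and output gate

All three gates are functions of the current input features $x^{t}$ and the previous time step's short-term memory $h^{t-1}$.
The forget gate applies the sigmoid function to the previous hidden state $h^{t-1}$ and the current input $x^{t}$, producing values in $(0,1)$ that decide how much of the previous long-term memory $c^{t-1}$ is forgotten (values near 0 discard it, values near 1 keep it):
$f^{t}=\mathrm{sigmoid}(\mathbf{W}_{f}x^{t}+\mathbf{U}_{f}h^{t-1}+\mathbf{b}_{f})$
The input gate $i^{t}$ controls how much of the current candidate state $\widetilde{c}^{t}$ is stored:
$i^{t}=\mathrm{sigmoid}(\mathbf{W}_{i}x^{t}+\mathbf{U}_{i}h^{t-1}+\mathbf{b}_{i})$
The output gate $o^{t}$ controls how much of the current long-term memory $c^{t}$ is passed on to the short-term memory $h^{t}$:
$o^{t}=\mathrm{sigmoid}(\mathbf{W}_{o}x^{t}+\mathbf{U}_{o}h^{t-1}+\mathbf{b}_{o})$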
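
The formulas above also tell us how many trainable parameters an LSTM layer has: each of the three gates and the candidate state owns one $\mathbf{W}$, one $\mathbf{U}$, and one $\mathbf{b}$. The sketch below simply counts them. It assumes exactly one bias vector per gate, as in the equations here, and the helper name `lstm_param_count` is illustrative only. The sizes used match the two LSTM layers in the Keras example in section 4 (one input feature into 80 units, then 80 into 100 units).

```python
def lstm_param_count(input_dim, hidden_dim):
    """Count the parameters of one LSTM layer from the gate equations.

    Each of the four blocks (forget, input, output, candidate) has
    W: hidden_dim x input_dim, U: hidden_dim x hidden_dim, b: hidden_dim.
    """
    per_block = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return 4 * per_block


print(lstm_param_count(1, 80))    # 26240
print(lstm_param_count(80, 100))  # 72400
```

With the default Keras LSTM settings, `model.summary()` should report the same counts for the two layers.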

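3. LSTM workflow

(1) From the previous time step's short-term memory $h^{t-1}$ and the current input $x^{t}$, compute the three gates and the candidate state:
$f^{t}=\mathrm{sigmoid}(\mathbf{W}_{f}x^{t}+\mathbf{U}_{f}h^{t-1}+\mathbf{b}_{f})$
$i^{t}=\mathrm{sigmoid}(\mathbf{W}_{i}x^{t}+\mathbf{U}_{i}h^{t-1}+\mathbf{b}_{i})$
$o^{t}=\mathrm{sigmoid}(\mathbf{W}_{o}x^{t}+\mathbf{U}_{o}h^{t-1}+\mathbf{b}_{o})$
$\widetilde{c}^{t}=\tanh (\mathbf{W}_{c}x^{t}+\mathbf{U}_{c}h^{t-1}+\mathbf{b}_{c})$
(2) Compute the long-term memory:
$c^{t}=f^{t}\odot c^{t-1}+i^{t}\odot \widetilde{c}^{t}$
(3) Update the hidden state:
$h^{t}=o^{t}\odot \tanh (c^{t})$
(4) Compute the output at the current time step:
$\widehat{y}^{t}=\sigma (\mathbf{V}h^{t}+\mathbf{c})$
Here $\sigma$ denotes the output-layer activation, and $\mathbf{V}$ and $\mathbf{c}$ are the output layer's weight matrix and bias (the bold $\mathbf{c}$ is not the cell state $c^{t}$).
(5) Backpropagate, and update the weight matrices and biases with an optimizer such as gradient descent.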
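
Steps (1) to (3) can be sketched in plain NumPy as follows. This is a minimal forward pass for illustration, not how Keras implements the layer internally. The parameter layout (dictionaries keyed by `f`, `i`, `o`, `c`), the random initialization, and the function names are assumptions made for this example. The output step (4) and backpropagation (5) are left out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights for the four blocks f, i, o, c (illustrative)."""
    rng = np.random.default_rng(seed)
    blocks = "fioc"
    return {
        "W": {g: rng.normal(0.0, 0.1, (hidden_dim, input_dim)) for g in blocks},
        "U": {g: rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)) for g in blocks},
        "b": {g: np.zeros(hidden_dim) for g in blocks},
    }


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step: gates and candidate, then cell state, then hidden state."""
    W, U, b = params["W"], params["U"], params["b"]
    # Step (1): three gates and the candidate state
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    # Step (2): long-term memory (element-wise products)
    c_t = f * c_prev + i * c_tilde
    # Step (3): short-term memory / hidden state
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def lstm_forward(xs, params, hidden_dim):
    """Run the cell over a whole sequence, starting from zero states."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
    return h, c


params = init_params(input_dim=1, hidden_dim=4)
xs = np.linspace(0.0, 1.0, 60).reshape(60, 1)  # toy sequence of 60 steps
h_last, c_last = lstm_forward(xs, params, hidden_dim=4)
print(h_last.shape, c_last.shape)
```

Because $o^{t}$ lies in $(0,1)$ and tanh lies in $(-1,1)$, every component of the hidden state stays strictly between -1 and 1, whatever the weights are.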

4. Stock prediction with Keras and LSTM

# Import dependencies
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from tensorflow.keras.layers import Dense,Dropout,LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error,mean_squared_error

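# Load the data
maotai = pd.read_csv('./SH600519.csv')
# Column index 2 is used as the price series (presumably the opening price);
# the first 2126 rows form the training set and the remaining rows the test set
training_set = maotai.iloc[0:2126,2:3].values
test_set = maotai.iloc[2126:,2:3].values
print(training_set.shape,test_set.shape)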

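# Normalize to [0, 1]: fit the scaler on the training set only, then apply
# the same scaling to the test set so that both share one price scale
sc = MinMaxScaler(feature_range=(0,1))
training_set = sc.fit_transform(training_set)
test_set = sc.transform(test_set)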
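
A common convention for min-max scaling is to compute the range from the training data only and reuse it for the test data. The sketch below shows the arithmetic in plain NumPy; the helper names and the toy numbers are made up for illustration and are not taken from the stock data.

```python
import numpy as np


def fit_min_max(train):
    """Return the (min, max) of the training data."""
    return float(np.min(train)), float(np.max(train))


def apply_min_max(values, lo, hi):
    """Scale values with a previously fitted (lo, hi) range."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)


def invert_min_max(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo


train = np.array([10.0, 20.0, 30.0])  # toy training prices
test = np.array([25.0, 40.0])         # toy test prices

lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)  # [0.0, 0.5, 1.0]
test_scaled = apply_min_max(test, lo, hi)    # [0.75, 1.5]
print(train_scaled, test_scaled)
print(invert_min_max(test_scaled, lo, hi))   # back to [25.0, 40.0]
```

Note that test values can fall outside [0, 1] when the test prices exceed the training range. That is expected, and the inverse transform still recovers the original prices.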

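# Build the training and test samples
x_train,y_train,x_test,y_test=[],[],[],[]
# Each sample uses the previous 60 values as input and the next value as the label
for i in range(60,len(training_set)):
    x_train.append(training_set[i-60:i,0])
    y_train.append(training_set[i,0])
# Shuffle inputs and labels with the same seed so the pairs stay aligned
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
x_train,y_train = np.array(x_train),np.array(y_train)
# Reshape to (samples, time steps, features per step), as the LSTM layer expects
x_train = np.reshape(x_train, (x_train.shape[0], 60, 1))
# The test samples are built the same way but are not shuffled
for i in range(60, len(test_set)):
    x_test.append(test_set[i - 60:i, 0])
    y_test.append(test_set[i, 0])
x_test, y_test = np.array(x_test), np.array(y_test)
x_test = np.reshape(x_test, (x_test.shape[0], 60, 1))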
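
The sliding-window construction above can also be wrapped in a small helper, which makes the resulting shapes easier to check on a toy series. This is a sketch under assumptions: the function names `make_windows` and `shuffle_together` are illustrative, and a single permutation is used to shuffle inputs and labels together instead of re-seeding twice.

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step labels."""
    series = np.asarray(series, dtype=float).reshape(-1)
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i])  # the previous `window` values
        y.append(series[i])             # the value to predict
    x = np.array(x).reshape(len(x), window, 1)
    y = np.array(y)
    return x, y


def shuffle_together(x, y, seed=7):
    """Apply one random permutation to both arrays so pairs stay aligned."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    return x[order], y[order]


x, y = make_windows(np.arange(10), window=3)
print(x.shape, y.shape)  # (7, 3, 1) (7,)
print(x[0, :, 0], y[0])  # [0. 1. 2.] 3.0

x_shuf, y_shuf = shuffle_together(x, y)
```

For the stock data, the equivalent call would be `make_windows(training_set[:, 0], 60)`.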

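# Build the network: two stacked LSTM layers with dropout, then a single output
model = tf.keras.Sequential([
    LSTM(80,return_sequences=True),  # return the full sequence for the next LSTM layer
    Dropout(0.2),
    LSTM(100),                       # return only the last hidden state
    Dropout(0.2),
    Dense(1)                         # predicted (scaled) price
])

# Configure the optimizer and the loss
model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss='mean_squared_error')

# Train; the test set doubles as the validation set here
history = model.fit(x_train, y_train, batch_size=64, epochs=50,
                    validation_data=(x_test, y_test), validation_freq=1)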

# Plot the loss curves
loss =  history.history['loss']
val_loss = history.history['val_loss']
plt.plot(loss,label='Training Loss')
plt.plot(val_loss,label='Validation Loss')
plt.legend()
plt.title('Loss')
plt.show()

[Figure: training and validation loss curves]

# Compare the predictions with the actual prices (both mapped back to the original scale)
predict_price = model.predict(x_test)
predict_price = sc.inverse_transform(predict_price)
real_price = sc.inverse_transform(test_set[60:])
plt.plot(real_price, color='red', label='MaoTai Stock Price')
plt.plot(predict_price, color='blue', label='Predicted MaoTai Stock Price')
plt.title('MaoTai Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('MaoTai Stock Price')
plt.legend()
plt.show()

[Figure: actual vs. predicted MaoTai stock price]
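
The script imports `mean_squared_error` and `mean_absolute_error` but never calls them, so the comparison above is visual only. One possible way to put numbers on the fit is sketched below in plain NumPy; the helper name `regression_errors` and the toy values are assumptions made for this example.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return MSE, RMSE and MAE between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))    # mean squared error
    rmse = float(np.sqrt(mse))        # root mean squared error, in price units
    mae = float(np.mean(np.abs(err))) # mean absolute error, in price units
    return {"mse": mse, "rmse": rmse, "mae": mae}


# Toy values: two exact predictions and one that is off by 2
scores = regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(scores)
```

For the model above, `regression_errors(real_price, predict_price)` would report the errors in the original price units.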
