Stock Price Prediction with an LSTM Model

练习两年半的工程师

Article link: https://blog.csdn.net/weixin_57266891/article/details/131502820

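This article shows how to use an LSTM, a type of recurrent neural network, to predict Apple's stock price. (Note: this article is for learning purposes only; do not treat it as investment advice.)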

The code in this article runs in a Python notebook.

Install yfinance to access market data from Yahoo.

%pip install yfinance

Import the required packages.

import math
from pandas_datareader import data as pdr
import numpy as np
import pandas as pd
import yfinance as yfin
import datetime as dt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

Fetch Apple (AAPL) stock price data from July 1, 2013 to the present.

yfin.pdr_override()

df = pdr.get_data_yahoo('AAPL', start='2013-07-01', end=dt.datetime.today())

print(df)

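The output is as follows:
[Figure: output of print(df)]
We are only interested in the closing price, so we first visualize the closing price history.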

#Visualize the closing price history
plt.figure(figsize=(16,8))
plt.title('Closing Price History')
plt.plot(df['Close'])
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.show()

The output is as follows:
[Figure: closing price history plot]

Extract the closing price and convert it to a numpy array. We use 80% of the data as training data.

#Create a new dataframe with only the 'Close' column
data = df.filter(['Close'])

#Convert the dataframe to a numpy array
dataset = data.values

#Get the number of rows to train the model on
training_data_len = math.ceil(len(dataset)*0.8)

training_data_len
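
To make the split arithmetic concrete, here is a minimal sketch with a hypothetical row count. The value 2518 is an assumption chosen for illustration; the real number depends on the date the data is downloaded.

```python
import math

n_rows = 2518  # hypothetical number of trading days in the dataset

# Same rule as above: round the 80% mark up to a whole row
training_data_len = math.ceil(n_rows * 0.8)
n_test_rows = n_rows - training_data_len

print(training_data_len, n_test_rows)
```

Rounding up with math.ceil means the training portion gets the extra row whenever 80% of the dataset is not a whole number.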

Use MinMaxScaler to normalize all values to the range 0 to 1.

#Scale the data
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)

scaled_data
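
With feature_range=(0, 1), min-max scaling maps each value x to (x - min) / (max - min). The sketch below reproduces that formula with plain NumPy on a made-up price array. It also takes min and max from the training rows only, which is a variant and not what the code above does: fitting on the whole dataset lets the test period influence the scaling. The array values and the helper name min_max_scale are assumptions for illustration.

```python
import math
import numpy as np

prices = np.array([[10.0], [12.0], [11.0], [15.0], [20.0]])  # made-up closing prices
train_len = math.ceil(len(prices) * 0.8)  # first 4 rows are "training" rows

# Fit the bounds on the training rows only
train_min = prices[:train_len].min(axis=0)
train_max = prices[:train_len].max(axis=0)

def min_max_scale(values, lo, hi):
    """Map values so that lo -> 0 and hi -> 1."""
    return (values - lo) / (hi - lo)

scaled = min_max_scale(prices, train_min, train_max)
print(scaled.ravel())
```

Note that with training-only bounds, a later price above the training maximum scales to a value greater than 1 (the last row here), so this choice trades a strict 0-to-1 range for a cleaner train/test separation.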

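The data now lies between 0 and 1:
[Figure: the scaled_data array]
Next, build the training dataset. Here we set the window size to 60, so the model uses 60 days of data to predict the 61st day.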

#Create the scaled training data set
train_data = scaled_data[0:training_data_len, :]

#Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i<=61:
        print(x_train)
        print(y_train)
        print()
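
The sliding-window loop can also be written as a small reusable function. This is a minimal sketch on a toy series; the name make_windows and the window size of 3 are assumptions chosen so the result is easy to inspect.

```python
import numpy as np

def make_windows(series, window_size):
    """Return (X, y): X[k] holds window_size consecutive values, y[k] is the value right after them."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i - window_size:i] for i in range(window_size, len(series))])
    y = series[window_size:]
    return X, y

toy_series = np.arange(10)  # 0, 1, ..., 9
X, y = make_windows(toy_series, window_size=3)

print(X[0], y[0])        # first window and its target
print(X.shape, y.shape)  # one sample per position after the first window
```

A series of length n yields n - window_size samples, which is why the training loop above starts at index 60.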

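The output is shown below. In the first iteration of the for loop, the first 60 values of the dataset become the first training sample, and the 61st value becomes the corresponding y.
[Figure: printed x_train and y_train]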

Next we convert the data to numpy arrays and reshape it to match the input format the LSTM model expects.

# Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
x_train.shape
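
Keras LSTM layers take input of shape (samples, timesteps, features). Since only the closing price is used, there is a single feature, so the reshape just appends a trailing axis of length 1. A minimal sketch with a hypothetical array of 5 samples:

```python
import numpy as np

x = np.zeros((5, 60))  # 5 hypothetical samples, 60 time steps each

# Same reshape as above: (samples, timesteps) -> (samples, timesteps, 1)
x_3d = np.reshape(x, (x.shape[0], x.shape[1], 1))

# Equivalent alternative: append a trailing feature axis
x_3d_alt = x[..., np.newaxis]

print(x_3d.shape, x_3d_alt.shape)
```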

With the data prepared, we can start building the model.

#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
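
To get a feel for the size of this network, the trainable parameter counts can be worked out by hand. The sketch below assumes the default Keras layer settings (bias enabled, standard LSTM gates); the helper names are hypothetical, and model.summary() is the way to confirm the numbers on an actual model.

```python
def lstm_params(input_dim, units):
    # Four gates (input, forget, cell candidate, output), each with an
    # input kernel, a recurrent kernel, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit
    return input_dim * units + units

layer_params = [
    lstm_params(1, 50),    # first LSTM: 1 input feature
    lstm_params(50, 50),   # second LSTM: fed the 50 outputs of the first
    dense_params(50, 25),  # Dense(25)
    dense_params(25, 1),   # Dense(1)
]
total = sum(layer_params)
print(layer_params, total)
```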

Start training.

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

After training, we build the testing data to evaluate the model's performance. The x values are again grouped into windows of 60.

#Create the testing data set
#Create a new array of scaled values starting 60 rows before the test period, so the first test window is complete
test_data = scaled_data[training_data_len - 60:, :]

#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
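
Starting the test slice 60 rows before training_data_len is what keeps the number of test windows equal to the number of rows in y_test. A minimal check of that arithmetic, using a hypothetical dataset length of 2518 rows:

```python
n_rows = 2518             # hypothetical dataset length
training_data_len = 2015  # ceil(2518 * 0.8)
window_size = 60

# test_data = scaled_data[training_data_len - 60:, :]
test_start = training_data_len - window_size
n_test_rows_scaled = n_rows - test_start

# One window per iteration of: for i in range(60, len(test_data))
n_windows = len(range(window_size, n_test_rows_scaled))

# y_test = dataset[training_data_len:, :]
n_targets = n_rows - training_data_len

print(test_start, n_windows, n_targets)
```

If the slice started at training_data_len instead, the first 60 test days would have no complete window and x_test would be 60 samples shorter than y_test.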

As before, first convert the test data to a numpy array, then reshape it so that it can be fed into the model.

#Convert the data to a numpy array
x_test = np.array(x_test)

#Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

Use the model to get predictions, then convert them from the 0-to-1 range back to the original price range.

#Get the models predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
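
For a (0, 1) feature range, undoing min-max scaling amounts to x = x_scaled * (max - min) + min, using the bounds learned when the scaler was fitted. A minimal sketch with made-up bounds and scaled values:

```python
import numpy as np

data_min, data_max = 10.0, 20.0  # made-up bounds learned during fitting
scaled_predictions = np.array([[0.0], [0.25], [1.0]])

# Reverse of (x - min) / (max - min)
restored = scaled_predictions * (data_max - data_min) + data_min
print(restored.ravel())
```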

Evaluate the model with the root mean squared error (RMSE); the smaller the RMSE, the better the model performs.

#Get the root mean squared error (RMSE)
from sklearn.metrics import mean_squared_error
import math

MSE = mean_squared_error(y_test, predictions)
RMSE = math.sqrt(MSE)

RMSE
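
RMSE is the square root of the mean of the squared differences between actual and predicted values, so it is expressed in the same unit as the price (USD here). A minimal NumPy sketch on made-up numbers; the function name rmse is an assumption for illustration:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally shaped arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

actual = [[100.0], [102.0], [101.0]]     # made-up closing prices
predicted = [[101.0], [100.0], [101.0]]  # made-up predictions

# Errors are -1, 2, 0, so the result is sqrt((1 + 4 + 0) / 3)
print(rmse(actual, predicted))
```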

Plot the original data alongside the predicted values.

#Plot the data
train = data[:training_data_len]
#Copy the slice so that adding a column does not trigger pandas' SettingWithCopyWarning
valid = data[training_data_len:].copy()
valid['Predictions'] = predictions

#Visualize the data
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.show()

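The output is as follows:
[Figure: plot of training data, validation data, and predictions]
We can look at the difference between the actual data and the predictions.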

#Show the valid and predicted prices
valid

[Figure: the valid table with Close and Predictions columns]

We can use the last 60 days of a stock's data to predict the next day's price.

#Get the quote
apple_quote = pdr.get_data_yahoo('AAPL', start='2013-07-01', end=dt.datetime.today())

#Create a new dataframe
new_df = apple_quote.filter(['Close'])

#Get the last 60 day closing price values and convert the dataframe to an array
last_60_days = new_df[-60:].values

#Scale the data to be values between 0 and 1
last_60_days_scaled = scaler.transform(last_60_days)

#Create an empty list
X_test = []

#Append the past 60 days
X_test.append(last_60_days_scaled)

#Convert the X_test data set to a numpy array
X_test = np.array(X_test)

#Reshape the data
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

#Get the predicted scaled price
pred_price = model.predict(X_test)

#Undo the scaling
pred_price = scaler.inverse_transform(pred_price)

print(pred_price)
