Real-Time Bitcoin Price Prediction with an LSTM in TensorFlow 2.0

This post uses TensorFlow-gpu 2.0 and an LSTM network to predict the Bitcoin price in near real time.

The dataset comes from Kaggle: https://www.kaggle.com/mczielinski/bitcoin-historical-data#coinbaseUSD_1-min_data_2014-12-01_to_2019-01-09.csv

Download the dataset, unzip it, and use the file coinbaseUSD_1-min_data_2014-12-01_to_2019-01-09.csv.

I first ran this code on the CPU: usage hit 100% and memory climbed to 15 GB. For this revision I moved training to the GPU (a GTX 1080 Ti), which brings it down to roughly 32% CPU, 12% GPU, and 6 GB of memory.

Bitcoin price data is a time series, so most Bitcoin price-prediction work is built on LSTM models.
Long short-term memory (LSTM) is a deep-learning model particularly well suited to time-series data (or any data with a temporal / spatial / structural order, such as video or sentences), which makes it a natural choice for modeling cryptocurrency price movements.

Open Jupyter Notebook in the corresponding conda environment, create a new .ipynb file, and enter the code below.

Import the required libraries

import pandas as pd
import numpy as np
import tensorflow as tf

from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

from matplotlib import pyplot as plt
%matplotlib inline

Load the data

raw_data = pd.read_csv("coinbaseUSD_1-min_data_2014-12-01_to_2019-01-09.csv")

Inspect the raw data

raw_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2099760 entries, 0 to 2099759
Data columns (total 8 columns):
 #   Column             Dtype  
---  ------             -----  
 0   Timestamp          int64  
 1   Open               float64
 2   High               float64
 3   Low                float64
 4   Close              float64
 5   Volume_(BTC)       float64
 6   Volume_(Currency)  float64
 7   Weighted_Price     float64
dtypes: float64(7), int64(1)
memory usage: 128.2 MB

The dataset contains 2,099,760 rows with the columns Timestamp, Open, High, Low, Close, Volume_(BTC), Volume_(Currency) and Weighted_Price. Apart from Timestamp (int64), every column is float64.
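The Timestamp column stores Unix epoch seconds. The post never converts it, but pandas can turn it into human-readable dates directly; a minimal sketch using the first two timestamps from the data:

```python
import pandas as pd

# Two raw epoch-second values as they appear in the Timestamp column
ts = pd.Series([1417411980, 1417412040])

# unit='s' interprets the integers as seconds since 1970-01-01 UTC
dates = pd.to_datetime(ts, unit='s')
print(dates.iloc[0])  # 2014-12-01 05:33:00
```

This can also be applied to the whole column (e.g. as the DataFrame index) if you want readable x-axes on the plots later.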

Now look at the first 10 rows

raw_data.head(10)
    Timestamp   Open   High    Low  Close  Volume_(BTC)  Volume_(Currency)  Weighted_Price
0  1417411980  300.0  300.0  300.0  300.0          0.01                3.0           300.0
1  1417412040    NaN    NaN    NaN    NaN           NaN                NaN             NaN
2  1417412100    NaN    NaN    NaN    NaN           NaN                NaN             NaN
3  1417412160    NaN    NaN    NaN    NaN           NaN                NaN             NaN
4  1417412220    NaN    NaN    NaN    NaN           NaN                NaN             NaN
5  1417412280    NaN    NaN    NaN    NaN           NaN                NaN             NaN
6  1417412340    NaN    NaN    NaN    NaN           NaN                NaN             NaN
7  1417412400  300.0  300.0  300.0  300.0          0.01                3.0           300.0
8  1417412460    NaN    NaN    NaN    NaN           NaN                NaN             NaN
9  1417412520    NaN    NaN    NaN    NaN           NaN                NaN             NaN

Drop every row that contains a NaN value and assign the cleaned result to data

# Drop every row that contains a NaN value; .copy() makes data an independent
# DataFrame, avoiding the SettingWithCopyWarning when columns are modified later
data = raw_data.dropna(axis=0).copy()
data.head(10)
       Timestamp    Open   High     Low  Close  Volume_(BTC)  Volume_(Currency)  Weighted_Price
0     1417411980  300.00  300.0  300.00  300.0      0.010000            3.00000      300.000000
7     1417412400  300.00  300.0  300.00  300.0      0.010000            3.00000      300.000000
51    1417415040  370.00  370.0  370.00  370.0      0.010000            3.70000      370.000000
77    1417416600  370.00  370.0  370.00  370.0      0.026556            9.82555      370.000000
1436  1417498140  377.00  377.0  377.00  377.0      0.010000            3.77000      377.000000
1766  1417517940  377.75  378.0  377.75  378.0      4.000000         1511.93750      377.984375
1771  1417518240  378.00  378.0  378.00  378.0      4.900000         1852.20000      378.000000
1772  1417518300  378.00  378.0  378.00  378.0      5.200000         1965.60000      378.000000
2230  1417545780  378.00  378.0  378.00  378.0      0.100000           37.80000      378.000000
2245  1417546680  378.00  378.0  378.00  378.0      0.793600          299.98080      378.000000

Verify that the NaN values are gone:

data.isnull().sum()
Timestamp            0
Open                 0
High                 0
Low                  0
Close                0
Volume_(BTC)         0
Volume_(Currency)    0
Weighted_Price       0
dtype: int64

As expected, no NaN values remain.

Next, check each column for zero values. Any zeros (for example, minutes with no recorded trades) should be treated as missing and handled:

(data == 0).astype(int).any()
Timestamp            False
Open                 False
High                 False
Low                  False
Close                False
Volume_(BTC)         False
Volume_(Currency)    False
Weighted_Price       False
dtype: bool

Zeros are handled by forward-filling: each zero is replaced with the previous row's value in that column.

# Replace zeros with NaN, then forward-fill each column from the previous row
fill_cols = ['Open', 'High', 'Low', 'Close',
             'Volume_(BTC)', 'Volume_(Currency)', 'Weighted_Price']
for col in fill_cols:
    data[col].replace(0, np.nan, inplace=True)
    data[col].fillna(method='ffill', inplace=True)
Checking again confirms that no zeros remain:

(data == 0).astype(int).any()
Timestamp            False
Open                 False
High                 False
Low                  False
Close                False
Volume_(BTC)         False
Volume_(Currency)    False
Weighted_Price       False
dtype: bool

Now look at the distribution and trend of the data; the curve is continuous at this point.

plt.plot(data['Weighted_Price'], label='Price')
plt.ylabel('Price')
plt.legend()
plt.show()

[Figure: Weighted_Price over the full time range]

Splitting into training and test sets

Normalize the data to the range 0 to 1

data_set = data.drop('Timestamp', axis=1).values
data_set = data_set.astype('float32')
mms = MinMaxScaler(feature_range=(0, 1))
data_set = mms.fit_transform(data_set)
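One caveat worth noting: the scaler above is fit on the entire dataset, so statistics from the (future) test period leak into the scaling. A leakage-free alternative, sketched here on toy values, fits on the training portion only and then applies the same transform to the test portion:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-column series standing in for a price column
values = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]], dtype='float32')
train_part, test_part = values[:4], values[4:]

# Fit on the training rows only, so the test period's min/max never
# influence the scaling
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_part)  # maps the range [1, 4] to [0, 1]
test_scaled = scaler.transform(test_part)        # values beyond 4 land above 1.0
```

With trending data like Bitcoin prices this matters: the test period's much larger maximum would otherwise compress the training values into a narrow band.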

Split the data 80/20 into a training set and a test set

ratio = 0.8
train_size = int(len(data_set) * ratio)
test_size = len(data_set) - train_size
train, test = data_set[0:train_size,:], data_set[train_size:len(data_set),:]

Create the training and test sets using a look-back window of one time step (one row, i.e. one minute of data).

def create_dataset(data):
    window = 1        # look-back window: one time step per sample
    label_index = 6   # column 6 is Weighted_Price, the prediction target
    x, y = [], []
    for i in range(len(data) - window):
        x.append(data[i:(i + window), :])          # window of all features
        y.append(data[i + window, label_index])    # next step's Weighted_Price
    return np.array(x), np.array(y)
train_x, train_y = create_dataset(train)
test_x, test_y = create_dataset(test)
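To make the resulting shapes concrete, here is a self-contained toy run of the same windowing logic (repeated with a configurable window and label column purely for illustration):

```python
import numpy as np

def create_dataset(data, window=1, label_index=6):
    """Slide a window over the rows; x is the window of all features,
    y is the label column of the row right after the window."""
    x, y = [], []
    for i in range(len(data) - window):
        x.append(data[i:(i + window), :])
        y.append(data[i + window, label_index])
    return np.array(x), np.array(y)

# 10 fake rows of 7 features, so column 6 stands in for Weighted_Price;
# row i is [7i, 7i+1, ..., 7i+6]
toy = np.arange(70, dtype='float32').reshape(10, 7)
x, y = create_dataset(toy, window=3, label_index=6)

print(x.shape)  # (7, 3, 7): 7 samples, each a 3-step window of 7 features
print(y[:2])    # column 6 of rows 3 and 4: [27. 34.]
```

With the post's window of 1, each sample is a single row and x has shape (n - 1, 1, 7), which matches the (timesteps, features) input the LSTM layer expects.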

The loss function is the mean absolute error (MAE).

def create_model():
    model = Sequential()
    model.add(LSTM(50, input_shape=(train_x.shape[1], train_x.shape[2])))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam')
    model.summary()
    return model

model = create_model()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 50)                11600     
_________________________________________________________________
dense (Dense)                (None, 1)                 51        
=================================================================
Total params: 11,651
Trainable params: 11,651
Non-trainable params: 0
_________________________________________________________________

To save time, train for only 20 epochs. With tensorflow-gpu 2.x (which bundles Keras), training runs on the GPU and is much faster.

history = model.fit(train_x, train_y, epochs=20, batch_size=64, validation_data=(test_x, test_y), verbose=1, shuffle=False)
Epoch 1/20
24884/24884 [==============================] - 76s 3ms/step - loss: 0.0017 - val_loss: 0.0303
Epoch 2/20
24884/24884 [==============================] - 76s 3ms/step - loss: 0.0014 - val_loss: 0.0188
Epoch 3/20
24884/24884 [==============================] - 76s 3ms/step - loss: 0.0011 - val_loss: 0.0135
Epoch 4/20
24884/24884 [==============================] - 76s 3ms/step - loss: 0.0012 - val_loss: 0.0145
Epoch 5/20
24884/24884 [==============================] - 76s 3ms/step - loss: 0.0011 - val_loss: 0.0128
Epoch 6/20
24884/24884 [==============================] - 77s 3ms/step - loss: 0.0011 - val_loss: 0.0136
Epoch 7/20
24884/24884 [==============================] - 77s 3ms/step - loss: 0.0011 - val_loss: 0.0135
Epoch 8/20
24884/24884 [==============================] - 76s 3ms/step - loss: 9.6527e-04 - val_loss: 0.0102
Epoch 9/20
24884/24884 [==============================] - 76s 3ms/step - loss: 8.4701e-04 - val_loss: 0.0083
Epoch 10/20
24884/24884 [==============================] - 76s 3ms/step - loss: 7.4637e-04 - val_loss: 0.0066
Epoch 11/20
24884/24884 [==============================] - 76s 3ms/step - loss: 6.7190e-04 - val_loss: 0.0059
Epoch 12/20
24884/24884 [==============================] - 76s 3ms/step - loss: 5.7592e-04 - val_loss: 0.0050
Epoch 13/20
24884/24884 [==============================] - 76s 3ms/step - loss: 5.3660e-04 - val_loss: 0.0053
Epoch 14/20
24884/24884 [==============================] - 76s 3ms/step - loss: 5.3742e-04 - val_loss: 0.0050
Epoch 15/20
24884/24884 [==============================] - 76s 3ms/step - loss: 5.2245e-04 - val_loss: 0.0053
Epoch 16/20
24884/24884 [==============================] - 76s 3ms/step - loss: 4.8314e-04 - val_loss: 0.0046
Epoch 17/20
24884/24884 [==============================] - 76s 3ms/step - loss: 4.8415e-04 - val_loss: 0.0054
Epoch 18/20
24884/24884 [==============================] - 76s 3ms/step - loss: 4.7891e-04 - val_loss: 0.0053
Epoch 19/20
24884/24884 [==============================] - 77s 3ms/step - loss: 4.6439e-04 - val_loss: 0.0048
Epoch 20/20
24884/24884 [==============================] - 76s 3ms/step - loss: 4.4422e-04 - val_loss: 0.0044
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()

[Figure: training and validation loss per epoch]


Prediction

predict = model.predict(test_x)
plt.plot(predict, label='predict')
plt.plot(test_y, label='ground truth')
plt.legend()
plt.show()

[Figure: predicted vs. actual Weighted_Price on the test set]
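The predictions above are in scaled 0-to-1 units. To read them as dollar prices they must be passed back through the scaler, but inverse_transform expects all 7 columns. A common trick, sketched here on toy values with a hypothetical scaled prediction, is to pad a zero matrix and pick out the label column afterwards:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy 7-column data standing in for the unscaled feature matrix;
# column 6 plays the role of Weighted_Price (min 100, max 500)
data_set = np.array([[0., 0., 0., 0., 0., 0., 100.],
                     [1., 1., 1., 1., 1., 1., 500.]], dtype='float32')
mms = MinMaxScaler(feature_range=(0, 1)).fit(data_set)

# A hypothetical model output in scaled units
pred_scaled = np.array([[0.5]])

# Pad to 7 columns with zeros, inverse-transform, keep only column 6
padded = np.zeros((len(pred_scaled), data_set.shape[1]), dtype='float32')
padded[:, 6] = pred_scaled.ravel()
pred_price = mms.inverse_transform(padded)[:, 6]
print(pred_price)  # [300.]: halfway between the fitted min 100 and max 500
```

The same padding trick applied to the real `predict` output (and to `test_y`) would let the final plot show actual dollar prices on the y-axis.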

This is intended only as a learning exercise in data analysis.
The code is available in my Gitee repository: https://gitee.com/rengarwang/LSTM-forecast-price
